Maximizing push-down operations to the database server
SAP BusinessObjects Data Services generates SQL SELECT statements to retrieve the data from
source databases. The software automatically distributes the processing workload by pushing down
as much as possible to the source database server.
Pushing down operations provides the following advantages:
- Use the power of the database server to execute SELECT operations such as joins, GROUP BY, and common functions (for example, decode and string functions). The database is often optimized for these operations.
- Minimize the amount of data sent over the network. Fewer rows can be retrieved when the SQL statements include filters or aggregations.
You can also do a full push down from the source to the target, which means the software sends SQL INSERT INTO... SELECT statements to the target database. The following features enable a full push down:
- Data_Transfer transform
- Database links and linked datastores
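The difference between no push down, a partial push down, and a full push down can be sketched outside the product. The following is a minimal illustration using Python's built-in sqlite3 module, not SQL generated by Data Services; the table names (orders, region_totals) and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical source and target tables in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount INTEGER);
    CREATE TABLE region_totals (region TEXT, total INTEGER);
    INSERT INTO orders VALUES ('EMEA', 10), ('EMEA', 20), ('APJ', 5), ('AMER', 7);
""")

# No push down: every row is fetched, then filtered and summed by the client.
all_rows = conn.execute("SELECT region, amount FROM orders").fetchall()
client_side = {}
for region, amount in all_rows:
    if region != "AMER":
        client_side[region] = client_side.get(region, 0) + amount

# Partial push down: the filter and GROUP BY run inside the database,
# so only the aggregated rows are returned to the client.
pushed_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "WHERE region <> 'AMER' GROUP BY region"
).fetchall()

# Full push down: a single INSERT INTO ... SELECT statement; the client
# never fetches the source rows at all.
conn.execute(
    "INSERT INTO region_totals "
    "SELECT region, SUM(amount) FROM orders "
    "WHERE region <> 'AMER' GROUP BY region"
)
conn.commit()
target_rows = conn.execute("SELECT region, total FROM region_totals").fetchall()
```

In this sketch the first query returns four rows to the client, while the pushed-down query returns only two aggregated rows, and the full push down returns none.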
Improving throughput
Use the following features to improve throughput:
- Using caches for faster access to data
You can improve the performance of data transformations by caching as much data as possible. By
caching data in memory, you limit the number of times the system must access the database.
- Bulk loading to the target
The software supports database bulk loading engines including the Oracle bulk load API. You can
have multiple bulk load processes running in parallel.
- Other tuning techniques
  - Source-based performance options
    - Join ordering
    - Minimizing extracted data
    - Using array fetch size
  - Target-based performance options
    - Loading method
    - Rows per commit
  - Job design performance options
    - Loading only changed data
    - Minimizing data type conversion
    - Minimizing locale conversion
    - Precision in operations
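Three of the ideas above (caching lookup data, array fetch size, and rows per commit) can be illustrated together in a small sketch. This is not the Data Services implementation; it uses Python's sqlite3 module with two in-memory databases standing in for a source and a target. The constants ARRAY_FETCH_SIZE and ROWS_PER_COMMIT, the table names, and the data are hypothetical. With an embedded database, fetchmany does not save network round trips, so the example shows only the shape of the pattern.

```python
import sqlite3

ARRAY_FETCH_SIZE = 2   # hypothetical: rows requested per fetch call
ROWS_PER_COMMIT = 3    # hypothetical: rows written per transaction

# Stand-in source database with a fact table and a small lookup table.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE src (id INTEGER, country_code TEXT);
    CREATE TABLE countries (code TEXT, name TEXT);
    INSERT INTO src VALUES (1,'DE'),(2,'FR'),(3,'DE'),(4,'US'),(5,'FR');
    INSERT INTO countries VALUES
        ('DE','Germany'),('FR','France'),('US','United States');
""")

# Stand-in target database.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE tgt (id INTEGER, country_name TEXT)")

# Cache: read the lookup table once instead of querying it per source row.
lookup_cache = dict(source.execute("SELECT code, name FROM countries"))

cursor = source.execute("SELECT id, country_code FROM src ORDER BY id")
fetch_calls = 0
commits = 0
pending = []

while True:
    # Array fetch: pull several rows per call rather than one at a time.
    batch = cursor.fetchmany(ARRAY_FETCH_SIZE)
    if not batch:
        break
    fetch_calls += 1
    for row_id, code in batch:
        pending.append((row_id, lookup_cache[code]))
        # Rows per commit: write and commit once a full batch is collected.
        if len(pending) == ROWS_PER_COMMIT:
            target.executemany("INSERT INTO tgt VALUES (?, ?)", pending)
            target.commit()
            commits += 1
            pending = []

# Flush the final partial batch, if any.
if pending:
    target.executemany("INSERT INTO tgt VALUES (?, ?)", pending)
    target.commit()
    commits += 1

loaded = target.execute("SELECT id, country_name FROM tgt ORDER BY id").fetchall()
```

Larger fetch and commit sizes generally mean fewer calls and transactions at the cost of more memory per batch; the right values depend on the database and the row width.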
Using advanced tuning options
If your jobs have CPU-intensive and memory-intensive operations, you can use the following advanced
tuning features to improve performance:
- Parallel processes: Individual work flows and data flows can execute in parallel if you do not connect them in the Designer workspace.
- Parallel threads: The software supports partitioned source tables, partitioned target tables, and degree of parallelism. These options allow you to control the number of instances of a source, target, and transform that can run in parallel within a data flow. Each instance runs as a separate thread and can run on a separate CPU.
- Server groups and distribution levels: You can group Job Servers on different computers into a logical component called a server group. A server group automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the computer with the lightest load at runtime. This functionality also provides a hot backup method: if one Job Server in a server group is down, another Job Server in the group processes the job. You can also distribute the execution of data flows or sub data flows within a batch job across multiple Job Servers in a server group to better balance resource-intensive operations.
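The ideas of degree of parallelism and lightest-load distribution can be sketched as follows. This is only an analogy in Python, not how the software schedules work: DEGREE_OF_PARALLELISM, the partition and transform helpers, and the job_server_load figures are all hypothetical. Python threads also do not execute CPU-bound code on several CPUs at once, so the sketch shows the structure (partition, run instances concurrently, merge) rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

DEGREE_OF_PARALLELISM = 3  # hypothetical: number of transform instances

rows = list(range(1, 13))  # stand-in for rows read from a source table


def partition(data, parts):
    """Split data round-robin into `parts` partitions."""
    return [data[i::parts] for i in range(parts)]


def transform(chunk):
    """Stand-in for one transform instance: square each value."""
    return [value * value for value in chunk]


# Each partition is handled by its own transform instance (thread).
partitions = partition(rows, DEGREE_OF_PARALLELISM)
with ThreadPoolExecutor(max_workers=DEGREE_OF_PARALLELISM) as pool:
    results = list(pool.map(transform, partitions))

# Merge the per-partition outputs back into one result set.
merged = sorted(value for chunk in results for value in chunk)

# Server group analogy: send the job to the member with the lightest load.
# The load figures below are invented for illustration.
job_server_load = {"js_a": 0.72, "js_b": 0.31, "js_c": 0.55}
chosen = min(job_server_load, key=job_server_load.get)
```

Here the twelve rows are split into three partitions of four rows each, and the job would be routed to js_b, the entry with the lowest load value.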