Informatica Pushdown Optimization
Pushdown Optimization
Informatica PowerCenter Pushdown Optimization Datasheet (IN06_0412_06675)
What is Pushdown Optimization?
Pushdown optimization is a way of load-balancing among servers in order to achieve optimal performance. Veteran ETL developers often come across issues when they need to determine the appropriate place to perform ETL logic. Suppose an ETL logic needs to filter out data based on some condition. One can either do it in database by using WHERE condition in the SQL query or inside Informatica by using Informatica Filter transformation.
Sometimes, we can even "push" some transformation logic to the target database instead of doing it in the source side (especially in the case of EL-T rather than ETL). Such optimization is crucial for overall ETL performance.
How Does It Work?
One can push transformation logic to the source or target database using pushdown optimization. The Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the source or the target database which executes the SQL queries to process the transformations. The amount of transformation logic one can push to the database depends on the database, transformation logic, and mapping and session configuration. The Integration Service analyzes the transformation logic it can push to the database and executes the SQL statement generated against the source or target tables, and it processes any transformation logic that it cannot push to the database.
Using Pushdown Optimization
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic that the Integration Service can push to the source or target database. You can also use the Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Suppose a mapping contains a Filter transformation that filters out all employees except those with a DEPTNO greater than 40. The Integration Service can push the transformation logic to the database. It generates the following SQL statement to process the transformation logic:
INSERT INTO EMP_TGT(EMPNO, ENAME, SAL, COMM, DEPTNO) SELECT EMP_SRC.EMPNO, EMP_SRC.ENAME, EMP_SRC.SAL, EMP_SRC.COMM, EMP_SRC.DEPTNO FROM EMP_SRC WHERE (EMP_SRC.DEPTNO >40)The Integration Service generates an INSERT SELECT statement and it filters the data using a WHERE clause. The Integration Service does not extract data from the database at this time.
Configurations
Pushdown optimization can be configured to run in three ways.
Source-side Pushdown Optimization
The Integration Service pushes as much transformation logic as possible to the source database. The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the source database and executes the corresponding SELECT
statement.
Target-side Pushdown Optimization
The Integration Service pushes as much transformation logic as possible to the target database. The Integration Service analyzes the mapping from the target to the source or until it reaches an upstream transformation it cannot push to the target database. It generates an INSERT
, DELETE
, or UPDATE
statement based on the transformation logic for each transformation it can push to the database and executes the DML.
Full Pushdown Optimization
The Integration Service pushes as much transformation logic as possible to both source and target databases. If you configure a session for full pushdown optimization, and the Integration Service cannot push all the transformation logic to the database, it performs source-side or target-side pushdown optimization instead. Also the source and target must be on the same database. The Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it analyzes the target.
When it can push all transformation logic to the database, it generates an INSERT SELECT
statement to run on the database. The statement incorporates transformation logic from all the transformations in the mapping. If the Integration Service can push only part of the transformation logic to the database, it does not fail the session, it pushes as much transformation logic to the source and target database as possible and then processes the remaining transformation logic.
Integration Service
To push transformation logic to a database, the Integration Service might create temporary objects in the database. The Integration Service creates a temporary sequence object in the database to push Sequence Generator transformation logic to the database. The Integration Service creates temporary views in the database while pushing a Source Qualifier transformation or a Lookup transformation with a SQL override to the database, an unconnected relational lookup, filtered lookup.
- To push Sequence Generator transformation logic to a database, we must configure the session for pushdown optimization with Sequence.
- To enable the Integration Service to create the view objects in the database we must configure the session for pushdown optimization with View.
Configuring Parameters
Depending on the database workload, we might want to use source-side, target-side, or full pushdown optimization at different times and for that we can use the $$PushdownConfig
mapping parameter. The settings in the $$PushdownConfig
parameter override the pushdown optimization settings in the session properties. Create $$PushdownConfig
parameter in the Mapping Designer, in session property for Pushdown Optimization attribute select $$PushdownConfig
and define the parameter in the parameter file.
The possible values may be:
- none i.e the integration service itself processes all the transformations
- Source [Seq View]
- Target [Seq View]
- Full [Seq View]
Supported Transformations
Transformation | Full | Source | Target |
---|---|---|---|
Aggregator | Yes | Yes | No |
Expression | Yes1 | Yes1 | Yes1 |
Filter | Yes | Yes | No |
Joiner | Yes | Yes | No |
Lookup | Yes | Yes | Yes2 |
Unconnected Lookup | Yes | Yes | Yes |
Router | Yes | Yes | No |
Sequence Generator | Yes2 | Yes2 | Yes2 |
Sorter | Yes | Yes | No |
Union | Yes | Yes | No |
Update Strategy | Yes | No | No |
1 PowerCenter expressions can be pushed down only if there is an equivalent database function
2 Not all databases are supported, refer to documentation for more details
Using Pushdown Optimization Viewer
Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Select a pushdown option or pushdown group in the Pushdown Optimization Viewer to view the corresponding SQL statement that is generated for the specified selections. When we select a pushdown option or pushdown group, we do not change the pushdown configuration. To change the configuration, we must update the pushdown option in the session properties.
Database Support
The following databases can be configured for pushdown optimization:
- Oracle
- IBM DB2
- Teradata
- Microsoft SQL Server
- Sybase ASE
- Any database that supports ODBC
When native drivers are used, the Integration Service generates SQL statements using native database SQL. When we use ODBC drivers, the Integration Service generates SQL statements using ANSI SQL. The Integration Service can generate more functions when it generates SQL statements using native language instead of ANSI SQL.
Error Handling
When the Integration Service pushes transformation logic to the database, it cannot track errors that occur in the database.
When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.
If we configure a session for full pushdown optimization and the session fails, the Integration Service cannot perform incremental recovery because the database processes the transformations. Instead, the database rolls back the transactions. If the database server fails, it rolls back transactions when it restarts. If the Integration Service fails, the database server rolls back the transaction.
Tips
An Informatica mapping that contains parallel lookups cannot be pushed to the database. Design the mapping to use Joiners instead of lookups.
A mapping containing unconnected lookups might perform slower than the integration service's DTM when utilizing PDO. The reason is the SQL generated by the integration service can be complex and slow as unconnected lookups are converted into outer joins. Compensate by changing the lookups to joiners whenever possible.
Mappings containing an Aggregator downstream from a Sorter transformation can not utilize pushdown. This can be handled by redesigning the mapping to achieve full or source-side PDO, configure the aggregator transformation so that it does use sorted input, and remove the sorter transformation.
Variable ports are not supported by PDO. Replace (NET_AMOUNT = AMOUNT - Fee, Dolloar_amt = NET_AMOUNT * RATE) with (DOLLAR_AMT = (AMOUNT - FEE) * RATE).
Avoid unconnected lookups whenever possible.
Avoid parallel lookups for sessions that would benefit from PDO - create sequential lookups
Eliminate sorted aggregation and pass-trougj ports in aggregators
Eliminate variable ports
Use Full Pushdown Optimization - because of large Teradata data volumes, best performance can be obtained by doing all processing inside the database.
When using Pushdown overrides with view - override should contain tuned Teradata SQL.
Filter data using WHERE
clause before doing outer joins.
Avoid full table scans for large tables.
Use staging processing if necessary.
Validate use of primary and secondary indexes.
See https://marketplace.informatica.com/mpresources/Communiities/IW2012/Docs/bos_82.pdf
If you use the ADD_TO_DATE function in a transformation to change days, hours, minutes, or seconds, the function cannot be pushed to a Teradata database.