Informatica Questions- Aggregator Transformation
Informatica Questions- Aggregator Transformation
Q1.) What is aggregator transformation?
Aggregator transformation performs aggregate calculations like sum, average, count etc. It is an active transformation, changes the number of rows in the pipeline. Unlike expression transformation (performs calculations on a row-by-row basis), an aggregator transformation performs calculations on group of rows.
Q2. What is aggregate cache?
The integration service creates index and data cache in memory to process the aggregator transformation and stores the data group in index cache, row data in data cache. If the integration service requires more space, it stores the overflow values in cache files.
Q3. How can we improve performance of aggregate transformation?
·
Use sorted input: Sort
the data before passing into aggregator. The
integration service uses memory to process the aggregator transformation and it does not use cache memory.
·
Filter the unwanted
data before aggregating.
·
Limit the number of input/output or output ports to reduce the amount of data the aggregator transformation stores in the data
cache.
Q4. What are the different types of aggregate functions?
The different types of aggregate functions are listed below:
·
AVG
·
COUNT
·
FIRST
·
LAST
·
MAX
·
MEDIAN
·
MIN
·
PERCENTILE
·
STDDEV
·
SUM
·
VARIANCE
Q5. Why cannot you use both single level and nested aggregate functions in a single aggregate transformation?
The nested aggregate function returns only one output row, whereas the single level aggregate function returns more than one row. Since the number of rows returned are not same, you cannot use both single level and nested aggregate functions in the same transformation. If you include both the single level and nested functions in the same aggregator, the designer marks the mapping or mapplet as invalid. So, you need to create separate aggregator transformations.
Q6. Up to how many levels, you can nest the aggregate functions?
We can nest up to two levels only.
Example: MAX( SUM( ITEM ) )
Q7. What is incremental aggregation?
The integration service performs aggregate calculations and then stores the data in historical cache. Next time when you run the session, the integration service reads only new data and uses the historical cache to perform new aggregation calculation sacramentaly.
Q8. Why cannot we use sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in historical cache on the server. In this historical cache the data need not be in sorted order. If you give sorted input, the records come as presorted for that particular run but in the historical cache the data may not be in the sorted order. That is why this option is not allowed.
Q9. How the NULL values are handled in Aggregator?
You can configure the integration service to treat null values in aggregator functions as NULL or zero. By default the integration service treats null values as NULL in aggregate functions.
======================================================================
Some Points :
Transformation - Aggregator
Transformation -
Aggregator
Aggregator is an active transformation. Output from the aggregator can be different from input. Designer allows aggregator functions only in this transfromation. Following types of function can be used
Aggregator is an active transformation. Output from the aggregator can be different from input. Designer allows aggregator functions only in this transfromation. Following types of function can be used
- MIN
- MAX
- AVG
- COUNT
- FIRST
- LAST
- MEDIAN
- PERCENTILE
- STDDEV
- SUM
- VARIANCE
Along with these aggregate function, you can
use other row level functions such as IIF, DECODE etc.
In Aggregator transformation, at least one
port has to be selected as group by column. By default, aggregator will return
the last value for a port ( if there are more than one record for group by
column). Aggregator will also sort the data in ASC order on group by port.
NOTE: If primary column of the source is used
in group by port, then aggregator will work as sorter transformation.
Nested Aggregate ports can not be used in
Aggregator. Means, you can not get the count(*) in one port and use this value
in other Aggregator port. This will invalidate the mapping.
Aggregtor has a property "SORTED
INPUT". If you check this property, then aggregator assumes that data is
coming in sorted order ( on group by ports). If not, at run time session will
fail. Sorted Input improves the aggregator performance.
Aggregator Transformation in Informatica
Aggregator
transformation is an active transformation used to perform calculations such as sums, averages, counts on groups of
data. The integration service stores the data group and row data in aggregate
cache. The Aggregator Transformation provides more advantages than the SQL, you
can use conditional clauses to filter rows.
Creating an Aggregator Transformation:
Follow the below steps to create an aggregator transformation
Creating an Aggregator Transformation:
Follow the below steps to create an aggregator transformation
·
Go to the Mapping
Designer, click on transformation in the toolbar -> create.
·
Select the Aggregator
transformation, enter the name and click create. Then click Done. This will
create an aggregator transformation without ports.
·
To create ports, you
can either drag the ports to the aggregator transformation or create in the
ports tab of the aggregator.
Configuring the aggregator transformation:
You can configure the following components in aggregator transformation
·
Aggregate Cache: The
integration service stores the group values in the index cache and row data in
the data cache.
·
Aggregate Expression:
You can enter expressions in the output port or variable port.
·
Group by Port: This
tells the integration service how to create groups. You can configure input, input/output or variable ports
for the group.
·
Sorted Input: This
option can be used to improve the session performance. You can use this option
only when the input to the aggregator transformation in sorted on group by
ports.
Properties of Aggregator Transformation:
The below table illustrates the properties of aggregator transformation
Property
|
Description
|
Cache Directory
|
The Integration Service creates
the index and data cache files.
|
Tracing Level
|
Amount of detail displayed in the
session log for this transformation.
|
Sorted Input
|
Indicates input data is already
sorted by groups. Select this option only if the input to the Aggregator
transformation is sorted.
|
Aggregator Data Cache Size
|
Default cache size is 2,000,000
bytes. Data cache stores row data.
|
Aggregator Index Cache Size
|
Default cache size is 1,000,000
bytes. Index cache stores group by ports data
|
Transformation Scope
|
Specifies how the Integration
Service applies the transformation logic to incoming data
|
Group By Ports:
The integration service performs aggregate calculations and produces one row for each group. If you do not specify any group by ports, the integration service returns one row for all input rows. By default, the integration service returns the last row received for each group along with the result of aggregation. By using the FIRST function, you can specify the integration service to return the first row of the group.
Aggregate Expressions:
You can create the aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. You can use the following aggregate functions in the Aggregator transformation,
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
Examples: SUM(sales), AVG(salary)
Nested Aggregate Functions:
You can nest one aggregate function within another aggregate function. You can either use single-level aggregate functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an aggregator transformation contains both single-level and nested aggregate functions. If you want to create both single-level and nested aggregate functions, create separate aggregate transformations.
Examples: MAX(SUM(sales))
Conditional clauses:
You can reduce the number of rows processed in the aggregation by specifying a conditional clause.
Example: SUM(salary, slaray>1000)
This will include only the salaries which are greater than 1000 in the SUM calculation.
Non Conditional clauses:
You can also use non-aggregate functions in aggregator transformation.
Example: IIF( SUM(sales) <20000, SUM(sales),0)
Note: By default, the Integration Service treats null values as NULL in aggregatefunctions. You can change this by configuring the integration service.
Incremental Aggregation:
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.
Sorted Input:
You can improve the performance of aggregator transformation by specifying the sorted input. The Integration Service assumes all the data is sorted by group and it performs aggregate calculations as it reads rows for a group. If you specify the sorted input option without actually sorting the data, then integration service fails the session.