Labels

Apache Hadoop (3) ASP.NET (2) AWS S3 (2) Batch Script (3) BigQuery (21) BlobStorage (1) C# (3) Cloudera (1) Command (2) Data Model (3) Data Science (1) Django (1) Docker (1) ETL (7) Google Cloud (5) GPG (2) Hadoop (2) Hive (3) Luigi (1) MDX (21) Mongo (3) MYSQL (3) Pandas (1) Pentaho Data Integration (5) PentahoAdmin (13) Polybase (1) Postgres (1) PPS 2007 (2) Python (13) R Program (1) Redshift (3) SQL 2016 (2) SQL Error Fix (18) SQL Performance (1) SQL2012 (7) SQOOP (1) SSAS (20) SSH (1) SSIS (42) SSRS (17) T-SQL (75) Talend (3) Vagrant (1) Virtual Machine (2) WinSCP (1)

Monday, November 14, 2011

Use of Database lookup in Pentaho Kettle

Database Lookup is used to map the columns of source and destination and get the surrogate key of lookup tables.

For example,

  1. My source table (dbo.Employee) contains employee details including DepartmentID
  2. My destination table (DimEmployee) including a column DepartmentKey.
  3. DepartmentKey is the SurrogateKey in the table dbo.DimDepartment with the following columns: (DepartmentKey, DepartmentID, DepartmentName,...).
  4. Now if you want to load the department key into Dimemployee table, we need to map the DepartmentID of the source with DepartmentID of the DimDepartment table (lookup table) to get the DepartmentKey as output.

This can be achieved in Pentaho, create a transformation as shown below:




















3 comments:

  1. Hi Mahadevan,
    In case there is large amount of data available.. Database lookup step performance will be degrade so in this case what will be the solution..??

    ReplyDelete
  2. Hi Mahadevan,

    As far as i have seen, the lookup only occurs for AND clause in joins like for example:
    select * from employee left join department on e.empid = d.empid AND e.depid = d.depid.

    But how can we implement using OR in a single lookup step?

    Please help me with this.

    Regards,
    Nitish

    ReplyDelete