Have you ever tried to understand the description of the lookup_seq function in the DS Reference Guide? And if you managed to do so, are you still wondering what it can be used for? Well, here’s a nice example.
Case: process a flat file with 2 row types and link master and details together
This file contains travel data of the employees of ACME Inc. For every employee there is
- one row with name and ID
- a number of rows with travel details, one for each trip
The requested output is two tables in a database linked by a master-detail relationship:
DS is not able to directly deal with two different row formats within a single file, but you bet there is a workaround. Just start by defining a generic fixed-width Flat File Format:
And use that file as a source in a data flow:
In the Query transform, exclude the row header lines (starting with ‘Trip,’) from the input file and store all remaining (data) rows in a staging table with 2 extra control columns:
HDRFLAG will contain 1 for any row with employee information, 0 for every trip detail line.
ROWNUM is just a sequence number, from 1 till the number of rows in the file.
The staging table will then serve as a source for generating the master data:
Note the usage of the word_ext built-in function to derive the value of the individual attributes:
The output will look like this:
Then, the details are also derived from the same source in staging:
With the Query transform:
The foreign key column is calculated with another built-in function:
lookup_seq(HANA_DS_STAGE.DS_STAGE.SCN_TRIPINFO, ROWNUM, 0, ROWNUM, SCN_TRIPINFO.ROWNUM , HDRFLAG, 1)
Don’t worry too much about the syntax of this function call and the meaning of its parameters as it can easily generated with the Function Wizard:
And here are the results:
Every row with trip details is magically linked to the employee who made the trip.
Note: The function wizard allows to specify one Compare column / Expression pair only. But in fact lookup_seq can accept more (just like the normal lookup can do, as well). In case your comparison is based on multiple columns, generate the call on one comparison with the wizard first and then manually add more.