
BODS Custom Scheduler - Event Based Sequential Scheduler Technique


Scheduling jobs according to your requirements is an important and often difficult task in BODS.

 

However, you can now handle this task the way you prefer.

You can create a schedule that executes jobs one after another, i.e. an event-based sequential scheduler in BODS.

 

 

Step 1 –

          You need an 'Execution command file' for every job you want to schedule. If you already know how to create one, you can go to Step 2 directly.

 

a)              Otherwise, follow the steps in the document below to get a batch file for each BODS job:

     http://scn.sap.com/community/data-services/blog/2012/08/22/sap-bods--running-scheduling-bods-jobs-from-linux-command-line-using-third-party-scheduler


Step 2 –

    1. You should get one batch file (for Windows), e.g. job_name.bat.

This file is created in the SAP BODS installation folder, but if you specify your own path it will be created there instead.

You might not find this file at first, as the folder is sometimes hidden. Make all folders visible and search for the file.


   2. Try to execute this file directly from the command prompt.

   3. Make sure the batch file runs as expected in the command prompt.

   4. If not, move the batch file to another location and try running it again.

                   E.g. type E:\SAP\JOB_NAME.BAT and press Enter.

 

 

Step 3 –

 

  Once the file executes successfully from the command prompt, we can create our own scheduler in the BODS Designer.

1.      Create a new job and a new script.

 

2.      Type in the script –

exec('cmd.exe', 'E:\\SAP\\JOB_1.BAT', 8);                     # Standard command
print('Job_1 Completed....');                                 # Print the status of the command to the log file

exec('Batch_file_Path\\JOB_2.BAT', '', 8);                    # cmd.exe is optional, as batch files are executables
print('Job_2 Completed....');

print( exec('cmd.exe', 'Batch_file_Path\\JOB_3.BAT', 8) );    # You can print the status of the job execution directly.

 

And so on.

(Just take care of escape characters while giving the file path, as per your operating system.)

 

3.       Save script/job.

 

Your custom scheduler for many jobs is ready.

 

This will execute your jobs in the sequence you decide.

You can extend this technique to execute many jobs in parallel by adding more script/workflow combinations.

If you know Python, MS-DOS batch commands, or shell scripts, you can perform multiple tasks with this scheduling technique.
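
A minimal sketch of extending the script with basic status checking, assuming the batch files from Step 1 exist; exec() with mode 8 waits for the command to finish and returns its return code and output as a string (verify the exact return format in the Data Services Reference Guide for your version):

$G_Status = exec('cmd.exe', 'E:\\SAP\\JOB_1.BAT', 8);          # run the first job and capture its status
print('Job_1 returned: ' || $G_Status);

if (substr($G_Status, 1, 1) = '0')                             # assumption: a leading 0 indicates success
begin
    exec('cmd.exe', 'E:\\SAP\\JOB_2.BAT', 8);                  # start the next job only if the first one succeeded
    print('Job_2 Completed....');
end
else
begin
    raise_exception('Job_1 failed - stopping the custom scheduler: ' || $G_Status);
end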


Data loading from SAP ECC Extractor to SAP HANA using BODS


The purpose of this document is to showcase the functionality of SAP Business Objects Data Services for extracting data from predefined SAP ECC datasources (extractors) and loading it into SAP HANA.

 

What are Extractors(Datasources) -

  • Extractor is a new object type available in SAP Application Datastores starting with Data Services 4.0
  • This object type represents SAP Business Suite Content.
  • There are two ways to consume the extractors: with the Operational Data Provider (ODP) interface or without.

 

Types of Extractors -


1. Application Specific -
These are SAP-predefined extractors, i.e. predefined combinations of different logical fields.
E.g. LO, FI etc.

2. Cross Application -
These are created when none of the SAP-predefined extractors meets the business requirement.
Generic datasources can be created from:
  • Transparent tables
  • Database views or SAP queries
  • Functional areas or via a function module

 

Steps to release the datasource(Extractor) from SAP ECC system :

 

Login to SAP ECC systems.

 

login.png

 

Use the TCODEs below to release the datasources (extractors) from ECC:

 

  • RSA5 – To view all supported datasources.
  • RSA6 – To view all activated datasources.

 

RSA56.jpg

 

LBWE – Activate/Maintain datasource

 

LBWE.jpg

 

SBIW – Delete/fill setup table

   

SBIW.jpg

 

RSA3 – To view the data of a specific extractor (provide the extractor name in the Datasource field)

 

RSA3.jpg

Implementation :

 

Create Datastores in SAP BODS :

 

Create two separate datastore connections, one for SAP ECC and one for SAP HANA:

 

 

[Screenshots: DS1.jpg, DS2.jpg]

ETL steps to load Extractor data in SAP HANA :

 

Import the source extractor in SAP Application Datastore

Below is the structure for source columns :

  STRUCTURE.jpg

 

Create/import metadata in HANA.

Import the target table (below is the target table structure) :

 

TGT_Structure.jpg

 

Create a Batch Job.

 

JOB_DESIGN.jpg

 

Create a Dataflow and add it to the Batch Job.

Add the extractor to the Dataflow.

Open the extractor.

Make sure the Initial load drop-down list box has the value Yes (as we are doing a full refresh here):

 

Extractor_option.jpg

Add a Query transform to the Dataflow and connect it to the extractor.

Add the target table to the Dataflow and connect it to the Query transform.

Open the Query transform and map the fields you want to retrieve.

Open the target table and check the "Delete data from table before loading" checkbox for initial load.

The resulting Dataflow should look something like this…

DF_design.jpg

 

Delta load :

 

  • Extract only the changed rows from the source.
  • It is also called incremental extraction.
  • This method is preferred because it improves performance by extracting the least number of rows.
  • When using Business Content Extractors, you have to check to see if the extractor you are using has delta recognition capabilities.
  • An extractor with delta recognition capabilities adds two columns to its output, DI_SEQUENCE_NUMBER and DI_OPERATION_TYPE. DI_OPERATION_TYPE can take the following values:
    • I for INSERT
    • B for before-image of an UPDATE
    • U for after-image of an UPDATE
    • D for DELETE

 

These two columns are used to process delta load :

 

DI_SEQUENCE_NUMBER – is used to determine the order in which the rows are processed. This is very important since there can be many updates to a single row and the updates on the target need to be applied in the same order as they were applied in the source.

 

DI_OPERATION_TYPE – is used to determine how to process the rows, as described above.
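
A common downstream pattern (not shown in the screenshots of this document) is to feed these two columns into a Map_CDC_Operation transform, which by default uses DI_SEQUENCE_NUMBER as its sequencing column and DI_OPERATION_TYPE as its row operation column, so that the incoming rows are converted into the corresponding insert, update, and delete operations on the target.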

 

Execute TCODE VF02 in ECC (to create a delta record for a billing condition, as we are doing this job for the Billing Condition datasource 2LIS_13_VDKON).

 

createdelta.jpg

Select the ITEM, go to the Conditions tab and make a change to any of the attributes.

Execute TCODE RSA7 in ECC (to view the delta records).

In BODS, make sure the Initial load drop-down list box in the extractor source has the value No (as we are processing a delta load here):

 

delta_option_EXTRACTOR.jpg

Execute the batch job in BODS.

Provide the extractor name and verify the data and count in ECC using TCODE : RSA3

verify_ecc.jpg

Write a query in SAP HANA to verify the target table data and count:

[Screenshot: HANA_verify.jpg]
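
As a quick alternative to running the query manually, a minimal validation sketch in Data Services script (the datastore name HANA_DS and the target table name are assumptions) that could be placed in a script object at the end of the job:

$G_Target_Count = sql('HANA_DS', 'select count(*) from TGT_BILLING_CONDITION');   # assumed HANA datastore and target table
print('Rows loaded into the HANA target: ' || $G_Target_Count);

The count can then be compared with the figure shown by RSA3 in ECC.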



Conclusion

 

  • With this job, we can conclude that SAP Business Objects Data Services can be used to extract data from ECC datasources (extractors) and that the data can be easily loaded into HANA.
  • This can be used effectively in scenarios where predefined datasources need to be used to load standard extractor data into different targets as per the business requirement.

Q&A Session for ASUG Webcast: What’s New? Data Services and Information Steward SP4


Q&A Session for ASUG Webcast: What’s New? Data Services and Information Steward SP4

 

 

Date:  Wednesday, February 25, 2015

Q:  For HANA SP08 & SP09 what are the Data Services SP/Patch we need to be on?

A:  http://service.sap.com/sap/support/notes/1600140 

A:  The SAP note has all details on HANA patch/SP releases and the dependency with DS/IS releases. 

________________________________________________________________

Q:  When can we expect to see SQL 2014 support? 2015?  What Support pack?

A:  SQL Server 2014 support is planned for DS 4.2 SP5 (the next SP, planned to be available in early May). This would cover source, target and repository support.

________________________________________________________________

Q:  DS Designer is very "chatty" with the repository DB. It is also a PC memory hog; it seems to download the entire repository into PC memory. Are there any plans to change/fix this?

A:  Do you still see this in the latest releases (4.2)? I'm aware of slow response times with "remote" repositories; for that we recommend Citrix-like solutions so that you can keep the Designer close to the database.

________________________________________________________________

 

 

Q:  Can Data Services installed on Windows connect to Hadoop? If not, is this something that is on the DS roadmap?

A:  Most Hadoop clusters are running on Linux, and we also need DS on Linux in order to connect to the HDFS files. Having DS on Windows for Hadoop is not planned yet. It is always good to add/vote on Idea Place if this is important for you.

A:  https://ideas.sap.com/SAPBusinessObjectsDataServices  

________________________________________________________________

Q:  Thanks for the note. It says HANA SP9 P0 is supported by DS 4.2 SP4. Does that mean you need to upgrade your DS to that level, or can a lower DS version still be compatible with HDB SP9, just without the new features?

A:  The combinations in the note are what we have tested and what is fully supported. Other combinations might work, but are not guaranteed...

________________________________________________________________

Q:  Are there architecture differences between DS 4.2 SP1 and the current version (4.2 SP4)?

A:  No, we don't make architecture changes in an SP. For an architecture change we would wait for a new release like 4.3 or 5.0 (not planned).

________________________________________________________________

Q:  Is there a detailed roadmap doc? or opportunity to connect on feature requests (IS particularly)

A:  Data Services Roadmap (SMP logon) https://websmp208.sap-ag.de/~sapidb/011000358700001160072012E.pdf 

A:  Information Steward roadmap https://websmp208.sap-ag.de/~sapidb/011000358700001160082012E.pdf  

________________________________________________________________

Q:  Does DS 4.2 SP4 support FTP outside of the firewall?

A:  Available in SP5. 

________________________________________________________________

Q:  Are the planned DS-certified SPs for HANA on the roadmap?

A:  Every new HANA release is tested with Data Services 

________________________________________________________________

Q:  Yes, it has new ones; we want to plan ahead. If we plan to upgrade HANA, what should we plan for DS?

A:  You know the DS releases - SP5 in May, SP6 in November - the next HANA version is SP10 in June - the upcoming SP5 will support the next HANA version.

 

You can register for more upcoming ASUG webcasts here

The new Bypass feature explained


One of the enhancements introduced in SAP Data Services 4.2 SP3 is the possibility to bypass the execution of specific work flows or data flows in a job. I am using it a lot; it saves me the hassle of having to define conditionals while developing my jobs.


The feature is fully documented in chapter 17.7 Bypassing specific work flows and data flows of the SAP Data Services Designer Guide. Unfortunately, the guide lacks a simple example with screenshots. Here it is.


The Bypass functionality is defined as a property of data flows and work flows. It is set using a substitution parameter. In DS Designer, right-click on a data flow or a work flow icon and select the Bypass… option in the pop-up menu. You basically have two options now:


1/. No Bypass, the default:


 

2/. The value of a substitution parameter, select it from the dropdown list:

 

 

Make sure it is set to YES and your flow won't be executed the next time you run your job:


 

Any other value for the parameter will revert to the default behaviour, no bypass.
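
For example (hypothetical parameter name): define a substitution parameter such as $$BYPASS_STAGING in the Designer's substitution parameter configurations, set it to YES in the configuration where the staging flows should be skipped and to any other value elsewhere, and select it in the Bypass… dialog of those flows. Switching the configuration then switches the bypass behaviour without changing the job design.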

How to get latest file using wait_for_file() function


Hi All,

 

I am facing an issue using the wait_for_file function. My requirement: in my source file path C:\FTP there are two files with the same name (PROD_20150314.csv) but different timestamps. When I start the job, I want it to pick up only one file, the latest one, for processing. How do I set this option in the wait_for_file function? Currently I am using this pattern.

 

Substitution Param:

==================

$$Filepath = 'C:\FTP'

$$Filename = 'PRODUCT_*.csv'  (i.e. PRODUCT_20150314.csv)

 

 

Script:

=======

$PATH  ='[$$Filepath]';

$Filename = '[$$Filename]';

wait_for_file('[$PATH]\\[$Filename]', 0, 0, 1, $filestat);

$filestat = replace_substr($filestat,'/','\\');

 

 

What changes are required to get the latest file from the path?

Thanks in Advance...!!!!

 

 

Regards,

Srini

Transferring a file from SAP BODS using FTP.


Create a BODS job with the following script in it.

 

exec('E:\FTP.bat','E:\ABC.txt',2);

 

PS: For the 2nd argument you can dynamically specify the file name using variables, or you can hard-code it here and pass it as an argument to the FTP script.
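
A minimal sketch of the variable-based variant, assuming a global variable $G_FileName is declared in the job (doubled backslashes escape the literal backslashes in Data Services script strings):

$G_FileName = 'E:\\ABC.txt';                  # hypothetical file; could also be built dynamically, e.g. from sysdate()
exec('E:\\FTP.bat', $G_FileName, 2);          # the batch file receives the file name as %1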

 

The contents of the batch file 'FTP.bat' are given below.

 

@ECHO OFF
:: Create the temporary script file
> script.ftp ECHO open XXX.com
>>script.ftp ECHO USER username
>>script.ftp ECHO password
>>script.ftp ECHO cd /test
>>script.ftp ECHO put %1
>>script.ftp ECHO bye

 

FTP -v -n -s:script.ftp
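
As a usage note (an addition not shown in the original script): the temporary script.ftp file contains the password in clear text, so it is worth deleting it once the transfer has finished, e.g. by adding DEL script.ftp as the last line of FTP.bat.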

 

Further reading on preparing the script: http://www.robvanderwoude.com/ftp.php

SAP BODS advanced Optimization & error handling techniques.


I have used the following optimization techniques while working on projects.

 

I am sharing the problems and their solutions, which will help you understand the issues and the optimization techniques used to resolve them.

 

1) Table Comparison options:

 

The performance of a ‘Table_Comparison’ transform can be improved by caching the comparison table. There are three modes of comparisons:

 

  • Row-by-row select
  • Cached comparison table
  • Sorted input

 


Basically, we use table comparison for target-based change data capture (CDC).

It is used to capture data that is present in the source but not yet in the target, and/or rows that have changed in the source compared to the target.

There are many methods available for source-based and target-based CDC.

Follow the link below (for Oracle DB) for more on the same: https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm

tb com.jpg

1.1 Row-by-row Select

 

We can choose this option in the following cases.

1.1.1 Option 1 - Normal operations

 

Select this option to have the transform look up the target table using SQL every time it receives an input row. This option is best if the target table is large compared to the number of rows the transform will receive as input. Make sure the appropriate indexes exist on the lookup columns in the target table for optimal performance.

 

1.1.2 Option 2 - When trailing blanks must be considered during comparison

 

During comparison, if a source field value has trailing blanks, BODS treats it as a different value, i.e. BODS does not apply any trim function to that particular field.

Example:

Consider an account number '1234 ' (i.e. with a trailing space) in the source and '1234' in the target. In this case, BODS considers the two account numbers as different in row-by-row mode, but in cached comparison mode it considers both account numbers to be the same. The latter mode is explained below.
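
A minimal workaround sketch for the row-by-row case, if trailing blanks should not count as differences: trim them explicitly in the Query transform that feeds the Table_Comparison, e.g. by mapping the column (a hypothetical ACCOUNT_NO here) as

rtrim_blanks(SOURCE_TABLE.ACCOUNT_NO)

so that both comparison modes see the same value.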

 

1.2 Cached Comparison:

 

Select this option to load the comparison table into memory. In this case, queries to the comparison table access memory rather than the actual table. However, the table must fit in the available memory. This option is best when the table fits into memory and you are comparing the entire target table.

With this option, BODS captures the target data into an internal cache, which is much faster to access.

1.3 Sorted input:

 

Often the most efficient solution when dealing with large data sources, because DS reads the comparison table only once. This option can only be selected when it is guaranteed that the incoming data are sorted in exactly the same order as the primary key in the comparison table. In most cases incoming data must be pre-sorted, e.g. using a Query transform with an Order-by (that may be pushed down to the underlying database), to take advantage of this functionality.

 

2) Auto Correct Load option :

Auto-correct load is used to avoid loading duplicate data into the target table using SAP BODS.

Basically, auto-correct load is used to implement SCD Type 1 when we are not using the table comparison feature of SAP BODS. There are several options available with auto-correct load which can help to optimize performance.

 

If you do not choose any auto-correct load option, BODS will generate a simple insert query.

 

The snapshot and the query generated by BODS are shown below:

[Screenshot: AC 1.jpg]

Query (BODS generates an insert query, so it may insert duplicate data):

INSERT /*+ APPEND */ INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" ( "ACCOUNT_NO" , "STAT" )

SELECT "TABLE1"."ACCOUNT_NO" ACCOUNT_NO , "TABLE1"."STAT" STAT

FROM "DS_2"."TABLE1" "TABLE1";

 

Several options are available with BODS auto-correct load. Some of them are explained in the sections below:

2.1 Allow merge set to Yes & Ignore columns with null set to No


When going for auto-correct load, if you set 'Allow merge' to Yes and 'Ignore columns with null' to No, BODS generates a MERGE query to maintain SCD Type 1, ignoring null values if any.

So by setting 'Allow merge' to Yes, BODS will insert the new rows coming from the source into the target and update the existing rows in the target from the source.


 

Snapshot for same is as below:

AC 2.jpg

 

Query generated by BODS job is as follows:

 

MERGE INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" s

USING

(SELECT  "TABLE_1"."ACCOUNT_NO"  ACCOUNT_NO ,  "TABLE_1"."STAT"  STAT

FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"

) n

ON ((s.ACCOUNT_NO = n.ACCOUNT_NO))

WHEN MATCHED THEN

UPDATE SET s."STAT" = n.STAT

WHEN NOT MATCHED THEN

INSERT  /*+ APPEND */ (s."ACCOUNT_NO", s."STAT" )

VALUES (n.ACCOUNT_NO , n.STAT)

 

Here, the query generated by BODS is a MERGE query, which inserts the new rows coming from the source and updates the existing rows present in the target.

This query is pushed down to the database, hence it is optimized.

 

2.2 Allow merge set to Yes & Ignore columns with null set to Yes

Query generated by BODS job is as follows:

 

MERGE INTO "DS_1"."TEST_AUTO_CORRECT_LOAD" s

USING

(SELECT  "TABLE_1"."ACCOUNT_NO"  ACCOUNT_NO ,  "TABLE_1"."STAT"  STAT

FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"

) n

ON ((s.ACCOUNT_NO = n.ACCOUNT_NO))

WHEN MATCHED THEN

UPDATE SET s."STAT" = NVL(n.STAT,S."STAT")

WHEN NOT MATCHED THEN

INSERT  /*+ APPEND */ (s."ACCOUNT_NO", s."STAT" )

VALUES (n.ACCOUNT_NO , n.STAT)

 

Snap Shot for same as follows:

ac 3.jpg

As seen in the snapshot above, BODS adds NVL function to consider null values if any.


2.3 Allow merge set to No while Auto correct load is set to Yes


If you set 'Allow merge' to No and 'Auto correct load' to Yes, BODS generates PL/SQL code, which can again be helpful from a performance point of view.

Below is the code generated by BODS:


BEGIN

DECLARE

CURSOR s_cursor IS

SELECT  "TABLE_1"."ACCOUNT_NO"  ACCOUNT_NO ,  "TABLE_1"."STAT"  STAT

FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"

;

s_row   s_cursor%ROWTYPE;

CURSOR t_cursor(p_ACCOUNT_NO s_row.ACCOUNT_NO%TYPE) IS

SELECT "ACCOUNT_NO" ACCOUNT_NO, "STAT" STAT, rowid

FROM "DS_1"."TEST_AUTO_CORRECT_LOAD"

WHERE (p_ACCOUNT_NO = "ACCOUNT_NO");

t_row   t_cursor%ROWTYPE;

commit_count NUMBER;

BEGIN

commit_count := 0;

:processed_row_count := 0;

FOR r_reader IN

(SELECT  "TABLE_1"."ACCOUNT_NO"  ACCOUNT_NO ,  "TABLE_1"."STAT"  STAT

FROM "PSEUDO_PROD"."TABLE_1" "TABLE_1"

) LOOP

OPEN t_cursor(r_reader.ACCOUNT_NO);

FETCH t_cursor INTO t_row;

IF t_cursor%NOTFOUND THEN

INSERT INTO "DS_1"."TEST_AUTO_CORRECT_LOAD"("ACCOUNT_NO", "STAT" )

VALUES (r_reader.ACCOUNT_NO , r_reader.STAT);

commit_count := commit_count + 1;

:processed_row_count := :processed_row_count + 1;

ELSE

LOOP

UPDATE "DS_1"."TEST_AUTO_CORRECT_LOAD" SET

"STAT" = r_reader.STAT

WHERE rowid = t_row.rowid;

commit_count := commit_count + 1;

:processed_row_count := :processed_row_count + SQL%ROWCOUNT;

IF (commit_count = 1000) THEN

COMMIT; commit_count := 0;

END IF;

FETCH t_cursor INTO t_row;

EXIT WHEN t_cursor%NOTFOUND;

END LOOP;

END IF;

CLOSE t_cursor;

IF (commit_count = 0) THEN

COMMIT; commit_count := 0;

END IF;

END LOOP;

COMMIT;

END;

END;

 

 

Snapshot for same is as below.

 

AC 4.jpg

 

3) Array fetch & Rows per commit options

 

3.1 Array fetch size:

 

The array fetch feature lowers the number of database requests by "fetching" multiple rows (an array) of data with each request. The number of rows to be fetched per request is entered in the Array fetch size option on any source table editor or SQL transform editor. The default setting is 1000, which means that with each database request, the software will fetch 1000 rows of data from the source database. The maximum array fetch size that can be specified is 5000.

 

 

 

Suggestion while using array fetch size option:

 

The optimal number for Array fetch size depends on the size of your table rows (the number and type of columns involved) as well as the network round-trip time involved in the database requests and responses. If your computing environment is very powerful (which means that the computers running the Job Server, related databases, and connections are extremely fast), then try higher values for Array fetch size and test the performance of your jobs to find the best setting.

3.2 Rows per commit:

 

'Rows per commit' specifies the transaction size in number of rows. If set to 1000, Data Integrator sends a commit to the underlying database for every 1000 rows.

 

'Rows per commit' for regular loading defaults to 1000 rows. Setting the Rows per commit value significantly affects job performance. Adjust the rows per commit value in the target table editor's Options tab, noting the following rules:

  • Do not use negative number signs and other non-numeric characters.
  • If you enter nothing or 0, the text box will automatically display 1000.
  • If you enter a number larger than 5000, the text box automatically displays 5000.

 

It is recommended that you set rows per commit between 500 and 2000 for best performance. You might also want to calculate a value. To do this, use the following formula:

max_IO_size/row size (in bytes)

For most platforms, max_IO_size is 64K. For Solaris, max_IO_size is 1024K.
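
For example, assuming an average row size of 128 bytes (an illustrative figure) and max_IO_size of 64K (65,536 bytes), the formula gives 65,536 / 128 = 512, so a Rows per commit value of around 500 would be a reasonable starting point.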

Note that even with a value greater than one set for Rows per commit, SAP Data Services will submit data one row at a time if the following conditions exist:

You are loading into a database (this scenario does not apply to Oracle databases), and have a column with a LONG datatype attribute.

You are using an overflow file where the transaction failed. However, once all the rows are loaded successfully, the commit size reverts to the number you entered. In this case, depending on how often a load error happens, performance might become worse than setting Rows per commit to 1.

 

 


Let us consider a scenario with particular array fetch size and rows per commit values.

 

Let's say you set the array fetch size to 100, rows per commit to 500, and the total number of rows in the source is 800. Suppose the job terminates after processing 700 rows: only 500 rows will have been committed to the target, because rows per commit is 500. The remaining 200 processed rows are not committed to the target, even though they were already fetched by the job, which leaves the target with incomplete data unless the job is rerun or recovered.

So, setting the array fetch size and rows per commit values correctly is very important while designing a job.


4) Use the database link option in the datastore:

 

When fetching data from different datastores connected to different databases, BODS may not be able to push the query down to the database; instead BODS generates a separate query for each datastore.

To avoid this situation we can define a database link between the databases at the datastore level. After applying the DB link at the datastore level, BODS generates push-down SQL, which is the optimized form.

Refer to the scenario below.

 

 

4.1 Scenario :

 

I have considered the following tables and database for this scenario.

Table Information:

1) ACCOUNT DIMENSION

2) BORG FLAT TABLE

Datastore & Database information:

1) DATASTORE: EDW_REP POINTING TO DATABASE TCLBI.

2) DATASTORE: PSEUDO_PROD POINTING TO DATABASE PSEUDO_PROD.

The account dimension table has been taken from EDW_REP datastore which is pointing to EDW_REP schema of TCLBI database whereas the other table BORG_DTL is from Pseudo_prod database.

 

The job has been designed as shown in the image below:

db 1.jpg

 

 

As both tables are from different databases, BODS generates multiple SQL statements, as shown in the snapshot below.

db 2.jpg


Also, a DB link is present between the two databases (the DB link is edwrep_to_pseudoprod). This link helps us to generate push-down SQL, thereby producing an optimized SQL statement.

In order to achieve this, the following advanced setting needs to be made in the BODS datastore.

Right-click on the DS_EDW_REP datastore -> Edit -> Advanced -> scroll down to Linked Datastores.


Snapshot of same is attached below:

db 3.jpg

 

Click on Linked Datastores and you will get the following window:

db 4.jpg

 

Choose the datastore that needs to be linked with the other datastore; in this case, it is DS_1.

Press OK and provide the proper DB link name from the dropdown.


db 5.jpg

 

Press OK again. After saving the job, check the optimized SQL generated by BODS, which is as follows.


INSERT /*+ APPEND */ INTO "DS_1"."TEST_DB_LINK_BODS" ( "KEY_1" , "APP_PRINCIPLE_AMT" , "LOAN_AVAILED_DATE" )

SELECT  "BORG_DTL"."KEY_1"  KEY_1 ,  "BORG_DTL"."APP_PRINCIPLE_AMT"  APP_PRINCIPLE_AMT ,  "ACCOUNT_DIM"."LOAN_AVAILED_DATE"  LOAN_AVAILED_DATE

FROM "DS_1"."ACCOUNT_DIM" "ACCOUNT_DIM" INNER JOIN "PSEUDO_PROD"."BORG_DTL"@DBLINK "BORG_DTL" ON ( "ACCOUNT_DIM"."ACCOUNT_NO"  =  "BORG_DTL"."KEY_1" )

 

 

It generates an INSERT INTO statement, which is optimized push-down SQL.

 

Similarly, more than one DB link may be added if required.

 

5) Use server group for splitting dataflows among servers

 

In BODS, we can define multiple dataflows to run in parallel in a single workflow. When a job is designed with parallel dataflows, BODS starts all the dataflows at the same time on the particular job server specified while executing the job.

Because of this, a single job server might experience a heavy load. Also, the resources of that particular job server are shared among multiple parallel dataflows, which may degrade the performance of the BODS job.

To avoid this scenario, we can distribute the parallel dataflows across a job server group containing multiple job servers.

 

5.1     What is server group?

 

You can distribute the execution of a job or a part of a job across multiple Job Servers within a Server Group to better balance resource-intensive operations. A server group automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the Job Server with the lightest load at runtime.

5.2     How to create/manage server group

 

Go to the Management Console -> Server Groups -> All Server Groups -> Server Group Configuration -> click on Add.



 

SG1.jpg

Add the job servers into predefined server group and click Apply.


SG2.jpg


5.3     Distribution levels for data flow execution

 

When you execute a job, you can specify the following values on the Distribution level option:

  • Job level - An entire job can execute on an available Job Server.
  • Data flow level - Each data flow within a job can execute on an available Job Server and can take advantage of additional memory (up to two gigabytes) for both in-memory and pageable cache on another computer.
  • Sub data flow level - A resource-intensive operation (such as a sort, table comparison, or table lookup) within a data flow can execute on an available Job Server. Each operation can take advantage of up to two gigabytes additional memory for both in-memory and pageable cache on another computer.

 

SG3.jpg

If a job is run using only a single job server, all the dataflows start on that one job server, which may take more time to execute because a single job server has limited resources.

For such a job we can split the dataflows across multiple job servers with the help of a job server group, as shown below.


[Note : In this example we will consider Data Flow level distribution.]

I have created a sample job to test how the data flow split occurs.

 

SG4.jpg

 

 


In Job Server or Server Group, select the server group.

In Distribution level, select the level (i.e. Job, Data flow or Sub data flow) at which splitting should occur.

After executing the job, the Management Console appears as below.


SG6.jpg



6) Error handling by using BODS internal (Metadata) Table

 

Will update soon...

Data Profiling With SAP Business Objects Data Services


Data Profiling  With  SAP Business Objects Data Services

 

 

Data profiling started off as a technology and methodology for IT use, but it is emerging as an important tool for business users to gain full value from data assets. When given the right tools and practices for data profiling, business users can quickly identify inconsistencies and problems in the data before it is used for reporting and intelligence purposes.

 

Data profiling is an important preliminary step to data modeling. It is also used in data quality improvement programs and master data management initiatives to help "ensure the consistency of key non-transactional reference data" used across the enterprise.

In the long run, data profiling can be used both tactically and strategically. Tactically, it can serve as an integral part of data improvement programs. Strategically, it can help managers determine the appropriateness of different data source systems under consideration for deployment in a particular project.

 

Introduction:


Data profiling is performed through the BODS Designer.

The following actions must be performed before profiling data:

Step1: Create a Profiler Repository using Repository Manager.

Step2: Assign the Profile Repository to a Job Server using Server Manager.

Step3: Configure the Profiler Repository in BODS Designer.

Step4: Configure the Profiler Repository in BODS Management Console.


The profiler repository stores the following information:

 

 

1 - Profiler tasks, which are created when a profiling request is submitted from the Designer; from the Management Console you can monitor the progress and execution of these tasks.

 

2 - Profiling results (data): when you profile columns of tables, the summary and detail information is stored in the profiler repository, along with sample data for each profiling attribute (this is where the profiler results displayed in the Designer come from).

 

There are 2 types of Data Profiling:

1. Normal & Detail Profiling.

2. Relationship Data Profiling.

 

1.  Normal & Detail Profiling using BODS:

 

This lets you understand the unique values present in a column, the min/max values, how many null values/blanks there are and their percentage, the max/min/avg string length, etc.

 

Login to BODS Designer ->

 

1) Select any table or file format from the Object Library for profiling and right-click; you will find the menu shown below.

2) Here we have the option "Submit Column Profile Request"; select it.

 

1.gif

 

3) On selecting the option, the dialog below opens.

4) Here we have all the columns available in the file or table used for profiling.

5) We can select all the columns or only the specific columns on which profiling has to be performed.

6) If we want DETAILED profiling, click the checkbox available for the respective columns.

7) Once done, click the SUBMIT button.

 

2.gif

 

8) On submission, the profiler server starts profiling the data. The status of the profiling of a specific file or table can be viewed from the Management Console, or a status tab appears for each task immediately after the step above.

 


3.gif



We can change the Statistics of Profiling from Management console.


4.gif

9) Once profiling is completed, we can go to the Designer and click View Data on the file or table.

10) Here we find a tab (the Profile tab) which shows the latest profiling information for that table. We can drill down on the parameter values to check them in detail.

11) Similarly, these results can be checked in the profiler repository tables in the database.


5.gif


2. Relationship Profiling using BODS:

This lets you know what percentage / number of records of a table A is present in table B.

Both basic and detailed profiling are available here as well; the latter provides some additional attributes.

  • Relationship profiling shows the percentage of non-matching values in columns of two sources.
  • The sources can be tables, flat files, or a combination of a table and a flat file.
  • Relationship profiling always requires two tables or flat files to perform the profiling in any particular operation.

Login to BODS Designer ->

  1. Select one table or file format from the Object Library for profiling and right-click; you will find the menu shown below.
  2. Here we have the option "Submit Relationship Profile Request with"; select it.

[Screenshot: 6.gif]

3.     Upon selecting the option, you just have to select the other table or file format; a window pops up (shown below) where you can define the profiling requirements.

4.     Here we can define the join on the respective columns from the 1st table to the 2nd table. We can make the join relation manually, or, if the source/target tables have primary or foreign keys, we can just click the button "Propose Relation", which joins the respective columns automatically.

5.     We can save the data of all the columns or only of the specified key columns, which is mainly useful when we check the profiled data.


7.gif


6.     Once this is done, click the "SUBMIT" button.

7.     Below is the window showing the status of profiling, whether it is relationship or column based profiling.


8.gif

8.     Once profiling is completed, we can go to the Designer and click View Data on the file or table.

9.     Here we find a tab (the Relationship Profile tab) which shows the latest profiling information for table 1 against table 2. We can drill down on the parameter values to check them in detail.

10.     Here, 66.67% signifies the records that are not present in table P2 but are available in table P1.

11.     And 80.00% signifies the records that are not present in table P1 but are available in table P2.


9.gif

 

12.     We can even drill down into the non-matched data by clicking on the percentage values.


10.gif


Benefits of Data Profiling:

 

The benefits of data profiling are improved data quality, a shorter implementation cycle for major projects, and a better understanding of the data for its users. Discovering business knowledge embedded in the data itself is one of the significant benefits derived from data profiling. Data profiling is one of the most effective technologies for improving data accuracy in corporate databases. Although data profiling is effective, remember to find a suitable balance and do not slip into "analysis paralysis".



SAP BODS - Custom Function to SPLIT Single Input Field Value to Multiple Fields with Meaningful Field Division


Custom Function - Word Split Mechanism


 

In BODS, we don't have any specific logic or transform to split text into words equally and meaningfully. So, as per the business requirement, we developed a custom function to provide a solution.

 

User requirement: the input field value is to be divided into 3 output fields, each of length varchar(60), using the BODS tool.

 

#################### Declaration ####################

# $Input          - input parameter holding the text to split
# $Text_Output    - assumed to be defined as an output parameter of the custom
#                   function, so the remaining (unsplit) text is passed back to
#                   the caller; a plain "return" can only hand back one value
# $Output1, $SubStr1, $SubStr2, $Incr, $Input_Len, $T_Len - local variables

#$Input = '';

$Input = replace_substr( $Input, '  ', ' *');                 # mark double spaces so they survive the word split
$Input_Len = length($Input);

$Incr = 1;
$Output1 = '';
$SubStr1 = '';
$SubStr2 = '';

######################################################

####################### Logic #########################

while ($Incr < $Input_Len)
begin
    $SubStr2 = word_ext( $Input, $Incr, '\' \'');             # pick up the next word

    if ((length($SubStr1) + length($SubStr2)) >= 60)          # adding the next word would exceed 60 characters
    begin
        $Output1 = $SubStr1;
        $T_Len = length(rtrim_blanks($Output1)) + 1;
        $Text_Output = substr($Input, $T_Len, $Input_Len);    # remainder of the text, handed back via the output parameter
        $Output1 = ltrim_blanks( rtrim_blanks( replace_substr( $Output1, ' *', '  ')));   # restore the double spaces

        return $Output1;                                      # note: the original version had a second, unreachable
                                                              # "return $Text_Output;" here
    end
    else
    begin
        $SubStr1 = rtrim_blanks($SubStr1 || ' ' || $SubStr2); # keep accumulating words
        $Output1 = ltrim_blanks( rtrim_blanks( replace_substr( $SubStr1, ' *', '  ')));
    end

    $Incr = $Incr + 1;
end

return $Output1;                                              # input shorter than 60 characters: everything fits in one field

####################################################
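
A minimal usage sketch with hypothetical names, assuming the custom function is saved as CF_Word_Split with $Input as its input parameter and $Text_Output as its output parameter; each call returns one chunk of up to 60 characters and hands back the remaining text, which is fed into the next call:

$G_Field1 = CF_Word_Split($G_LongText, $G_Rest1);     # first chunk of up to 60 characters
$G_Field2 = CF_Word_Split($G_Rest1, $G_Rest2);        # next chunk, taken from the remainder
$G_Field3 = CF_Word_Split($G_Rest2, $G_Rest3);        # final chunk

Depending on the Data Services version, the same function can also be called from a Query transform (via New Function Call...) to populate the three varchar(60) output fields.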

SAP Business Objects Data Services Workbench 4.1 - Basics


SAP Business Objects Data Services Workbench 4.1


1.1       DS Introduction

 

SAP BusinessObjects Data Services delivers a single enterprise-class solution for data integration, Data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes. It provides one development UI, metadata repository, data connectivity layer, run-time environment, and management console—enabling IT Organizations to lower total cost of ownership and accelerate time to value. With SAP BusinessObjects Data Services, IT organizations can maximize operational efficiency with a single solution to improve Data quality and gain access to heterogeneous sources and applications

 

1.2       Data Services Workbench

 

The Data Services Workbench is an additional application that provides a graphical user interface (GUI) development environment, simplifying the migration of data and database schema information between different databases in a data warehousing environment.

 

 

1.3       Key features of the Workbench

 

  • Browsing table metadata and data.
  • Selecting individual source tables or multiple tables for migration.
  • Specifying the order in which the source tables should be migrated.
  • Adjusting the table schema in detail. For example, adding or removing columns, defining constraints, partitions, indexes, and so on.
  • Specifying filters and simple projection expressions.
  • Specifying source and target table options such as array fetch size and bulk-loading Options.
  • Executing a replication job as an initial load or delta load

 

1.4     Drawbacks in previous DS versions

 

  • In previous versions, migrating data and schema information required creating many dataflows in the Designer, with each dataflow reading from a single source table and writing to a template or permanent target table.
  • In addition, incompatibilities between the source and target database types could require manual schema and data corrections.

 

1.5     Added Advantages in Workbench

 

  • The Data Services Workbench automates the migration process. Instead of creating many dataflows manually, we can now provide connection information for the source and target databases and select the tables that we want to migrate.
  • The Workbench automatically creates Data Services jobs, workflows and datastores and imports them into a Data Services repository.
  • We can execute the jobs and monitor their status from within the Workbench.
  • The Workbench supports advanced options such as bulk loading and delta loading.
  • Jobs created in the Workbench can be scheduled with the Data Services Management Console, and the generated objects can also be used as a starting point for further editing within the DataServices Designer.

       (For example, we might require adding more advanced transformations that are not available directly in the Workbench)

  • Workbench supports migration from Data Services-supported databases and SAP applications to SAP HANA, Sybase IQ, Sybase ASE, Oracle, Microsoft SQL Server, DB2, and Teradata targets.

 

 

1.6 Steps to proceed with Workbench

 

  1. Launch the Workbench.

  (The Workbench can be accessed from the Windows Start Menu: All Programs > SAP BusinessObjects Data Services 4.1 > Data Services Workbench.)

  2. Enter your user credentials for the CMS.
  3. Provide the details:
    • SystemName/ServerName
    • UserName                    -           The user name to use to log into the CMS.
    • Password                     -           The password to use to log into the CMS.
    • Authentication mode      -           The authentication type used by the CMS.
  4. Click Connect.

 

1.gif

 

Workbench can perform 2 types of Migration tasks:

 

  1. Quick Replication Migration
  2. Detail Replication Migration

 

 

1.7       Quick Replication Migration

 

Upon selecting Start a Replication Wizard after login, we can perform the Quick Replication Migration process.

 

2.gif

 

 

Below are the steps to be performed to achieve the results.

 

Provide -          Project Name


3.gif


Provide -         Source database details


4.gif


Selection -       Source database tables

5.gif


Provide -         Target database details


6.gif


Selection -       Execution Properties


7.gif


Final Status & Report


8.gif


1.8       Advantages & Usability of Quick Replication Process:

 

  • Usable when there is a 1:1 source-to-target mapping and no transformations are involved.
  • Any number (N) of tables can be replicated in a single run.
  • Modifications such as filters or conditions on the tables can be implemented at a later point.

 

 

1.9       Detail Replication Migration

 

Here, at the project level, we can change the properties and alter or add logic.

This is similar to the concept of the Query transform in the Designer: it allows us to apply conditions or filters over the data, and most of the functions related to conversions, dates, databases, aggregates, lookups, math and so on can be applied directly from the Workbench.

Apart from this, the Workbench has a very useful feature for creating or deleting indexes on the table directly from the GUI.


9.gif


1.10    Functions Library


10.gif


1.11    Data Analysis Feature in workbench

 

Analysis over source or target table data can be directly performed over workbench.

Features available to perform Charts or tables.


11.gif


1.12    Delta Load Mechanism using the Workbench

 

Delta load jobs move rows that have been added or modified since the last time the job was executed. There are many reasons to implement a delta load job, but the most common is to reduce the time the loading process takes.

Instead of loading millions of rows each time the job is run, we can process only the few that have changed.

Another reason is to maintain historical data: we might need to keep the old data in our data warehouse and add the current-state data, so that we can see the changes in the data over time.

 

1.13    Scenarios over Delta Load types & configuration


Replication Behavior– Schema & data

Table Level ->Delta Load -> Select - No Delta Load

JobServer Properties->Select - Initial Load.

This will create a job consisting of 3 workflows.

  1. Drop the existing schema/table & create the table again.

  2. Load data into the table created above. (Default Loading type = Append.)

  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.

 

Replication Behavior– Schema & data

Table Level ->Delta Load -> Select - No Delta Load

JobServer Properties->Select – Delta  Load.

This requires a Delta Load job already existing in the REPO.

 

Replication Behavior– Schema & data

Table Level ->Delta Load -> Select – Reload the full table

JobServer Properties->Select - Initial Load.

This will create a job consisting of 3 workflows.

  1. Drop the existing schema/table & create the table again.
  2. Load data into the table created above. (Default Loading type = Append.)
  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.


Replication Behavior– Schema & data

Table Level ->Delta Load -> Select – Reload the full table

JobServer Properties->Select - Delta Load.

This will create 2 jobs, each consisting of 3 workflows.

Job 1: Initial Load

  1. Drop the existing schema/table & create the table again.
  2. Load data into the table created above. (Default Loading type = Append.)
  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.

Job 2: Delta Load

  1. Select the LAST_RUN date from table LOAD_STATUS.
  2. Load data into the table created above. (Default Loading type = Truncate.)
  3. Update the old run record with the new END_DATETIME in table LOAD_STATUS.

 

Replication Behavior– Schema & data

Table Level ->Delta Load -> Select – Use Timestamp or Date Column

JobServer Properties->Select - Initial Load.

This will create 2 jobs, each consisting of 3 workflows, whereas it will execute only the Initial Load job.

Job 1: Initial Load

  1. Drop the existing schema/table & create the table again.
  2. Load data into the table created above. (Default Loading type = Append.)
  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.

Job 2: Delta Load

  1. Select the LAST_RUN date from table LOAD_STATUS.
  2. Load data into the table created above (Loading type = Truncate/Append), filtered by
     ([Table Column Date] > $START_DATETIME) AND ([Table Column Date] <= $END_DATETIME)
  3. Update the old run record with the new END_DATETIME in table LOAD_STATUS.

 

Replication Behavior– Schema & data

Table Level ->Delta Load -> Select – Use Timestamp or Date Column without providing END_TIME.

JobServer Properties -> Select - Delta Load, and Loading type = Append

This will create 2 jobs, each consisting of 3 workflows, in the REPO, whereas it will execute only the DELTA job.

Job 1: Initial Load

  1. Drop the existing schema/table & create the table again.
  2. Load data into the table created above. (Default Loading type = Append.)
  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.

Job 2: Delta Load

  1. Select the LAST_RUN date from table LOAD_STATUS.
  2. Load data into the table created above (Loading type = Truncate/Append), filtered by
     ([Table Column Date] > $START_DATETIME) AND ([Table Column Date] <= $END_DATETIME)
  3. Update the old run record with the new END_DATETIME in table LOAD_STATUS.

 

Replication Behavior– Schema & data

Table Level ->Delta Load -> Select – Use Timestamp or Date Column with providing END_TIME.

JobServer Properties -> Select - Delta Load, and Loading type = Truncate

This will create 2 jobs, each consisting of 3 workflows, in the REPO, whereas it will execute only the DELTA job.

Job 1: Initial Load

  1. Drop the existing schema/table & create the table again.
  2. Load data into the table created above. (Default Loading type = Append.)
  3. Create a table Load_Status for storing the job status & insert a new record with the job's Last Run End Date; if the table already exists, delete the old record & insert a new record.

Job 2: Delta Load

  1. Select the LAST_RUN date from table LOAD_STATUS.
  2. Load data into the table created above (Loading type = Truncate/Append), filtered by
     ([Table Column Date] > $START_DATETIME) AND ([Table Column Date] <= $END_DATETIME)
  3. Update the old run record with the new END_DATETIME in table LOAD_STATUS.
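
The same LOAD_STATUS pattern can be reproduced by hand where needed. A minimal sketch in Data Services script, with hypothetical datastore, table and column names, of the timestamp-based delta filter the generated Delta Load job implements:

$START_DATETIME = sql('DS_TARGET', 'select LAST_RUN from LOAD_STATUS');   # last successful run, written by the previous execution
$END_DATETIME = sysdate();                                                # upper bound for this run
# The replication data flow then filters the source with:
#   ([Table Column Date] > $START_DATETIME) AND ([Table Column Date] <= $END_DATETIME)
# and, after a successful load, the status record is updated so the next run starts from here:
sql('DS_TARGET', 'update LOAD_STATUS set LAST_RUN = {$END_DATETIME}');    # hypothetical column name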

 

 

1.14    Monitoring editor

 

The Workbench monitoring editor opens in the workspace when we click the icon in the toolbar or select it from the Tools menu.

 

The monitoring editor consists of several areas:

•           Job selection toolbar:  Displays the Data Services jobs available in the repository to which the Workbench is connected and allows us to execute them.

•           Execution history pane:   Displays the execution history for the selected job. Each line indicates the current status, execution date, and duration of a single job execution instance.

•           Execution history dashboard pane:   Displays a graphical representation of the history for the selected job. We can change the type of chart by choosing from the drop-down box in the upper right corner of the pane.

•           Execution details pane:  Displays the details for the individual job execution instance selected in the execution history pane.

 

We can view the trace, monitor, and error logs for the execution instance, as well as the execution statistics.

12.gif



server.start() rfc_bad_connection when trying to start RFC Instance in BODS Management Console


Symptom: After adding an RFC Server Interface in the BODS Management Console, when we try to start the RFC instance the following error occurs:

 

An internal error occurred server.start() rfc_bad_connection

 

 

Environment: SAP BASIS Version 7.0

SAP BW Version: 7.0

SAP BODS Version: 4.2 SP4

 

Cause: When configuring the connection parameters for the RFC Server Interface in the Management Console, it is commonly recommended to set the parameter SAP Gateway Service Name to SAPGW00, but the SAP Gateway Service Name is not resolved correctly by the RFC Server Interface.

 

Resolution: The SAP Gateway Service was given as SAPGW02. Try putting 3302 as the SAP Gateway Service Name instead. The last two digits are your SAP system number, e.g. if your SAP system number is 02 then the SAP Gateway Service Name would be 3302.

 

SAP BODS - Multiple Files Generation with STATIC predefined Format in Text File.



SAP BODS - Multiple Files Generation with STATIC pre-defined Format in Text File.

 

 

 

Business Requirement:

Generate multiple output text files, one per country, from a single input file.

 

 

Constraints:

  1. Before generating the multiple country-wise files, BODS has to create a FOLDER named using the file-name format at run time.
  2. The output text file has a specified format into which data has to be generated on the basis of country.
  3. The BODS job has to run every 12 minutes and search for the file in the input folder; if it is not found, wait (spool) for a few minutes and then abort the job.
  4. There might be multiple input files, so a BATCH-wise load mechanism is needed.
  5. Already processed files should be ignored and the job terminated.

 

Example: Each output file should start with the HEADER lines below, under which the data has to be loaded with proper line spacing.

 

 

Requirement File Format:- 1

Table:          MAEX

Displayed Fields:   8 of   8 Fixed Columns: 4  List Width 0250

------------------------------------------------------------------------------------------

| |MANDT|MATNR               |ALAND|GEGRU|ALNUM           |EMBGR           |PMAST|SECGK  |

-----------------------------------------------------------------------------------------------------------------------------

 

Requirement File Format:- 2

 

Table:          MARC                                            

Displayed Fields:   4 of   4 Fixed Columns:3  List Width 0500  

---------------------------------------------------------       

| |MANDT|MATNR |WERKS |STAWN              |       

  -------------------------------------------------------------------------       



A READ-ME document was maintained for USERS to follow when placing the INPUT files:

 

  1. Place file using GTS_HTS_ECCN_LOAD_TEMPLATE.xlsx format with a filename that BEGINS with "HTSECCN_RTP_".  Sheet name should be "Sheet1".

   For example:

                                HTSECCN_RTP_2013_03_27.xlsx

                                HTSECCN_RTP_a_few_new_IP_materials.xlsx

 

Note: Don't use SPACES or PERIODS in the input filename.

 

  2. A Data Services job will be scheduled to scan for these files.  Upon finding one, it will process the file and create an output directory with the following format:

                OUT_<Timestamp>_<Filename>

                For Example:

                                OUT_20130327071422_HTSECCN_RTP_2013_03_27

                                OUT_20130327071422_HTSECCN_RTP_a_few_new_IP_materials

 

Note: The job checks every 5 minutes for new data.

In the directory, the job will place all non-blank HTS and ECCN files that can be generated from the input file.  It will also move the input file to this directory for confirmation that it was processed.

(The command below can be run in the output directory to count the lines in each generated file; find /v /c counts the lines that do not contain the given, deliberately improbable, string.)

for %a in (*) do find /v /c "~@!@#(*$Q" %a

 

 

Output:-


Before generating the files, a separate folder has to be created by BODS, and the files are then generated inside that folder as shown below.

1.gif

 

Output File should look like -

 

Requirement File Format:- 1

Table:          MAEX

Displayed Fields:   8 of 8  Fixed Columns:                 4  List Width 0250

------------------------------------------------------------------------------------------

| |MANDT|MATNR               |ALAND|GEGRU|ALNUM           |EMBGR           |PMAST|SECGK  |

-----------------------------------------------------------------------------------------------------------------------------

| |110  |8001316.000000      |US |EA   |EAR99           |                |     | |

| |110  |8004904.000000      |US |EA   |EAR99           |                |     | |

| |110  |9000020.000000      |US |EA   |EAR99           |                |     | |

 

Requirement File Format:- 2

 

Table:          MARC                                            

Displayed Fields:   4 of 4  Fixed Columns:3  List Width 0500  

---------------------------------------------------------       

| |MANDT|MATNR               |WERKS |STAWN              |       

-------------------------------------------------------------------------       

| |200  |8001316.000000      |0102 |90318000           |

| |200  |8004904.000000      |0102 |90318000           |

 

 

 

Solution Implemented:-


A Header Format folder is maintained, where the STATIC HEADER format files are stored as TEXT files. BODS reads the STATIC HEADER into an output file with the Delete option enabled before the load; then, in the next flow, the same file is used again as the TARGET file to load the actual data from the source input Excel file in append mode, i.e. without enabling the Delete option.

2.gif

 

########################### Variables declaration required inside the job to satisfy the logic ###########################

 

 

$G_Error= 'N';

$G_Date = to_char(sysdate(),'yyyymmdd');

Print($G_Date);

$G_Email_Message = ''; # will be assigned in next flows.

$G_Email_Header = '';  # will be assigned in next flows.

$G_Email_Recepients = 'DL-IT-BODS-Developers@xyz.com';

 

$HTSMANDT = '200  '; # Always keep 2 blank spaces after the 3 digit number.

$HTSWERKS = '0102  '; # Always keep 2 blank spaces after the 4 digit number.

$ECCNMANDT = '110  '; # Always keep 2 blank spaces after the 3 digit number.

 

$SRC_HeaderFilePath = '\\\ xyz.com\shared\CA002\IT\GTS\HTS-ECCN_Watch_Folder\HTS-ECCN_Header_Formats';

$SourceFilePath = '\\\ xyz.com\shared\CA002\IT\GTS\HTS-ECCN_Watch_Folder';

 

$SourceFileName = '';  # will be assigned in next flows.

$TargetFilePath = '';  # will be assigned in next flows.

$TargetFileName = '';  # will be assigned in next flows.

$GV_FILENAME = '';     # will be assigned in next flows.

$GV_FILENAME_KEY = 0;  # will be assigned in next flows.

$GV_FolderName = '';   # will be assigned in next flows.

$KEY = 0;              # will be assigned in next flows

$FLAG = 0;             # will be assigned in next flows

$GV_STATUSFLAG = '';   # will be assigned in next flows

$GV_Query = null;      # will be assigned in next flows

$GV_UpdateQuery = '';  # will be assigned in next flows

$GV_Curr_Batch_No = 0; # will be assigned in next flows

$GV_Batch_No = 0;      # will be assigned in next flows

$GV_FN = null;         # will be assigned in next flows

 

 

Spooling Mechanism:


$Flag = wait_for_file( '\\\xyz.com\shared\CA002\IT\GTS\HTS-ECCN_Watch_Folder\HTSECCN_RTP_*.xlsx',600000,300000,5);
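The first two numeric arguments are the timeout and the polling interval in milliseconds. A minimal sketch of how the return value could be checked before the rest of the job runs is shown below; the return codes assumed here (0 for "no file arrived", non-zero for "file found") should be verified against the Data Services Reference Guide.

if ($Flag = 0)
begin
     Print('No HTSECCN_RTP_*.xlsx file arrived within the timeout - ending the job.');
     raise_exception('No input file found in the watch folder.');   # assumption: fail the job explicitly
end
else
begin
     Print('File Exists - continuing with batch processing.');
end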

 

Once the BODS job receives an input source file in the required format, and before the file data is processed, a command is called from a BODS script to capture all matching file names into a temporary file so that the files can be processed sequentially as batches.

 

Print('File Exists');

EXEC ('cmd.exe','DIR /B "\\\xyz.com\shared\CA002\IT\GTS\HTS-ECCN_Watch_Folder\HTSECCN_RTP_*.xlsx" > \\\xyz.com\shared\CA002\IT\GTS\HTS-ECCN_Watch_Folder\HTS-ECCN_Header_Formats\HTS_ECCN_FileList.txt',8);

 

Print('File Names copied to - HTS_ECCN_FileList.txt');

3.gif

Load the FileNames into a Staging Landing Table for iteration process.

 

4.gif

$GV_FILENAME_KEY = sql('BODS_APPLICATIONS','select max("ID") from DBO.GTS_FILELIST_LNDG');

Print('File Number Processing - '||$GV_FILENAME_KEY);

$GV_FILENAME = sql('BODS_APPLICATIONS','select distinct FILENAME from DBO.GTS_FILELIST_LNDG where "ID"=[$GV_FILENAME_KEY]');

 

Print('Processing File - '||$GV_FILENAME);

 

 

Select one file at a time and, before processing its data, compare it against the BATCH table to check whether the file is new or has already been processed. If it has already been processed, terminate the job.

 

 

$GV_Query = 'select distinct FILENAME from dbo.GTS_FILELIST_LNDG where FILENAME in (select distinct FILENAME from dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment where status in(\'COMP\',\'AVAL\')) and "ID"=[$GV_FILENAME_KEY]';

$GV_FN = sql('BODS_APPLICATIONS',$GV_Query);

 

print('If any existing files - '||$GV_FN);

 

$GV_Query = 'select distinct STATUS from dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment where FILENAME in (\'[$GV_FN]\')';

$GV_STATUSFLAG = sql('BODS_APPLICATIONS',$GV_Query);

print('$GV_STATUSFLAG : {$GV_STATUSFLAG}');

 

if ($GV_FN is null)

begin

$G_Error= 'N';

$GV_Query = 'Select Max(batch_no) from dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment';

$GV_Curr_Batch_No = sql('BODS_APPLICATIONS',$GV_Query);

 

   if($GV_Curr_Batch_No = 0 or $GV_Curr_Batch_No is null)

   begin

                $GV_Query = 'Insert into dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment values (1,\''||$GV_FILENAME||'\',\'AVAL\',\'\')';

                print('$GV_Query --->'||$GV_Query);

                sql('BODS_APPLICATIONS',$GV_Query);

                $GV_Batch_No = 1;

                print('$GV_Batch_No --->' ||$GV_Batch_No);

   end

   else

   begin

                $GV_Batch_No = $GV_Curr_Batch_No + 1;

                $GV_Query = 'insert into dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment values ('||$GV_Batch_No||',\''||$GV_FILENAME||'\',\'AVAL\',\'\')';

                print('$GV_Query --->'||$GV_Query);

                sql('BODS_APPLICATIONS',$GV_Query);           

                print('$GV_Batch_No --->' ||$GV_Batch_No);

   end

end

else

begin

$G_Error= 'Y';

 

$GV_Query = 'select distinct Batch_No from dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment where FILENAME in (\'[$GV_FN]\')';

$GV_Batch_No = sql('BODS_APPLICATIONS',$GV_Query);

 

print('$GV_Batch_No : {$GV_Batch_No}');

end

 

5.gif

 

 

$G_Error= 'N';

$GV_FILENAME = sql('BODS_APPLICATIONS','select distinct FILENAME from dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment where "BATCH_NO"=[$GV_Batch_No]');

 

Print('Extraction started for File  - '||$GV_FILENAME);

 

Print('Creating todays Folder ...');

 

#$GV_FolderName = 'OUT'||'_'||to_char(sysdate(),'YYYYMMDD')||''||replace_substr( to_char(sysdate(),'hh24:mi:ss'),':','')||'_'||word_ext($GV_FILENAME,1,'.xlsx');

 

$GV_FolderName = 'OUT'||'_'||to_char(sysdate(),'YYYYMMDD')||''||replace_substr(to_char(sysdate(),'hh24:mi:ss'),':','')||'_'||$GV_FILENAME;

 

$GV_FolderName = replace_substr( $GV_FolderName,'.xlsx','');

 

Print('Extraction started for folder name  - '||$GV_FolderName);

 

EXEC ('cmd.exe','MD [$SourceFilePath]\[$GV_FolderName]',8);

 

Print('Folder created -  '||$GV_FolderName);

$TargetFilePath = '[$SourceFilePath]\[$GV_FolderName]';

$SourceFileName = $GV_FILENAME;

6.gif

 

7.gif

$KEY = sql('BODS_APPLICATIONS','select max("KEY") from DBO.GTS_HTS_ECCN_CTRY_CODES');

Print($KEY);

$G_Count = sql('BODS_APPLICATIONS','select Count(*) from DBO.GTS_HTS_ECCN_CTRY_CODES');

 

$GTS_TYPE = sql('BODS_APPLICATIONS','select distinct GTS_TYPE from DBO.GTS_HTS_ECCN_CTRY_CODES where "KEY"=[$KEY]');

 

Print('Extraction started for GTS Type - '||$GTS_TYPE);

 

$CTRY_CODE = sql('BODS_APPLICATIONS','select CTRY_CODE from DBO.GTS_HTS_ECCN_CTRY_CODES where "KEY"=[$KEY]');

 

Print('Extraction started for Country - '||$CTRY_CODE);

 

8.gif

 

$TargetFileName = ('Product'||'_'||$GTS_TYPE||'_'||'Input'||'_'||$CTRY_CODE||'_'||$G_Date||'.txt');

 

Print($TargetFileName);

 

9.gif

10.gif

 

Print('File Generated - '||$TargetFileName);

$KEY = $KEY - 1;

 

Once the file is completely processed, the input file is archived with the process below, and the BATCH assignment table is updated with the status of the processed file.

 

If($G_Error = 'N')

begin

 

Print('GTS_GlobalTrade Conversion Job has Completed for File -- '||$GV_FILENAME);

Print('Moving todays Input File to Folder ... - '||$GV_FolderName);

 

EXEC ('cmd.exe','Move [$SourceFilePath]\[$GV_FILENAME] "[$SourceFilePath]\[$GV_FolderName]\"',8);

 

Print('Input File Moved Successfully -- '||$GV_FILENAME);

 

$GV_UpdateQuery = 'update dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment set STATUS =  \'COMP\' where BATCH_NO = [$GV_Batch_No]';

sql('BODS_APPLICATIONS',$GV_UpdateQuery);

 

$GV_UpdateQuery = 'update dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment set LoadDate =  getdate() where BATCH_NO = [$GV_Batch_No]';

sql('BODS_APPLICATIONS',$GV_UpdateQuery);

 

end
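As a small variation, the two update statements above could be combined into a single call; a sketch, assuming the same table and columns:

$GV_UpdateQuery = 'update dbo.GTS_TRADEDATA_HTS_ECCN_Batch_Assignment set STATUS = \'COMP\', LoadDate = getdate() where BATCH_NO = [$GV_Batch_No]';   # one statement instead of two round trips
sql('BODS_APPLICATIONS', $GV_UpdateQuery);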

Data Services Best Practices Job Design Tips


Job Design Tips

Job and Dataflow Design

A rule of thumb is one data flow per target table. Try to divide your Data Services application into components with minimal inter-dependencies. This modular design will create a simpler architecture and make maintenance easier. It can also help performance tuning because different components can be run independently.

 

Bulk Loading

Bulk loading can be utilized to feed a large amount of data quickly. If there are both updates and inserts in the same data flow, they need to be separated. Do updates first, and then drop all indexes on the target table and do the bulk load of all new rows. Then rebuild indexes. Bulk loading to Sybase ASE databases is supported through Sybase ASE bulk copy utility.
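A minimal sketch of this pattern, using scripts placed before and after the bulk-load data flow; the datastore, table, and index names are illustrative only:

# Script before the bulk-load data flow: updates have already been applied,
# so drop the index on the target table to speed up the bulk insert.
sql('DS_TARGET', 'DROP INDEX IX_SALES_FACT_CUSTOMER ON dbo.SALES_FACT');

# ... the bulk-load data flow inserts all new rows here ...

# Script after the data flow: rebuild the index.
sql('DS_TARGET', 'CREATE INDEX IX_SALES_FACT_CUSTOMER ON dbo.SALES_FACT (customer_id)');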

 

Lookup or Use join

The advantage of using Data Services lookup functions is that they always return exactly one row. The disadvantage is that the lookup is performed inside Data Services, so it requires round-trip communication between the Data Services server and the database server, although caching can significantly reduce this overhead. Joins can be pushed down to the database server, but they may return more than one row.
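As a sketch, a typical lookup_ext() call in a Query output column mapping might look like the following; the datastore DS_Source, lookup table dbo.CUSTOMER, and column names are illustrative assumptions:

lookup_ext([DS_Source.dbo.CUSTOMER, 'PRE_LOAD_CACHE', 'MAX'],
           [CUSTOMER_NAME],
           ['Unknown'],
           [CUSTOMER_ID, '=', Query.CUSTOMER_ID])

The 'MAX' return policy guarantees a single value per input row even if several lookup rows match, which is the behaviour described above.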

 

Caching Options in Lookup Function

Caching depends on the physical memory, the size of the data set, and the application usage pattern. If the lookup table is too large to fit in physical memory, on-demand caching is recommended because Data Services loads only those values that are actually used. If the lookup table is small, you can use the pre-load caching option. If the input data set is large but uses relatively few distinct lookup values, the "DEMAND_LOAD_CACHE" option is more efficient.
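In the lookup_ext() sketch shown earlier, the second argument of the first list is where this choice is made; the cache specifications are 'NO_CACHE', 'PRE_LOAD_CACHE', and 'DEMAND_LOAD_CACHE'. Switching to on-demand caching is a one-word change (names again illustrative):

lookup_ext([DS_Source.dbo.CUSTOMER, 'DEMAND_LOAD_CACHE', 'MAX'], [CUSTOMER_NAME], ['Unknown'], [CUSTOMER_ID, '=', Query.CUSTOMER_ID])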


Windows Authentication vs. Database Authentication

Windows Authentication is only available when the MS SQL Server is used for the repository or data store connection type. When selected, the MS SQL Server will validate the user login name and password using the information from the Windows operating system.

Database authentication can be used for any type of RDBMS connection type including MS SQL Server, Oracle, DB2, etc. This mode authenticates the user login information against the user account on the database.

 

Exception Handling

Data Services uses two special transforms, Try and Catch, to trap exceptions at runtime. When an error or exception is raised in the code block surrounded by the Try/Catch pair, control is passed to the Catch, where the error handling code usually resides.
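A sketch of a script that could sit inside the Catch block is shown below; the recipient address is taken from the earlier example, and the catch-scope error functions and the smtp_to() argument order should be verified against your Data Services version:

# Log the error details and notify the support distribution list.
Print('Job failed with error ' || error_number() || ': ' || error_message());
smtp_to('DL-IT-BODS-Developers@xyz.com', 'BODS job failed', error_message(), 0, 0);   # assumption: last two arguments are the number of trace/error log lines to attach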


Key_Generation Transformer

This transformer is used to generate surrogate keys for a target table. It is very efficient because it reads the maximum value of the key column in the target table only once before it generates keys. However, there is an important side effect you must be aware of – it is not synchronized among data flows. Do not use it in more than one data flow feeding into the same target table.

 

1.PNG 

It is OK when only one Key_Generation transformer is used for a target table.

 

  2.PNG 

Incorrect use of Key_Generation

 

When two Key_Generation transformers are used to populate the same target table, there could be duplicate keys generated in the Data Services process. As a result, either duplicate keys are loaded into the target table if there is no constraint on the key column, or the load fails because of the “unique key” constraint violation.
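The same behaviour is also available as the key_generation() function in a mapping expression; like the transform, it reads the maximum key only once, so it should likewise be confined to a single data flow per target table. A sketch, with illustrative datastore, table, and column names:

key_generation('DS_Target.dbo.SALES_DIM', 'SALES_KEY', 1)   # start from max(SALES_KEY) and increment by 1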

 

Scheduling Capabilities

Data Services jobs can be executed or scheduled with greater flexibility than in previous versions. You can schedule DS jobs in BOE. Once the scheduling object is created in BOE, it can be modified and scheduled with all options in the BOE CMC (Central Management Console – the standard BOE administration console). In the CMC, schedule events can be created to kick off a job based on the completion status (success or failure) of another job.

Secondly, file based scheduling is available by using the function called “wait_for_file”. With this function, it is possible to start a job based on the existence of a file in a local folder on the Data Services Job server. The function can be called in a script as the first step in a job so the execution of the rest depends on whether or not a file exists in a folder.

Thirdly, jobs can also be exported to SAP BW and executed and scheduled by using an InfoPackage.

 

Remove Empty Rows from Excel Input Files

The Excel source files may contain white spaces in some cells. Without filtering, rows with only empty cells generate all-NULL meaningless rows in the target table. One way to deal with this issue is to ask the business users to clean it up before sending it to Data Services. While it solves the problem, this approach adds quite some overhead to the pre-processing of Excel files.

A better way to address the issue is to create additional logic in the Data Services dataflow to exclude all-NULL rows before loading the target table. For example, let's assume that there are three input columns, column_1, column_2, and column_3, that are loaded from Excel. By adding the following logic to the Where clause, empty rows can be eliminated.

 

NOT (column_1 IS NULL AND column_2 IS NULL AND column_3 IS NULL)

Data Services Best Practices Code Management


Code Management

Multiple Profiles, not Multiple Objects

When naming objects in a multi-user development environment, a developer can be tempted to give names that are as distinctive as possible. While it is important to avoid name clashes in a team environment, some objects should be given generic names to avoid difficulties when merging development from multiple users or migrating to a different environment. A datastore is such an object. For example, developers Joe and Jane work on the same source database. If Joe names his datastore "DS_Joe_Source" and Jane names her datastore "DS_Jane_Source", there will be two datastore objects in the central repository pointing to exactly the same database source. Another example: if a developer names a datastore DS_DEV_Source, the name has to be changed to avoid confusion when migrating to the test environment.

Data Services uses profiles to solve this kind of problem. An object can have multiple profiles attached to it. Let's revisit the above examples using profiles. In the repository, there is only one datastore object, DS_Source. DS_Source has multiple profiles defined on it, such as profile_DEV_Joe, profile_DEV_Jane, profile_TEST, etc. Applying a profile quickly changes its configuration.

 

Global Variables and Substitution Parameters

Global variables are variables defined at the job level. They can be used as placeholders for external configurable parameters passed into the job, such as directory paths and default values. Their values can be read and modified anywhere in the job. However, they are not visible outside the job.

1.PNG

    Global Variables

 

To define a variable that can be shared by multiple jobs, one must use a substitution parameter. Substitution parameters are defined at the local repository level and thus are available to all jobs in the same repository. You should not modify the pre-defined DS parameters for your specific application since this may impact all jobs using those parameters.

2.PNG                          

    Substitution Parameters
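A small sketch of how a substitution parameter might then be referenced in a script; the parameter name [$$HTS_WatchFolder] is an illustrative assumption:

# Substitution parameters are referenced with the [$$Name] syntax.
$SourceFilePath = '[$$HTS_WatchFolder]';
Print('Reading input files from ' || $SourceFilePath);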

 

Repository Management

Try to create a local repository for each developer. The overhead of an additional repository is minimal given the capacity of modern database systems and disk sizes. Define a connection to a central repository for each local one. Developers should not log into the central repository for any development work. Code should be developed and unit tested in the local repository and checked into the central repository.

A separate local repository should be created for each testing or production Data Services environment. Moving code into such a repository should be coordinated by a Data Services administrator or team lead, ideally after a code review process, to minimize the chance of breaking existing code.

 

Check Out Objects with Filtering

When checking out, use the option "with filtering". This gives you control over which objects to include in the checkout. For example, if you have already configured datastores in your local repository, you probably don't want them replaced with the wrong configuration. If you check out objects that you did not intend to, you can always undo the checkout on those objects.

 

Label Objects with Meaningful Text

Labeling is optional but it is critical in code migration. Without a label, it would be very difficult to find the correct versions of all objects ready to be deployed into production because they are not necessarily always the most recent version. The label for a release should include the project name, release number, and a date stamp.

When you are still developing code, it is up to you to add a label or not. However, it is a good practice to add a meaningful description to the objects when there is a major change. For example, if you just add a new feature and unit test it, you can add a label with something like “New feature xxx added on 10-04-2015”.

Data Services Best Practices Migration Strategies


Migration Strategies


Data Services supports two methods for moving code through the development cycle: export/import and using a central repository. The export/import method can be used for a small project, while a central repository is better suited for a medium to large project team.


1. Migration Using Export/Import

Export/Import is the basic method Data Services supports for moving code between local repositories. The Export Editor is used to include or exclude objects in the export process. The target can be either another local repository with the same version number or a flat file.

When you import a file, objects with the same name in the destination repository will be overwritten. It is recommended that you keep a backup copy before importing.

 

2. Migration Using a Central Repository

In a typical development lifecycle, applications move among several different phases including development, testing, and production. In a migration, application code is copied and new configuration profiles are applied. A single central repository can support the application during these phases with job labeling and projects.

The following diagram shows the central repository as a source control tool during the development phase, and a migration tool to move applications among different phases in the lifecycle.

1.PNG

    Workflow when using a central repository for migration

 

Labeling should not be confused with object versioning. The version information of an object is managed by the central repository as an integer number and is automatically increased whenever an updated version is checked in. Labels, on the other hand, are descriptive text the user enters into the central repository during the development phase, usually to indicate that a milestone has been reached, such as a release being ready for testing. The same label can be defined on different versions of a group of objects (such as the objects belonging to a project) so that they can be retrieved as a single unit. The following example, taken from the Data Services technical manual, illustrates the difference between versioning and labeling.

For example, suppose a job is added to the central repository where user 1 works on a work flow in the job, and user 2 works on a data flow in the same job. At the end of the week, after user 1 checks in two versions of the work flow and user 2 checks in four versions of the data flow into the central repository, the job is labeled “End of week 1 status.” This label contains version 1 of the job, version 2 of the work flow, and version 4 of the data flow. Later, user 1 and user 2 continue to change their respective work flow and data flow.

 

At some later point, if you want to get the job with the version of the data flow with this label, getting the job by its label accomplishes this, whereas checking out the job and its dependents does not.

 

The label “End of week 1 status” serves the purpose of collecting the versions of the work flow and data flow that were checked in at the end of the week. Without this label, you would have to get a particular version of each object in order to reassemble the collection of objects labeled “End of week 1 status.”

It is a good practice to follow a naming convention for labels. One recommended way is to define each label in the following format:

<project name>_rel_<number>_<date in ‘yyyymmdd’>

For example, for project "dw project 1", you can label the first baseline of the project as "dw project 1_rel_1.1.1_20150928". Here a three-digit release number is used to indicate major, minor, and maintenance release numbers. You can use your own numbering system to keep track of each release.

For more detailed discussions of labeling, refer to Chapter 8, “Working in a multi-user environment”, in the “Data Services Advanced Development and Migration Guide”.

 

Moving a Local Repository to a New Database Server

The following steps are recommended to move a local repository to a new database server:

  1. Create a local repository on the new database server with Repository Manager;
  2. Export the old repository to a file and import it into the new repository;
  3. Use the Server Manager to delete the old repository entry, and then add the new one.

Data Services Best Practices Code Naming Conventions


Code Naming Conventions

The Data Services objects should be named to make it as easy as possible to identify their function and object type. The following table contains the available objects in Data Services and their proposed naming convention.


Object

Naming Convention

Project

PJ_<Project Subject Area>_Batch

Job (Batch/Real-time)

JB_<RICEFID>_<name>

Work Flow

WF_<RICEFID>_<name>

Data Flow

DF_<RICEFID>_<SubjectArea Segment>_<Action>

Annotation

ANN_<RICEFID>_description

Catch

CATCH_<RICEFID>_description

Conditional

COND_<RICEFID>_description

Datastore

DS_<SOURCETYPE>_<SOURCENAME>

Document

DOC_<RICEFID>_description

DTD (Document Type Definition)

DTD_<RICEFID>_description

File Format

FF_<RICEFID>_<ACTION>_DESCRIPTION

Function

FUNC_<description>

Script

SC_<RICEFID>_description

Template Table

TT_<RICEFID>_description

Try

TRY_<RICEFID>_description

While Loop

While_description

XML Message

XMLMSG_<RICEFID>_description

XML Schema

XMLSCH_<RICEFID>_description

XML Template

XMLTMPL_<RICEFID>_description

Address_Enhancement Transform

AE_<RICEFID>_description

Case Transform

Case_description

Date_Generation Transform

DG_<RICEFID>_description

Effective_Date Transform

ED_<RICEFID>_description

Hierarchy_Flattening Transform

HF_<RICEFID>_description

History_Preserving Transform

HP_<RICEFID>_description

Key_Generation Transform

KG_<RICEFID>_description

Map_CDC_Operation Transform

MC_<RICEFID>_description

Map_Operation Transform

MO_<RICEFID>_description

Match_Merge

MM_<RICEFID>_description

Merge

MER_<RICEFID>_description

Name_Parsing

NP_<RICEFID>_description

Pivot

PIV_<RICEFID>_description

Reverse Pivot

RPIV_<RICEFID>_description

Query

QRY_<RICEFID>_description

Row_Generation

RG_<RICEFID>_description

SQL

SQL_<RICEFID>_description

Table_Comparison

TC_<RICEFID>_description

Other Naming Standards

The naming standards for other components which may be involved in the BOBJ DS process are below.

  • Global Variables should be named with a prefix of “$GV_”.
  • Local Variables should be named with a prefix of “$V_”

 

Project

Format:  PJ_<Project Subject Area>_Batch

 

Example: PJ_PANGAEA-PHASE1-CONV_BATCH

Projects are the highest level of reusable object in Data Services.  They allow you to group jobs that have dependent schedules or belong to the same application.  DESCRIPTION is the general application. PJ is an abbreviation for Project. Batch suffix signifies that all ETL jobs in this conversion project execute in BATCH mode.

 

Job

Format: JB_<RICEFID>_TARGETSCHEMANAME

Example: JB_CV50001_SALES_LOAD

A job is a group of objects that you can schedule and execute together.  Multiple jobs can be included in a single project.  When naming a job, include a description of the data managed within the job.  In the example above, the job loads sales data to a data mart.  JB is an abbreviation for Job.  When designing jobs, you may want to limit the scope of a job to a particular area of your business, so that you can easily understand the purpose of the job from its name.  Keeping jobs limited in scope also makes it easier to re-use them in other projects.

 

Datastore

Format: DS_<SOURCETYPE>_<SOURCENAME>

Example:  DS_SAP_VR2

A Datastore provides connection information for a source or target database.  Data Services is able to import metadata through the datastore connection.  The SOURCETYPE could be SAP, TD for Teradata, ORCL for Oracle, and so on. SOURCENAME could be the schema name of the database or just a descriptive name for the source or target database.  DS stands for datastore.

 

Workflow

Format: WF_<RICEFID>_TARGETTABLE

Example: WF_CV50002_SALES_DIM

               WF_CV50007_CUSTOMER

Workflows can contain other workflows as well as dataflows.  TARGETTABLE refers to the final table being loaded in the workflow.  In instances where a workflow contains multiple workflows, it can refer to a category or group of tables.  In the examples, WF_CV50002_SALES_DIM holds all the dimension table loads for the SALES data mart application.  One of its workflows is WF_CV50007_CUSTOMER, which loads only customer data.  WF stands for workflow.

 

Dataflow

Format: DF_<RICEFID>_<SubjectArea Segment>_<Action>

Example: DF_CV50007_CUSTOMER_EXTRACTFILE

               DF_CV50007_CUSTOMER_TRANSFORM

               DF_CV50007_CUSTOMER_OUTPUTFILE

                       

Dataflows are the lowest granularity objects in Data Services.  Multiple dataflows can reside in a workflow.  Dataflows hold the specific query or transformation that is being applied to the source data.  The <SubjectArea Segment> refers to the specific data that the dataflow is processing.  DF stands for dataflow.  In the example, the customer data is extracted from a file in the first dataflow, transformed in the second dataflow, and written to the customer output file in the third dataflow for further processing.

 

Scripts

Format: SC_<RICEFID>_DESCRIPTION

Example: SC_CV50007_CLEANUP_TEMP_TABLES

Scripts can be created in jobs or workflows.  They are single-use objects and are not stored in the Data Services object library.  They can be used to declare and assign variables, call a SQL function, or perform SQL statements.  DESCRIPTION can refer to the function or task that the script is performing or the name of a table that the script is loading.  SC stands for script.  Note: descriptive names could include "SQL update table X", "Print Message", etc.  This example describes a script that truncates the temp tables for the job.
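A sketch of what SC_CV50007_CLEANUP_TEMP_TABLES might contain; the datastore and temp table names are illustrative assumptions:

# Truncate the temporary tables used by the job before the next run.
sql('DS_STAGE', 'TRUNCATE TABLE dbo.TMP_CUSTOMER');
sql('DS_STAGE', 'TRUNCATE TABLE dbo.TMP_SALES_DETAIL');
Print('Temporary tables truncated.');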

 

Flat Files

Format: FF_<RICEFID>_<ACTION>_DESCRIPTION

Example: FF_CV50009_OUT_ORA_APPS_SALES_DETAIL

Flat files are data sources that are not in database table form.  They are typically used in dataflows to create tables or as a source of metadata.  The example describes the target table that the file will populate from the Oracle application.  FF stands for flat file.

 

Tables

Format:  DESCRIPTION

Example:  KNA1

Tables can be used as a source or target in a dataflow.  DESCRIPTION refers to the data content in table.  The example describes the database table that will house CUSTOMER data.

SAP Data Services Strategies to execute jobs


Maximizing push-down operations to the database server

SAP BusinessObjects Data Services generates SQL SELECT statements to retrieve the data from source databases. The software automatically distributes the processing workload by pushing down as much as possible to the source database server.

Pushing down operations provides the following advantages:

  • Use the power of the database server to execute SELECT operations (such as joins, Group By, and common functions such as decode and string functions). Often the database is optimized for these operations.
  • Minimize the amount of data sent over the network. Fewer rows need to be retrieved when the SQL statements include filters or aggregations.
  • Do a full push down from the source to the target, which means the software sends SQL INSERT INTO... SELECT statements to the target database. The following features enable a full push down:
    • Data_Transfer transform
    • Database links and linked datastores

 

Improving throughput

Use the following features to improve throughput:

  • Using caches for faster access to data

    You can improve the performance of data transformations by caching as much data as possible. By caching data in memory, you limit the number of times the system must access the database.

  • Bulk loading to the target

    The software supports database bulk loading engines, including the Oracle bulk load API. You can have multiple bulk load processes running in parallel.

  • Other tuning techniques
    • Source-based performance options
      • Join ordering
      • Minimizing extracted data
      • Using array fetch size
    • Target-based performance options
      • Loading method
      • Rows per commit
    • Job design performance options
      • Loading only changed data
      • Minimizing data type conversion
      • Minimizing locale conversion
      • Precision in operations

   

Using advanced tuning options

If your jobs have CPU-intensive and memory-intensive operations, you can use the following advanced tuning features to improve performance:

  • Parallel processes—Individual work flows and data flows can execute in parallel if you do not connect them in the Designer workspace.
  • Parallel threads—The software supports partitioned source tables, partitioned target tables, and degree of parallelism. These options allow you to control the number of instances for a source, target, and transform that can run in parallel within a data flow. Each instance runs as a separate thread and can run on a separate CPU.
  • Server groups and distribution levels—You can group Job Servers on different computers into a logical component called a server group. A server group automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the computer with the lightest load at runtime. This functionality also provides a hot backup method. If one Job Server in a server group is down, another Job Server in the group processes the job. You can distribute the execution of data flows or sub data flows within a batch job across multiple Job Servers within a Server Group to better balance resource-intensive operations.

SAP Data Services Estimation Guidelines


Assumptions of 100 jobs

1. The # of jobs with High complexity - 20

2. The # of jobs with Medium complexity – 30

3. The # of jobs with Low complexity – 50

 

1. Following are the Key Assumptions for approach - 1

    1. Legacy System owners / Legacy IT team are responsible for data profiling and data cleansing on the source legacy systems. Cleansed source / legacy data is provided as input to the data conversion team.
    2. For LSE / Non SAP legacy systems – Legacy IT team to cleanse, extract the required legacy data and send in ASCII or Desired code page file format ( multi-lingual support) as input data to  technical conversion team.
    3. BODS ETL Technical team to decide the best approach of data extraction, data transformation, data load during the design phase.

 

Tasks to be performed (per job):

  1. Preparation of ETL Mapping Sheet
  2. ETL Job Design and Development
  3. ETL Job Testing with test case scenarios
  4. Migration from one environment to another (import and export)

Time required per job (covering all of the above tasks):

  Low complexity                                                            2 days/job
  Medium complexity (includes business logic)                               3 days/job
  High complexity (includes complex business logic, several lookups etc.)   5 days/job

Time required for 100 jobs:

  Low:    2 days * 50 jobs = 100 days
  Medium: 3 days * 30 jobs =  90 days
  High:   5 days * 20 jobs = 100 days

     

2. Following are the Key Assumptions for approach – 2 (Extracting Data directly from SAP and Non SAP Source and loading into SAP – BW)

    1. Connecting to Non-SAP and SAP source Directly
    2. Includes cleaning, profiling and error validations
    3. Routing data directly to SAP-BW
    4. BODS used to communicate with SAP and Non SAP Source and loading directly into SAP – BW Target
    5. Develop some standard job for validation and reuse for other objects as well

 

Tasks to be performed (per job):

  1. Setting SAP Connectivity (R3 or IDOC)
  2. Preparation of ETL Mapping Sheet
  3. ETL Job Design and Development
  4. ETL Job Testing with test case scenarios
  5. Migration from one environment to another (import and export)

Time required per job (covering all of the above tasks):

  Low complexity                                                            3 days/job
  Medium complexity (includes business logic)                               4 days/job
  High complexity (includes complex business logic, several lookups etc.)   6 days/job

Time required for 100 jobs:

  Low:    3 days * 50 jobs = 150 days
  Medium: 4 days * 30 jobs = 120 days
  High:   6 days * 20 jobs = 120 days

Understanding SAP BODS


Business Objects Data Services (BODS) is a GUI tool that allows you to create and monitor jobs which take data from various types of sources, perform complex transformations on the data as per the business requirement, and then load the data into a target which again can be of any type (i.e. an SAP application, a flat file, or any database). The jobs created in BODS can be either real-time or batch jobs.

Before getting into much details let’s first look at the architecture of BODS.


Capture.JPG


 

Repository

It is a set of tables which holds user-created and system-defined objects such as metadata for sources and targets and transformation rules. We have three different types of repositories:

  • Local Repository
    It contains all the metadata about the system objects like workflows, dataflows, datastores, etc. In simple terms it can be thought of as a folder; usually, if there are multiple developers, each developer is assigned a different local repository so that he can manage his tasks there without creating confusion for other developers. Jobs can be scheduled and executed from here. For maintaining different environments like Dev, QA, and Production, we have a local repository in each environment.
  • Central Repository
    It is basically used for version control of the jobs; check-in and check-out functions can be carried out from this repository. In real-time scenarios we use it for release management strategies. One can easily compare the jobs in the local and central repositories and hence see the changes made in the local repository.
  • Profiler Repository
    It is used for data quality assessment; it stores the profiler tasks, which are created when a profiling request is submitted from the Designer or the admin console, and we can monitor the progress and execution of these tasks here. It is very helpful on the analysis side as we can easily gain insight into the data and the pattern of its distribution.

 

Metadata Reporting
It contains all the metadata about a repository and can be used for reporting on the metadata available, like reporting on the objects created, child-parent hierarchy etc. There is a complete set of tables and views which can be accessed via SQL commands or choosing metadata reporting in admin console.

 

Designer

All BODS objects are created here: we can create workflows, dataflows, datastores, and other objects. It is a graphical interface where the major tasks are done by dragging and dropping; here we structure our jobs and define the transformation rules, which are the major part of an ETL process. (We will look at the Designer in more detail in the next documents.)

 

Job Server

It contains engines, retrieves job information from the repository, and executes the jobs on these engines. A repository can be linked to one or more job servers depending on the number of jobs executing at a particular time and on the demand for better performance. It acts as an integration unit which holds all the job information, extracts the data from the source systems, and loads it into the target systems.

 

Access Server

These are similar to job servers, with the difference that they are used for real-time jobs; an access server controls the passing of messages between source and target in real time using XML messages.

 

Administration

It is usually referred to as the admin console; here we schedule all our batch jobs and monitor them, and here we can get the trace logs and error logs, which are helpful in analyzing the execution of a job.


This was all about the basic architecture of BODS system and in the coming documents we will look into the BODS objects like workflow and dataflow in more detail.

RANK in BODS using gen_row_num_by_group() function


Hello,

 

I am trying to explain ranking in BODS. It plays a very important role when your business requirement asks you to get the most recent value within a group of values. BODS provides the gen_row_num_by_group() function; with the help of this function you can generate a RANK and then get the most recent or last value.

 

Go through the screens, it will tell you every step that you have to follow.


For example, let's take a scenario where a customer has ordered a few products on different dates. The business requirement is to pick the most recent order date.

Now what you need to do is put the order date in the Order By clause, in ascending order.

 

RANK.png

 

Now add a column, name it RANK, and map it with the gen_row_num_by_group() function on ORDER_NO. It looks like the screen below.

 

2.png
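In text form, the mapping of the RANK column is roughly the following; Query is the assumed name of the preceding transform and ORDER_NO is the grouping column from the example:

gen_row_num_by_group(Query.ORDER_NO)

The function restarts the generated number at 1 whenever the value of ORDER_NO changes, which is why the input is sorted by the group column first.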

 

Now run the job; you will get a result like the table below.

 

3.png

 

Now create another data flow and use the RANK table as the source. Map Order_date with the max() function to pick the most recent one. You will need a Group By clause.

 

4.png
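In text form, the second Query transform can be sketched as follows, assuming RANK_TABLE is the source and ORDER_NO is the grouping column:

# Mapping of the ORDER_DATE output column:
max(RANK_TABLE.ORDER_DATE)
# Group By tab:
RANK_TABLE.ORDER_NO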

 

 

You have successfully generated the RANK and picked the most recent date. Your result will look like the tables below.

 

5.png

 

Regards,

Imran
