Hi All
Netezza to Teradata Migration
The scope of this document is to detail the design steps involved in repointing the existing BODS jobs, which currently point to Netezza, to Teradata.
Why is there a need to migrate from Netezza to Teradata?
Netezza is one of the key platforms serving as a reporting application for BGS. As part of the BGEDW initiative, Netezza is the first data warehouse chosen for migration to the Teradata platform.
The other main reason for migrating the Netezza application to Teradata is the long-standing performance issues in Netezza, which cause the reports to become available to the Business very late in the day.
BODS job flows used in the NZ to TD migration
In this method, we will create BODS jobs to transfer historical data from the NZ layer to the TD layer.
Steps of approach for historical loading:
1) Prepare metadata for source and target tables
The source and target table metadata needs to be defined. The target metadata is in the Teradata layer and can be obtained by converting the datatypes and formats, as well as converting the Netezza table names to Teradata table names.
This can be achieved with the Netezza-to-Teradata DDL converter tool, which takes the source DDL (NZ) as input and produces the target DDL (TD). Mappings between the source and target tables are then defined at column level (see the DDL conversion sketch after this list).
2) Build BODS jobs for data transfer
Teradata Parallel Transporter with the FastLoad option in replace mode is used to load the Teradata tables, with a 1:1 mapping in the BODS job flow built for this purpose. A single BODS job can be built and then, with the help of macros, replicated for any number of tables to be migrated with a 1:1 mapping based on column position.
3) Perform migration
Populate the target tables by running the BODS jobs with the required parameters. This whole process can be automated and run as many times as needed to migrate all the required tables for historical loading.
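For illustration, below is a minimal sketch of the DDL conversion in step 1. The database, table, and column names are hypothetical, and the mapping shown (for example, DISTRIBUTE ON becoming PRIMARY INDEX, TIMESTAMP becoming TIMESTAMP(0)) is only one common case; the actual target DDL comes from the converter tool.

-- Hypothetical Netezza source DDL (input to the converter)
CREATE TABLE EDW_STG..ACCOUNT_SAMPLE
(
    ACCOUNT_ID  BIGINT NOT NULL,
    ACTIVITY_DT TIMESTAMP,
    STATUS_CD   VARCHAR(10)
)
DISTRIBUTE ON (ACCOUNT_ID);

-- Converted Teradata target DDL (output of the converter)
CREATE MULTISET TABLE EDW_STG.ACCOUNT_SAMPLE
(
    ACCOUNT_ID  BIGINT NOT NULL,
    ACTIVITY_DT TIMESTAMP(0),
    STATUS_CD   VARCHAR(10)
)
PRIMARY INDEX (ACCOUNT_ID);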
How to migrate from Netezza to Teradata?
For example, a typical end-to-end data flow looks as follows:
Source: BODS jobs extract data from the various source systems and create flat files from it.
Arrival: The flat files created as part of the source process are placed into a landing area referred to as the ‘arrival’ area.
Processing: Various BODS dataflows apply transformations and load the data into the staging area.
Staging: Data is loaded from the processing area and transformations are applied for history handling.
ODS/WH: Various BODS processes apply business rules against the data in the staging databases.
Data Marts: Various BODS processes apply business rules against the warehouse data to populate the server-specific data marts.
To implement the above approach, we will divide the action items into four levels:
a) Source to Archive
b) Processing to Staging
c) Staging to Warehouse
d) Warehouse to Data Mart (Server Specific)
a) Source To Archive:
1) Import the job into the repository.
2) The existing dataflow DF_CONTROL will be replaced with a new work-flow named W_INSERT_DF_CONTROL.
3) Assign the values to the parameters defined for the W_INSERT_DF_CONTROL workflow.
4) The DF_CONTROL table will be re-imported into the ADMIN datastore.
5) Open the main dataflow (e.g. D_SRC_ARV_UABSCON) and change the target table properties for DF_CONTROL by setting Auto Correct Load = YES under the Advanced Target Table properties (see the sketch after this list).
6) One ‘Source to Archive’ workflow (W_SRC_ARV_UABSCON) is shown in the figure below.
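For reference, Auto Correct Load makes BODS apply the DF_CONTROL rows as an upsert: update the row if it already exists, otherwise insert it. Conceptually, on Teradata this behaves like the sketch below; the column names (JOB_NAME, LOAD_DATE, STATUS) and the sample values are assumptions for illustration only.

-- Conceptual Teradata equivalent of Auto Correct Load = YES on DF_CONTROL
-- (column names and values are illustrative assumptions, not the real layout)
UPDATE [$$SP_DB_EDW_ADMIN_DATABASE].DF_CONTROL
SET    LOAD_DATE = CURRENT_TIMESTAMP(0),
       STATUS    = 'STARTED'
WHERE  JOB_NAME  = 'J_SRC_ARV_UABSCON'
ELSE INSERT INTO [$$SP_DB_EDW_ADMIN_DATABASE].DF_CONTROL
       (JOB_NAME, LOAD_DATE, STATUS)
VALUES ('J_SRC_ARV_UABSCON', CURRENT_TIMESTAMP(0), 'STARTED');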
b) Processing to Staging:
1) Import the job into the repository.
2) The existing dataflow DF_CONTROL will be replaced with a new work-flow named W_INSERT_DF_CONTROL.
3) Assign the values to the parameters defined for W_INSERT_DF_CONTROL workflow.
4) All the tables that will be used have to be re-imported from their respective datastores.
5) The W_RECOVERY_STG work-flow will be updated by replacing the double dot ‘..’ identifier after the database name with a single dot ‘.’ e.g. [$$SP_DB_EDW_ADMIN_DATABASE]..DF_CONTROL in the workflow W_RECOVERY_STG is changed to [$$SP_DB_EDW_ADMIN_DATABASE].DF_CONTROL.
6) If required, add the Pre-Load and Post-Load SQL commands obtained from the migration inventory; these commands must be converted to Teradata SQL first. Then change the Bulk Loader Options for the target table as follows:
i. Bulk Load = Parallel Transporter.
ii. File = Named Pipe and access module
iii. Operator = Update if data already exists in the target, otherwise Load.
iv. Number of Instances = 1.
v. Mode = Append.
vi. Bulk Operation = Insert.
vii. Field Delimiter = 1/27
Named Pipe Parameters
The default value for all the parameters is C:\Program Files\Business Objects\BusinessObjects Data Services\Log\BulkLoader. Change it as below:
viii. Logdirectory = [$$SP_LOG_DIRECTORY]
ix. FallbackDirectory = [$$SP_LOG_DIRECTORY]
7) Change the datatype of the parameter [$P_DATETIME_ACTIVITYDATE] from DATETIME to VARCHAR(19). Enclose it in single quotes when it is used in the Pre/Post-Load commands (see the example after this list).
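As an illustration of the quoting in step 7, a hypothetical pre-load command after conversion to Teradata SQL is shown below; the staging database substitution parameter, table, and column names are assumptions, and the real commands come from the migration inventory.

-- Hypothetical Teradata pre-load command: [$P_DATETIME_ACTIVITYDATE] is now
-- VARCHAR(19), so it is enclosed in single quotes and explicitly cast back
-- to a timestamp for the comparison
DELETE FROM [$$SP_DB_EDW_STAGING_DATABASE].STG_UABSCON
WHERE ACTIVITY_DATE = CAST('[$P_DATETIME_ACTIVITYDATE]' AS TIMESTAMP(0));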
One Processing to Staging workflow is shown in the figure below.
c) Staging to Warehouse:
1) Import the job into the repository.
2) The existing dataflow DF_CONTROL will be replaced with a new work-flow named W_INSERT_DF_CONTROL.
3) Assign the values to the parameters defined for the W_INSERT_DF_CONTROL workflow, e.g. $P_W_VARCHAR_JOBNAME = $L_VARCHAR_JOBNAME.
4) All the tables that will be used have to be re-imported from their respective datastores.
5) The W_RECOVERY_STG work-flow will be updated by replacing the double dot ‘..’ identifier after the database name with a single dot ‘.’. e.g. [$$SP_DB_EDW_ADMIN_DATABASE]..DF_CONTROL in the workflow W_RECOVERY_STG is changed to [$$SP_DB_EDW_ADMIN_DATABASE].DF_CONTROL.
6) Change the datatype of the parameter [$P_DATETIME_ACTIVITYDATE] from DATETIME to VARCHAR(19).
7) If required, we will have to add the Post-Load and Pre-Load SQL commands obtained from the migration inventory.
8) Since the load here is from Teradata tables to Teradata tables (BODS extracts the data, performs the transformations, and then loads), change the Bulk Loader Options in the same way as in the ‘Processing to Staging’ section above.
9) If any custom SQL is used for the lookups, it has to be modified to make it Teradata compatible (see the example after this list). Refer to section 10.2 for SQL conversion from Netezza to Teradata.
10) Open the Target Table Editor, go to the Options tab, and set Column Comparison = Compare by position in the General settings.
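As an example of the kind of change step 9 refers to, a Netezza lookup query typically needs its double-dot qualification and its LIMIT clause rewritten for Teradata; the database, table, and column names below are purely illustrative.

-- Netezza-style lookup SQL (illustrative)
SELECT CUST_ID, STATUS_CD
FROM   EDW_WH..CUSTOMER_DIM
WHERE  LOAD_DT >= CURRENT_DATE - 7
LIMIT  10;

-- The same lookup rewritten for Teradata
SELECT TOP 10 CUST_ID, STATUS_CD
FROM   EDW_WH.CUSTOMER_DIM
WHERE  LOAD_DT >= CURRENT_DATE - 7;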
One Staging to Warehouse workflow is shown in the figure below.
d) Warehouse to Server Specific (Data Mart):
1) Import the job into the repository.
2) The existing dataflow DF_CONTROL will be replaced with a new work-flow named W_INSERT_DF_CONTROL.
3) Assign the values to the parameters defined for the W_INSERT_DF_CONTROL workflow.
4) All the tables that will be used have to be re-imported from their respective datastores.
5) The W_RECOVERY_SS work-flow will be updated by replacing the double dot ‘..’ identifier after the database name with a single dot ‘.’. e.g. [$$SP_DB_EDW_ADMIN_DATABASE]..DF_CONTROL in the workflow W_RECOVERY_SS is changed to [$$SP_DB_EDW_ADMIN_DATABASE].DF_CONTROL.
6) Change the datatype of the parameter [$P_DATETIME_ACTIVITYDATE] from DATETIME to VARCHAR(19).
7) Add the Post-Load and Pre-Load SQL commands (see the example below).
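As an illustration of step 7, a typical Teradata post-load command is a statistics refresh on the loaded mart table; the database, table, and column names below are assumptions, and the real commands come from the migration inventory.

-- Hypothetical post-load command: refresh optimizer statistics on the
-- data mart table once the load has completed
COLLECT STATISTICS
       COLUMN (PROD_ID),
       COLUMN (MVMT_DT)
ON EDW_DATAMART.PROD_MVMT_HMOVE;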
One Warehouse to Server Specific workflow (W_WH_SS_PROD_MVMT_HMOVE) is shown in the figure below.
Maestro / Tivoli Workload Scheduler (TWS) - Batch
Introduction
Tivoli Workload Scheduler (TWS) provides the backbone for automated workload management and monitoring. Offering a single console, real-time alerts, reports, and self-healing capabilities, it is automation software that helps manage the workloads that span the enterprise.
Scheduler Jobs Process
In order to offer greater flexibility, improved throughput and concurrency, the existing schedules shall be updated to execute BODS processes at the atomic level. In essence, this means that each dataflow in BODS shall have a corresponding job in TWS.
Current Structure:
The current BODS job structure hierarchy with Netezza is shown below.
So we can have ‘n’ workflows in one job and, similarly, ‘n’ dataflows in one workflow; the relation is therefore one-to-many for Job:Workflow and one-to-many for Workflow:Dataflow.
In this case a workflow depends on the successful completion of all the dataflows created under it, and a whole job is successful only on the completion of all the workflows below that job in the hierarchy.
Proposed structure of BODS Jobs with Teradata
Here we can have only one workflow in a job and ‘n’ dataflows in one workflow; the relation is therefore one-to-one for Job:Workflow and one-to-many for Workflow:Dataflow.
In this case a workflow still depends on the successful completion of all the dataflows created under it, while a whole job is successful on the completion of the single workflow created under that job in the hierarchy.
By following the above process, we can migrate from Netezza to Teradata using Business Objects Data Services.
Thanks & Regards,
Vishakha Nigam