Migrating your Data Warehouse from Cognos Data Manager to InfoSphere DataStage
Discover why you should migrate your data warehouse from IBM Cognos Data Manager to another IBM tool, InfoSphere DataStage.
If you are running a data warehouse with Cognos DecisionStream or Cognos Data Manager, or have one of the Cognos Performance Applications (PApps), you will soon need to migrate to InfoSphere DataStage or recreate your ETL in another IBM tool, as these products are no longer sold or supported by IBM.
Benefits of InfoSphere DataStage
- Mature, enterprise class ETL environment with many more features and integrations than competitors
- Large, flexible toolset with 80+ different stages
- The only ETL tool with a migration utility that automatically converts your existing Data Manager code, capitalising on your existing investment
- If you are still in support, you will already have licenses for DataStage Workgroup Edition
- A significant upgrade from Data Manager, with a familiar graphical user interface
- Migrating is typically much faster than redeveloping your warehouse from scratch in another ETL tool
- High performance and scalable
Consider the Following for a Successful InfoSphere DataStage Migration
Before starting the migration work, review the technical debt that has built up in your code. For example:
- Simplify – remove the sticking plasters and do it properly.
- Prune – there is no point migrating code that is no longer used.
- Correct data types – Data Manager was very forgiving when selecting data types; most modern ETL tools, including DataStage and SSIS, are not.
Before starting the migration, look at your use of Data Manager and identify:
- Parameters – make a list of all parameter names and example values – you will need to enter these in the migration tool.
- Custom Functions – simple ones can be handled in the migration tool; complex ones will need to be re-created in InfoSphere DataStage.
- Auto Hierarchies – currently unsupported – you will need to convert the dimension to a fixed-level hierarchy.
- Connections – take a note of all source and target connections. These will need to be manually created using parameter sets in InfoSphere DataStage.
- Naming – most things in DataStage cannot start with a number or include illegal characters, so, for example, a connection called 001.Staging will need to be renamed C001_Staging, and a column called #CLAIMS or CLAIMS# would need to be renamed CLAIMS or CLAIMS_NO.
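The renaming rules above can be sketched as a small helper for pre-checking your object and column names. This is an illustrative sketch based only on the examples given; the exact character rules and the "C" prefix are assumptions, not part of the migration tool:

```python
import re

def datastage_safe_name(name: str, prefix: str = "C") -> str:
    """Rewrite an identifier so DataStage will accept it:
    no illegal characters and no leading digit (rules assumed from the examples)."""
    # Replace characters DataStage rejects (e.g. '.', '#', spaces) with '_'
    safe = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Collapse runs of underscores and trim leading/trailing ones
    safe = re.sub(r"_+", "_", safe).strip("_")
    # Identifiers cannot start with a number: add a prefix
    if safe and safe[0].isdigit():
        safe = prefix + safe
    return safe

print(datastage_safe_name("001.Staging"))  # C001_Staging
print(datastage_safe_name("#CLAIMS"))      # CLAIMS
```

Running this over a list of your Data Manager connection and column names before the migration gives you a rename map to apply up front, rather than fixing errors one at a time in the tool.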
Export & Migrate
This step takes the builds from Data Manager, runs them in test mode (no data delivery) and creates an XML file for each component found. The main steps are:
- Remove or blank anything you know will not migrate from one or a few builds (to be added back in InfoSphere DataStage).
- Create a package of one or a few builds.
- Set parameter values in the migration tool scripts.
- Run the migration tool (if you missed anything, errors will be displayed).
On successful completion, you will have several XML files created representing your builds.
Combine & Import
This step creates a DataStage import file: run the script to combine the XML files, then open DataStage and import the result into the new project.
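The combine script is supplied with the migration tool, but the general idea of merging per-build XML exports into a single import file can be sketched generically. The file pattern and the root element name here are assumptions for illustration, not the tool's actual format:

```python
import glob
import xml.etree.ElementTree as ET

def combine_exports(pattern: str, out_path: str) -> int:
    """Merge the top-level elements of each per-build XML export
    under a single root, producing one file to import in one go."""
    combined = ET.Element("DSExport")  # root element name is an assumption
    count = 0
    for path in sorted(glob.glob(pattern)):
        root = ET.parse(path).getroot()
        for child in root:  # copy each exported component across
            combined.append(child)
            count += 1
    ET.ElementTree(combined).write(out_path, encoding="utf-8",
                                   xml_declaration=True)
    return count
```

Combining first means a single import into the new project instead of one import per build.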
Finish & Test
The main part of the work is to finish off the builds as required and test that they work. This could include:
- Create environment variables as required – for example, surrogate key file and reject file locations.
- Create parameter sets as required – for example, connection details for development, test and production, or extract from and to dates.
- Create surrogate key files for dimensions.
- Add back / complete anything removed prior to migration.
- Test run the migrated builds and make changes as required – mostly these are data type fixes, if not already corrected in the technical debt stage.
Schedule & Monitor
Once tested successfully, you can make use of InfoSphere DataStage’s production environment, with built-in tools to:
- Schedule – view which jobs are running and monitor CPU, memory and disk usage.
- Administer – control project access and abilities, and throttle CPU and memory usage.
From there, you can take your data warehouse to the next level:
- Improve – take advantage of the new stages and abilities.
- Simplify – reduce or remove staging tables and combine the logic into a single DataStage job for each final delivered table.
- CDC – implement a form of change data capture (CDC) to pull in only the data that has changed since the previous day.
- Expand – pull in more data from additional data sources such as JDE, Salesforce and cloud databases.
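A simple form of the change data capture mentioned above is a high-watermark extract: persist the timestamp of the last successful load and select only newer rows. A minimal sketch, assuming the source table has a last-modified column and that a small state file is acceptable (the table, column and file names are all hypothetical):

```python
from pathlib import Path

def read_watermark(path: Path) -> str:
    """Return the last successfully loaded timestamp, or an epoch default."""
    return path.read_text().strip() if path.exists() else "1900-01-01T00:00:00"

def delta_query(table: str, modified_col: str) -> str:
    """Parameterised query selecting only rows changed since the watermark;
    bind the watermark and the extract time as the two parameters."""
    return (f"SELECT * FROM {table} "
            f"WHERE {modified_col} > ? AND {modified_col} <= ?")

def save_watermark(path: Path, until: str) -> None:
    """Persist the new high-water mark only after the load succeeds."""
    path.write_text(until)
```

The same pattern maps onto DataStage job parameters: the watermark becomes an "extract from" parameter and the run timestamp an "extract to" parameter, as with the extract date parameters created earlier.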