Wednesday, August 6, 2014

Migration of Data from a Legacy system to an Existing Salesforce System Using Talend


This document describes the steps involved in migrating data from a legacy system to an existing Salesforce system.
Step 1: Understanding the Legacy System
Step 2: Understanding the Existing Salesforce System
Step 3: Identifying the Staging Area
Step 4: Analyzing the Staging Data
Step 5: Updating the Salesforce Data Model
Step 6: Building a Common Codes and Values Table
Step 7: Working on Salesforce Triggers and Validations
Step 8: Talend’s Role in Data Migration
Step 9: Performing the Imports


1.       Understanding the Legacy System:
A legacy system is an older application that the business has owned and relied on for years, and whose functionality it trusts. In this case, the legacy application runs on a Java platform with an Oracle database as its back end.
The first major step is to establish connectivity to the legacy system. Once the connection is in place and the data is available, we need to understand the functionality of the current application and its supporting applications.
Identify the entities in the legacy system and validate how many of them actually carry business-related information. Eliminating unwanted entities during the initial analysis of the source system avoids confusion in the later stages of the project.

2.      Understanding the Salesforce System:
Migrating data to Salesforce is always challenging, and if the Salesforce system is live, the challenge is doubled.
The Salesforce back end has two major kinds of entities: standard and custom. Entities that are common across all Salesforce applications are called standard entities, and entities designed specifically for the business are called custom entities.
Since we now have an understanding of the source system, we need to relate its business to the Salesforce system. One thing to remember throughout the analysis: “The change is only in the application, not in the business.” So we need to incorporate the source system logic into the existing Salesforce logic.
Collecting the entities and their related business information from a Salesforce perspective provides the input for the data migration analysis.


3.       Identifying the Staging Area:
Once the source and target system data has been analyzed, the next step is to identify the staging area. If the source is a relational database, it is better to build the staging layer on the same database. This avoids data loss and keeps the data quality identical between the source and the staging layer.
If the source system is a mainframe or a flat-file system, build the staging layer on a relational database. Here comes the first job: building the ETL from source to stage. As in standard ETL practice, this staging layer is a replica of the source system.
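The source-to-stage load can be sketched as follows. This is a minimal illustration assuming a flat-file source; the table and column names (stg_accounts, acct_id, acct_name, region) are hypothetical, and an in-memory SQLite database stands in for the real staging database:

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract from the legacy system.
source_file = io.StringIO(
    "acct_id,acct_name,region\n"
    "A001,Acme Corp,EMEA\n"
    "A002,Globex,APAC\n"
)

conn = sqlite3.connect(":memory:")  # stand-in for the staging database
conn.execute(
    "CREATE TABLE stg_accounts (acct_id TEXT, acct_name TEXT, region TEXT)"
)

# The staging table is a 1:1 replica of the source layout -- no transformation
# happens at this step, so data quality stays identical to the source.
rows = list(csv.DictReader(source_file))
conn.executemany(
    "INSERT INTO stg_accounts VALUES (:acct_id, :acct_name, :region)", rows
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM stg_accounts").fetchone()[0]
print(count)  # -> 2
```

In a real project this job would be built in Talend (or plain SQL when source and stage share a database), but the shape is the same: read the source as-is, write an untransformed replica.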
4.       Analyzing the Staging Data
From a data migration perspective, not all data in the source system is needed in the staging layer, nor in the Salesforce system.
Once the required entities are available in the staging layer, analyze them to identify which records and fields actually need to be migrated.

Step 5: Updating the Salesforce Data Model
The most challenging part of the entire project comes here. The live Salesforce system has a completely different data model, and we need to fit the source data into it.
We need to identify the Salesforce objects that can be rightly mapped to the source tables.
Once the source and target entities are mapped, the next step is field-level mapping.
While doing the field-level mapping, we may find many fields in the Salesforce objects for which no matching field exists in the source system. Those can be ignored in the initial analysis.
The fields that can be mapped are mapped directly. For the source fields that cannot be mapped, we need to create new fields in the Salesforce objects.
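The field-level mapping can be captured as a simple translation table. This sketch uses hypothetical field names; the `__c` suffix is the convention Salesforce uses for custom fields, which is where the newly created fields would land:

```python
# Hypothetical mapping from staged source fields to the Account object.
field_map = {
    # source field    -> Salesforce field
    "acct_name": "Name",           # maps to a standard field
    "acct_phone": "Phone",         # maps to a standard field
    "legacy_region": "Region__c",  # no source match existed: new custom field
}

def to_salesforce_record(source_row):
    """Translate one staged source row into a Salesforce-shaped record."""
    return {sf_field: source_row[src_field]
            for src_field, sf_field in field_map.items()}

record = to_salesforce_record(
    {"acct_name": "Acme Corp", "acct_phone": "555-0100", "legacy_region": "EMEA"}
)
print(record["Region__c"])  # -> EMEA
```

Keeping the mapping in one place like this also gives the data analyst a single artifact to review with the business before any load runs.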

Building a Common Codes and Values Table
It is common that the codes and values used in the source have different values in the target. It would be tedious to write decode statements everywhere.
It is better to create a lookup reference table manually, like the one below, and consult it whenever a Salesforce value is needed.
source table | source column | value | sfdc object | sfdc field | field value
src_tab_1    | src_col_1     | 1     | Tgt_obj_1   | Tgt_col_1  | One
src_tab_1    | src_col_2     | 2     | Tgt_obj_1   | Tgt_col_2  | two
src_tab_1    | src_col_3     | 3     | Tgt_obj_1   | Tgt_col_3  | three
src_tab_1    | src_col_4     | 4     | Tgt_obj_1   | Tgt_col_4  | four
src_tab_1    | src_col_5     | 5     | Tgt_obj_1   | Tgt_col_5  | five
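Applying the reference table during the load can be sketched like this. The lookup rows mirror the illustrative table above (they are not real system names), and unmapped codes pass through unchanged:

```python
# Codes-and-values lookup, keyed the same way as the reference table:
# (source table, source column, source value) -> target field value.
lookup = {
    ("src_tab_1", "src_col_1", "1"): "One",
    ("src_tab_1", "src_col_2", "2"): "two",
    ("src_tab_1", "src_col_3", "3"): "three",
}

def translate(table, column, value):
    """Return the Salesforce value for a source code, or the raw value
    unchanged when no mapping exists."""
    return lookup.get((table, column, value), value)

print(translate("src_tab_1", "src_col_1", "1"))  # -> One
print(translate("src_tab_1", "src_col_9", "9"))  # unmapped: passes through as 9
```

Because the lookup is data rather than code, new code translations can be added by the analyst without touching the ETL jobs, which is exactly why this beats scattering decode statements.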

Talend’s Role in Data Migration:
When migrating from a more complex source with multiple relational tables, such as Access, MySQL, or MS SQL Server, using Data Loader can be time-consuming and error-prone. This is where Talend comes to the rescue.

·         Once a Talend job is set up, you can run it as many times as you like, and each run takes a matter of minutes. This allows an iterative and flexible approach to data migration projects.
·         Talend comes with built-in connectors for pretty much any data source you can think of (including SQL Server).
·         Talend natively supports the Salesforce.com Bulk Data API.
·         Talend is open source and completely free!


Working on Salesforce Triggers and Validations:
The Salesforce application has two major constraints with regard to data migration: validation rules and triggers.
Triggers:
These are functions built into the Salesforce application that fire on certain actions. If the data migration touches any of the fields covered by these triggers, those records will be rejected during the load. So the key thing for migration is: SWITCH OFF the triggers.
Validation Rules:
Validation rules come into the picture only for an existing Salesforce system. When populating certain information, we may need to populate a few extra fields that are not available in the source system. It is the data analyst’s job to hardcode a few values that look valid and can be populated to Salesforce. This way we do not violate the validation rules.

Performing the Imports:
The final step is to push the data to Salesforce. The final data in the Salesforce replica schema is the source, and the target is the Salesforce system.
In the case of an existing system, we need to migrate certain existing information, so a few loads have to be upserts. An upsert in Salesforce can be done only when there is an external ID field in Salesforce; it acts much like a primary key in a traditional database system.
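The upsert semantics keyed on an external ID can be sketched in plain Python. `Legacy_Id__c` is a hypothetical external ID field, and the dictionary below stands in for the records already in the live org:

```python
# Records already in the live Salesforce org, keyed by external ID.
existing = {
    "L-100": {"Legacy_Id__c": "L-100", "Name": "Acme Corp"},
}

def upsert(records, incoming):
    """Update a record when its external ID already exists, insert otherwise --
    the same match-or-create behavior Salesforce performs on an upsert call."""
    for rec in incoming:
        records[rec["Legacy_Id__c"]] = rec  # same key: update; new key: insert
    return records

upsert(existing, [
    {"Legacy_Id__c": "L-100", "Name": "Acme Corporation"},  # updates L-100
    {"Legacy_Id__c": "L-200", "Name": "Globex"},            # inserts L-200
])
print(len(existing))              # -> 2
print(existing["L-100"]["Name"])  # -> Acme Corporation
```

In the real load, the external ID is what lets the tool decide update-versus-insert without ever knowing the Salesforce record IDs, which is why the field must exist before the upsert loads run.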
There are cases where third-party ETL tools like Talend, Informatica, etc. do not work as expected. In such cases, we can always fall back on the Apex Data Loader (Salesforce’s native tool) for the same insert activity, provided it is a simple insert.






Regards,
Naveen
ETL Data Analyst