SSIS Basics: Merge join
For this small how-to, I’ll go through how to use the Merge Join in SSIS. The Merge Join is used for joining data into the a single row from two different datastreams.
A basic example, where the MergeJoin can be used, is where you have a flat file with emails, and you need to locate the contact in CRM, which is connected to that email, to fetch the CRM ContactId.
I have created an excel file with three emails, for this example case.
With the file data ready, we are good to start the work in SSIS. For this specific task, we need our Excel source and a CRM source, with the data we need to merge with.
For the Dynamics365 source, we are simply selecting Entity source type, and select Contact. To optimize performance, we only select the columns we are going to use. To do think, deselect everything, and just check off those columns you need.
Now we have our two data-sources, but if we try to add a Merge Join component now, we will get an error displayed. This is a sorting error, because Visual Studio can only merge sorted data.
The most reliable method to sort data in SSIS, is with a Sort component from the SSIS toolbox. With the sort component, we can select which columns we want to sort, and which columns simply needs to pass through without getting sorted. For the merge join to work, we need to sort both data streams with the same data type, and it must be the columns we are joining matching up. The sorted columns are marked as Pass Through as default.
When both data streams have been sorted, we can then add the Merge join. When adding the Merge Join, always attach the primary datastream first (easier to remember), as the second data stream will default be the joined stream. When both sources has been connected, the Merge automatically detects the sorted columns, and maps them as the Join link.
For the case where we need info from the second source based on the first source, we are choosing a Inner join. Then we simply mark the columns from both sources, that we want to output through the Merge join. In this case, we want the Email from the Excel file and the ContactId from the Dynamics365 source in a Inner join.
With the Join set up, we can add the wanted destination – in this example, I will use an Excel destination, and map the selected columns into the destination and run the package.
As the screenshot shows, we have three rows from our source, and we pass those three rows through the Merge Join with the data new enriched from the ContactBase in CRM.
And that is the basics behind the Merge Join in SSIS. This tool can obviously be used to perform much more complex tasks – and it is a great tool to use, if you are in an environment where you dont have access to a SQL server, to manage data transformation.