I mentioned in a previous article that many posts on T-SQL Merge read like Books Online. Speaking of Books Online, let’s start by glancing at the syntax portion of the T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ]
[ WITH ( ) ] [ [ AS ] table_alias ]
[ WHEN MATCHED [ AND
] [ …n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND
[ WHEN NOT MATCHED BY SOURCE [ AND
] [ …n ]
[ OPTION (
[ ,…n ] ) ]
[ database_name . schema_name . | schema_name . ]
[ ,…n ] ]
[ [ , ] INDEX ( index_val [ ,…n ] ) ] }
Simple right? Great, I guess I’m done here…. No seriously, who can readily absorb that? So, what is Merge really and how do we use it?
T-SQL Merge Basics?
In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in a Table, referred to as the Target, with data from another Table called the Source. The tables are compared on Fields which uniquely identify records in each, a Join if you will. That’s not to say that the Source has to be an actual Table, a view or a Select statement may be used as a source as well.
I think most people learn best from examples, by doing rather than reading, so I’ve provided a short script to create the tables required for the following example.
Slowly Changing Dimension Table Setup Scripts Jedi 1.4
Now, we’re using a CustomerStaging table for sake of simplicity. However, I could have used a T-SQL query from an OLTP database or other source as the Source object in my merge. For this article, let’s assume there are no Slowly Changing Dimensions and ignore the fields in each table to the right of Affiliation.
There are several differences between Staging and DimCustomer which should be noted:
- Anakin Skywalker is no longer in the source system.
- Yoda’s name was changed.
- Obi-Wan Kenobi changed planets.
- Darth Vader was added.
Now I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder. Anyway, if we used separate T-SQL statements, we’d delete Anakin, Update Yoda and Obi-Wan, and Add Darth Vader. With Merge, we can do that in one transaction. For me, the easiest way to learn something is to go through an example. We’ll break the statements into sections to make it easy to understand.
First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.
MERGE DimCustomer AS Target USING CustomerStaging AS Source ON Target.CustomerKey = Source.CustomerKey
The When Matched clause determines what will happen when records exist in the Source and Destination with the same CustomerKey. Notice the additional conditions I’ve added. Strictly speaking, they aren’t required, but every record in the target would be updated regardless of need, which wastes resources.
WHEN MATCHED AND ( Target.CustomerName <> Source.CustomerName -- These are optional. OR Target.Planet <> Source.Planet OR Target.Affiliation <> Source.Affiliation) THEN UPDATE SET --Updates Yoda and Obi-Wan records. Target.CustomerName = Source.CustomerName, Target.Planet = Source.Planet, Target.Affiliation = Source.Affiliation
The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the destination. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.
WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerKey, -- Inserts Darth Vader CustomerName, Planet, Affiliation) VALUES (CustomerKey, Source.CustomerName, Source.Planet, Source.Affiliation)
The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if your Staging table isn’t comprehensive, maybe the result of an incremental extraction, then you’d want to omit this portion of the statement.
WHEN NOT MATCHED BY SOURCE THEN DELETE; -- Deletes Anakin
Also, keep in mind that any Merge statement must be terminated in a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.
MERGE DimCustomer AS Target USING CustomerStaging AS Source ON Target.CustomerKey = Source.CustomerKey WHEN MATCHED AND ( Target.CustomerName <> Source.CustomerName OR Target.Planet <> Source.Planet OR Target.Affiliation <> Source.Affiliation) THEN UPDATE SET Target.CustomerName = Source.CustomerName, Target.Planet = Source.Planet, Target.Affiliation = Source.Affiliation WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerKey, CustomerName, Planet, Affiliation) VALUES (CustomerKey, Source.CustomerName, Source.Planet, Source.Affiliation) WHEN NOT MATCHED BY SOURCE THEN DELETE;
What about Null Values?
Ah, very astute of you to notice that my young padawan. You’ll notice in the When Matched portion that I also check to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value would be updated even if there were no changes. I handle this with Coalesce because it chooses the best datatype between arguments based on datatype precedence. Therefore, I can use the same basic Coalesce statement for most comparisons without throwing an error. Therefore, the When Matched portion from above becomes the following:
WHEN MATCHED AND (COALESCE(TARGET.CUSTOMERNAME, '') <> COALESCE(SOURCE.CUSTOMERNAME, '') OR COALESCE(TARGET.PLANET, '') <> COALESCE(SOURCE.PLANET, '') OR COALESCE(TARGET.AFFILIATION, '') <> COALESCE(SOURCE.AFFILIATION, ''))
Of course, there are always limitations. The most important limitation is that both data sources need to be on the same SQL Server instance. I suppose you could use linked servers, but I wouldn’t recommend it.
Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.
In the next article we’ll discuss how to use the Output clause with Merge, and how I use it occasionally to perform auditing during data warehouse loads.