In a previous article, I discussed Merge statement basics. However, in extensive testing I’ve come to realize that my article, like most articles I’ve read about Merge leaves out or mis-handles several important aspects. Rather than edit that article, I’ve decided to publish a series of articles which I hope will clear up some of these misconceptions. If you already read the original article, a good portion of this will be review as I’m using the original as a base for this one.
Let’s start by glancing at the syntax portion of the Books Online T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ]
[ WITH ( ) ] [ [ AS ] table_alias ]
[ WHEN MATCHED [ AND
] [ …n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND
[ WHEN NOT MATCHED BY SOURCE [ AND
] [ …n ]
[ OPTION (
[ ,…n ] ) ]
[ database_name . schema_name . | schema_name . ]
[ ,…n ] ]
[ [ , ] INDEX ( index_val [ ,…n ] ) ] }
Simple right? Great, I guess I’m done here…. No seriously, who can easily absorb that? So, what is Merge really and how do we use it?
T-SQL Merge Basics
In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in an entity, referred to as the Target, with data from another entity called the Source. The entities are compared on Fields which uniquely identify records in each, a Join if you will. Notice how I keep using the word entity rather than table, and the reason is that the Target and Source could be many SQL Server objects such as Tables, Temp Tables, Views, Table Variables, or even Common Table Expressions. The Source could also be a complete Select statement as well. In this case, for the sake of simplicity, I’ll use Temp Tables.
I think most people learn best from examples, by doing rather than reading descriptions of syntax, so I’ve provided a brief script to create the tables required for the following example.
USE TempDB; GO IF OBJECT_ID ('tempdb..#Customer_Orig') IS NOT NULL DROP TABLE #Customer_Orig; IF OBJECT_ID ('tempdb..#Customer_New') IS NOT NULL DROP TABLE #Customer_New; CREATE TABLE #Customer_Orig ( CustomerNum TINYINT NOT NULL ,CustomerName VARCHAR (25) NULL ,Planet VARCHAR (25) NULL); CREATE TABLE #Customer_New ( CustomerNum TINYINT NOT NULL ,CustomerName VARCHAR (25) NULL ,Planet VARCHAR (25) NULL); INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet) VALUES (1, 'Anakin Skywalker', 'Tatooine') ,(2, 'Yoda', 'Coruscant') ,(3, 'Obi-Wan Kenobi', 'Coruscant'); INSERT INTO #Customer_New (CustomerNum, CustomerName, Planet) VALUES (1, 'Anakin Skywalker', 'Tatooine') ,(2, 'Yoda', 'Coruscant') ,(3, 'Obi-Wan Kenobi', 'Coruscant');
So, I’ve created two temporary tables called Customer_New and Customer_Orig with identical data in each. In this case, Customer_New will be the Target and Customer_Orig will be the Source.
Now, we’re going to make the following changes to Customer_Orig. Notice that I’ve performed a single Insert, Update, and Delete.
-- Update Yoda's Name UPDATE #Customer_Orig SET CustomerName = 'Master Yoda' WHERE CustomerNum = 2 -- Delete Anakin DELETE #Customer_Orig WHERE CustomerNum = 1 --Add Darth INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet) VALUES (4, 'Darth Vader', 'Death Star')
Now, being the geek that I am, I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi-Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder.
Old School CRUD
Prior to SQL Server 2008, we could have accomplished this merge with the following code. Please note that I have used Joins in my example, which will align with my Merge code later.
--Process Updates Update Tgt Set Tgt.CustomerName = Src.CustomerName, Tgt.Planet = Src.Planet FROM #Customer_New Tgt Inner JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum Where Tgt.CustomerName <> Src.CustomerName Or Tgt.Planet <> Src.Planet -- Eliminates needless updates. --Process Inserts Insert Into #Customer_New SELECT Src.CustomerNum, Src.CustomerName, Src.Planet FROM #Customer_Orig Src LEFT JOIN #Customer_New Tgt ON Tgt.CustomerNum = Src.CustomerNum Where Tgt.CustomerNum is null; --Process Deletes Delete FROM Tgt from #Customer_New as Tgt LEFT JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum Where Src.CustomerNum is null;
This works, but it’s less than optimal for a few reasons. First, writing those statements can be tedious, especially if this been a typical table with 20+ fields to deal with. Second, this represents three separate groups of work and SQL Server processes them that way. Once you understand it, the T-SQL Merge Statement is easier to write and can accomplish this in one transaction. It’s basically a win-win.
Components of Merge Statements
So, lets break a Merge statement into it’s component parts to make it easy to understand. First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.
MERGE #Customer_New AS Target USING #Customer_Orig AS Source ON Target.CustomerNum = Source.CustomerNum
The When Matched clause determines what will happen when records exist in the Source and Target with the same CustomerNum. Notice the additional conditions I’ve added which limits the updates only to records where a value has changed. Strictly speaking, this isn’t required, but every record in the target would be updated regardless of need, which wastes resources.
WHEN MATCHED AND (Target.CustomerName <> Source.CustomerName OR Target.Planet <> Source.Planet) Then UPDATE SET --Updates Yoda's Name Target.CustomerName = Source.CustomerName, Target.Planet = Source.Planet
The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the Target. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.
WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerNum, CustomerName, Planet) -- Inserts Darth VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)
The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if this was a Staging table which wasn’t comprehensive, perhaps the result of an incremental extraction, then you’d want to omit this portion of the statement.
WHEN NOT MATCHED BY SOURCE THEN DELETE; -- Deletes Anakin
Also, keep in mind that any Merge statement must be terminated in a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.
MERGE #Customer_New AS Target USING #Customer_Orig AS Source ON Target.CustomerNum = Source.CustomerNum WHEN MATCHED AND (Target.CustomerName <> Source.CustomerName OR Target.Planet <> Source.Planet) THEN UPDATE SET Target.CustomerName = Source.CustomerName ,Target.Planet = Source.Planet WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerNum, CustomerName, Planet) VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet) WHEN NOT MATCHED BY SOURCE THEN DELETE;
What about Null Values?
Ah, very astute of you to notice that my young padawan. This is the first mistake that many people make. You’ll notice in the When Matched portion above, that I also check to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value will not be updated if one of them is NULL. A great explanation of NULL handling can be found here. First, let’s set up a NULL field comparison issue. Run the following update statement to the Customer_Orig table.
Update CO set Planet = NULL FROM #Customer_Orig CO WHERE CustomerNum = 3 -- Obi-Wan Kenobi
If you’ll re-run the original merge statement, and compare the tables, you’ll find the following:
Notice that Obi-Wan’s record in Customer_New has not been updated. There are a few ways to deal with this which I’ve seen online. The first two methods each have their problems, which I’ll briefly explain. The third method is the one I recommend if you’d like to “skip ahead” using Select and Except.
Adding “Is NULL” Statements
We could re-write the When Matched clause like this:
WHEN MATCHED AND ( Target.CustomerName <> Source.CustomerName OR Target.CustomerName IS NULL OR Source.CustomerName IS NULL OR Target.Planet <> Source.Planet OR Target.Planet IS NULL OR Source.Planet IS NULL)
Technically this method works in that it will update if one of your values is NULL. However, its really cumbersome to write, imagine writing a statement like that for 50 columns. Also, the resulting query plan is complex. Perhaps the biggest problem is that if both of your values are NULL, then an update will happen anyway. Go ahead and substitute the code above and re-run the merge statement where one of the values is NULL. You should receive the message “1 Row Effected.” However, if you repeatedly run the merge statement you will ALWAYS receive “1 Row Effected,” again depending on your database settings. For these reasons, this method isn’t recommended.
Using ISNULL or Coalesce
Coalesce can also be used because it chooses the best datatype between arguments based on datatype precedence. Therefore, we could use the same basic Coalesce statement for most comparisons without throwing an error and the When Matched portion from above becomes the following:
WHEN MATCHED AND (COALESCE(TARGET.CustomerName, '') <> COALESCE(SOURCE.CustomerName, '') OR COALESCE(TARGET.Planet, '') <> COALESCE(SOURCE.Planet, ''))
This works as well, with some caveats. First, the second value in each Coalesce statement has to be something that should NEVER occur in that column, therefore an empty string ” is most likely not a good choice. Second, notice how I emphasized that coalesce would automatically cast most comparisons, well there are some data types that will fail and require you to use something other than a default string value. The nice thing about this method is that the T-SQL code is short and easy to understand.
Using Select and Except to Handle Nulls(Recommended)
I used the Coalesce method successfully for some time until I came across this excellent article by Paul White. In it he correctly points out that when using Except or Intersect Null values are considered equal. Therefore, when comparing two fields with NULL values, they will be equivalent, and when comparing a NULL value to some other actual value they will be evaluated as Not Equal. He seems to favor using “Not Exists” and Intersect. However, using Exists and Except just makes more sense to my brain. Therefore, I’d write the query like the following. If you take this code and run it on the original tables, as well as the NULL test,you’ll see that it works perfectly.
MERGE #Customer_New AS Target USING #Customer_Orig AS Source ON Target.CustomerNum = Source.CustomerNum WHEN MATCHED AND EXISTS (SELECT Source.CustomerName, Source.Planet EXCEPT SELECT Target.CustomerName, Target.Planet) THEN UPDATE SET Target.CustomerName = Source.CustomerName ,Target.Planet = Source.Planet WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerNum, CustomerName, Planet) VALUES (CustomerNum, Source.CustomerName, Source.Planet) WHEN NOT MATCHED BY SOURCE THEN DELETE;
Limitations of Merge?
Of course, there are always limitations. The most important limitation is that both data sources should be on the same SQL Server instance. I’ve seen people use Merge with linked servers, but I wouldn’t recommend it.
Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write and it’s very easy to make a simple mistake. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.
In the next article in this series we’ll discuss how to use the Merge statement with the Output clause as it’s required to load Slowly Changing Dimensions.