Archives

Writing T-SQL Merge Statements the Right Way

In a previous article, I discussed Merge statement basics. However, in extensive testing I’ve come to realize that my article, like most articles I’ve read about Merge leaves out or mis-handles several important aspects. Rather than edit that article, I’ve decided to publish a series of articles which I hope will clear up some of these misconceptions. If you already read the original article, a good portion of this will be review as I’m using the original as a base for this one.

Let’s start by glancing at the syntax portion of the Books Online T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.

[ WITH [,…n] ]
MERGE
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ] [ WITH ( ) ] [ [ AS ] table_alias ]
USING ON
[ WHEN MATCHED [ AND ]
THEN ] [ …n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND ]
THEN ]
[ WHEN NOT MATCHED BY SOURCE [ AND ]
THEN ] [ …n ]
[ ]
[ OPTION ( [ ,…n ] ) ]
;

::=
{
[ database_name . schema_name . | schema_name . ]
target_table
}

::=
{
{ [ [ ,…n ] ]
[ [ , ] INDEX ( index_val [ ,…n ] ) ] }
}

Simple right? Great, I guess I’m done here…. No seriously, who can easily absorb that? So, what is Merge really and how do we use it?

T-SQL Merge Basics


In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in an entity, referred to as the Target, with data from another entity called the Source. The entities are compared on Fields which uniquely identify records in each, a Join if you will. Notice how I keep using the word entity rather than table, and the reason is that the Target and Source could be many SQL Server objects such as Tables, Temp Tables, Views, Table Variables, or even Common Table Expressions. The Source could also be a complete Select statement as well. In this case, for the sake of simplicity, I’ll use Temp Tables.

I think most people learn best from examples, by doing rather than reading descriptions of syntax, so I’ve provided a brief script to create the tables required for the following example.

USE TempDB;
GO
 
IF OBJECT_ID ('tempdb..#Customer_Orig') IS NOT NULL DROP TABLE #Customer_Orig;
IF OBJECT_ID ('tempdb..#Customer_New')  IS NOT NULL DROP TABLE #Customer_New;
 
CREATE TABLE #Customer_Orig
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
 
CREATE TABLE #Customer_New
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
 
INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet)
   VALUES (1, 'Anakin Skywalker', 'Tatooine')
         ,(2, 'Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant');
INSERT INTO #Customer_New (CustomerNum, CustomerName, Planet)
   VALUES (1, 'Anakin Skywalker', 'Tatooine')
         ,(2, 'Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant');

So, I’ve created two temporary tables called Customer_New and Customer_Orig with identical data in each. In this case, Customer_New will be the Target and Customer_Orig will be the Source.
Merge Base
Now, we’re going to make the following changes to Customer_Orig. Notice that I’ve performed a single Insert, Update, and Delete.

-- Update Yoda's Name
UPDATE #Customer_Orig
   SET CustomerName = 'Master Yoda'
 WHERE CustomerNum = 2
-- Delete Anakin
DELETE #Customer_Orig
 WHERE CustomerNum = 1
--Add Darth
INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet)
VALUES (4, 'Darth Vader', 'Death Star')

Merge Changes To Base

Now, being the geek that I am, I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi-Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder.

Old School CRUD

Prior to SQL Server 2008, we could have accomplished this merge with the following code. Please note that I have used Joins in my example, which will align with my Merge code later.

--Process Updates  
Update Tgt
Set    Tgt.CustomerName = Src.CustomerName, Tgt.Planet = Src.Planet
FROM   #Customer_New Tgt Inner JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum
Where  Tgt.CustomerName <> Src.CustomerName Or Tgt.Planet <> Src.Planet -- Eliminates needless updates.
--Process Inserts
Insert Into #Customer_New
  SELECT Src.CustomerNum, Src.CustomerName, Src.Planet
  FROM   #Customer_Orig Src LEFT JOIN #Customer_New Tgt ON Tgt.CustomerNum = Src.CustomerNum
  Where  Tgt.CustomerNum is null;
--Process Deletes
Delete FROM Tgt
from        #Customer_New as Tgt LEFT JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum
Where       Src.CustomerNum is null;

This works, but it’s less than optimal for a few reasons. First, writing those statements can be tedious, especially if this been a typical table with 20+ fields to deal with. Second, this represents three separate groups of work and SQL Server processes them that way. Once you understand it, the T-SQL Merge Statement is easier to write and can accomplish this in one transaction. It’s basically a win-win.

Components of Merge Statements

So, lets break a Merge statement into it’s component parts to make it easy to understand. First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum

The When Matched clause determines what will happen when records exist in the Source and Target with the same CustomerNum. Notice the additional conditions I’ve added which limits the updates only to records where a value has changed. Strictly speaking, this isn’t required, but every record in the target would be updated regardless of need, which wastes resources.

WHEN MATCHED  
                AND (Target.CustomerName <> Source.CustomerName 
                  OR Target.Planet <> Source.Planet)
Then 
  UPDATE SET --Updates Yoda's Name
      Target.CustomerName = Source.CustomerName,
      Target.Planet = Source.Planet

The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the Target. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.

WHEN NOT MATCHED BY TARGET
THEN
   INSERT (CustomerNum, CustomerName, Planet) -- Inserts Darth
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)

The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if this was a Staging table which wasn’t comprehensive, perhaps the result of an incremental extraction, then you’d want to omit this portion of the statement.

WHEN NOT MATCHED BY SOURCE THEN DELETE; -- Deletes Anakin

Also, keep in mind that any Merge statement must be terminated in a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED
                AND (Target.CustomerName <> Source.CustomerName
                     OR Target.Planet <> Source.Planet)
THEN
   UPDATE SET
      Target.CustomerName = Source.CustomerName
     ,Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET
THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Result of Merge

What about Null Values?


Ah, very astute of you to notice that my young padawan. This is the first mistake that many people make. You’ll notice in the When Matched portion above, that I also check to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value will not be updated if one of them is NULL. A great explanation of NULL handling can be found here. First, let’s set up a NULL field comparison issue. Run the following update statement to the Customer_Orig table.

Update CO set Planet = NULL
  FROM #Customer_Orig CO
 WHERE CustomerNum = 3 -- Obi-Wan Kenobi

If you’ll re-run the original merge statement, and compare the tables, you’ll find the following:
Result of Merge Non NULL Handling
Notice that Obi-Wan’s record in Customer_New has not been updated. There are a few ways to deal with this which I’ve seen online. The first two methods each have their problems, which I’ll briefly explain. The third method is the one I recommend if you’d like to “skip ahead” using Select and Except.

Adding “Is NULL” Statements
We could re-write the When Matched clause like this:

WHEN MATCHED
                AND (   Target.CustomerName <> Source.CustomerName
                     OR Target.CustomerName IS NULL
                     OR Source.CustomerName IS NULL
                     OR Target.Planet <> Source.Planet
                     OR Target.Planet IS NULL
                     OR Source.Planet IS NULL)

Technically this method works in that it will update if one of your values is NULL. However, its really cumbersome to write, imagine writing a statement like that for 50 columns. Also, the resulting query plan is complex. Perhaps the biggest problem is that if both of your values are NULL, then an update will happen anyway. Go ahead and substitute the code above and re-run the merge statement where one of the values is NULL. You should receive the message “1 Row Effected.” However, if you repeatedly run the merge statement you will ALWAYS receive “1 Row Effected,” again depending on your database settings. For these reasons, this method isn’t recommended.

Using ISNULL or Coalesce
Coalesce can also be used because it chooses the best datatype between arguments based on datatype precedence. Therefore, we could use the same basic Coalesce statement for most comparisons without throwing an error and the When Matched portion from above becomes the following:

WHEN MATCHED
                AND (COALESCE(TARGET.CustomerName, '') <> COALESCE(SOURCE.CustomerName, '') 
                  OR COALESCE(TARGET.Planet, '') <> COALESCE(SOURCE.Planet, ''))

This works as well, with some caveats. First, the second value in each Coalesce statement has to be something that should NEVER occur in that column, therefore an empty string ” is most likely not a good choice. Second, notice how I emphasized that coalesce would automatically cast most comparisons, well there are some data types that will fail and require you to use something other than a default string value. The nice thing about this method is that the T-SQL code is short and easy to understand.

Using Select and Except to Handle Nulls(Recommended)


I used the Coalesce method successfully for some time until I came across this excellent article by Paul White. In it he correctly points out that when using Except or Intersect Null values are considered equal. Therefore, when comparing two fields with NULL values, they will be equivalent, and when comparing a NULL value to some other actual value they will be evaluated as Not Equal. He seems to favor using “Not Exists” and Intersect. However, using Exists and Except just makes more sense to my brain. Therefore, I’d write the query like the following. If you take this code and run it on the original tables, as well as the NULL test,you’ll see that it works perfectly.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND EXISTS
                    (SELECT Source.CustomerName, Source.Planet
                     EXCEPT
                     SELECT Target.CustomerName, Target.Planet)
THEN
   UPDATE SET
      Target.CustomerName = Source.CustomerName
     ,Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET
THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (CustomerNum, Source.CustomerName, Source.Planet)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Limitations of Merge?


Of course, there are always limitations. The most important limitation is that both data sources should be on the same SQL Server instance. I’ve seen people use Merge with linked servers, but I wouldn’t recommend it.

Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write and it’s very easy to make a simple mistake. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.

In the next article in this series we’ll discuss how to use the Merge statement with the Output clause as it’s required to load Slowly Changing Dimensions.

Related posts:

18 comments to Writing T-SQL Merge Statements the Right Way

Leave a Reply to I am

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>