Archives

What is the T-SQL Merge Statement and How do you use it?

I mentioned in a previous article that many posts on T-SQL Merge read like Books Online. Speaking of Books Online, let’s start by glancing at the syntax portion of the T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.

[ WITH [,...n] ]
MERGE
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ] [ WITH ( ) ] [ [ AS ] table_alias ]
USING
ON
[ WHEN MATCHED [ AND ]
THEN ] [ ...n ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND ]
THEN ]
[ WHEN NOT MATCHED BY SOURCE [ AND ]
THEN ] [ ...n ]
[ ]
[ OPTION ( [ ,...n ] ) ]
;

::=
{
[ database_name . schema_name . | schema_name . ]
target_table
}

::=
{
{ [
[ ,...n ] ]
[ [ , ] INDEX ( index_val [ ,...n ] ) ] }
}

Simple right? Great, I guess I’m done here…. No seriously, who can readily absorb that? So, what is Merge really and how do we use it?

T-SQL Merge Basics?


In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in a Table, referred to as the Target, with data from another Table called the Source. The tables are compared on Fields which uniquely identify records in each, a Join if you will. That’s not to say that the Source has to be an actual Table, a view or a Select statement may be used as a source as well.

I think most people learn best from examples, by doing rather than reading, so I’ve provided a short script to create the tables required for the following example.
Slowly Changing Dimension Table Setup Scripts Jedi 1.4

Basically I’ve created two tables called DimCustomer and CustomerStaging with the following data in them:

Now, we’re using a CustomerStaging table for sake of simplicity. However, I could have used a T-SQL query from an OLTP database or other source as the Source object in my merge. For this article, let’s assume there are no Slowly Changing Dimensions and ignore the fields in each table to the right of Affiliation.

There are several differences between Staging and DimCustomer which should be noted:

  • Anakin Skywalker is no longer in the source system.
  • Yoda’s name was changed.
  • Obi-Wan Kenobi changed planets.
  • Darth Vader was added.


Now I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder. Anyway, if we used separate T-SQL statements, we’d delete Anakin, Update Yoda and Obi-Wan, and Add Darth Vader. With Merge, we can do that in one transaction. For me, the easiest way to learn something is to go through an example. We’ll break the statements into sections to make it easy to understand.

First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.

MERGE  DimCustomer AS Target
 USING CustomerStaging AS Source
    ON Target.CustomerKey = Source.CustomerKey

The When Matched clause determines what will happen when records exist in the Source and Destination with the same CustomerKey. Notice the additional conditions I’ve added. Strictly speaking, they aren’t required, but every record in the target would be updated regardless of need, which wastes resources.

WHEN MATCHED  
                AND (   Target.CustomerName <> Source.CustomerName -- These are optional. 
                     OR Target.Planet <> Source.Planet
                     OR Target.Affiliation <> Source.Affiliation)
THEN
   UPDATE SET --Updates Yoda and Obi-Wan records.
      Target.CustomerName = Source.CustomerName,
      Target.Planet = Source.Planet,
      Target.Affiliation = Source.Affiliation

The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the destination. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.

WHEN NOT MATCHED BY TARGET
THEN
   INSERT (CustomerKey, -- Inserts Darth Vader
           CustomerName,
           Planet,
           Affiliation)
   VALUES (CustomerKey,
           Source.CustomerName,
           Source.Planet,
           Source.Affiliation)

The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if your Staging table isn’t comprehensive, maybe the result of an incremental extraction, then you’d want to omit this portion of the statement.

WHEN NOT MATCHED BY SOURCE THEN DELETE; -- Deletes Anakin

Also, keep in mind that any Merge statement must be terminated in a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.

MERGE  DimCustomer AS Target
 USING CustomerStaging AS Source
    ON Target.CustomerKey = Source.CustomerKey
WHEN MATCHED
                AND (   Target.CustomerName <> Source.CustomerName
                     OR Target.Planet <> Source.Planet
                     OR Target.Affiliation <> Source.Affiliation)
THEN
   UPDATE SET
      Target.CustomerName = Source.CustomerName,
      Target.Planet = Source.Planet,
      Target.Affiliation = Source.Affiliation
WHEN NOT MATCHED BY TARGET
THEN
   INSERT (CustomerKey,
           CustomerName,
           Planet,
           Affiliation)
   VALUES (CustomerKey,
           Source.CustomerName,
           Source.Planet,
           Source.Affiliation)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

What about Null Values?


Ah, very astute of you to notice that my young padawan. You’ll notice in the When Matched portion that I also check to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value would be updated even if there were no changes. I handle this with Coalesce because it chooses the best datatype between arguments based on datatype precedence. Therefore, I can use the same basic Coalesce statement for most comparisons without throwing an error. Therefore, the When Matched portion from above becomes the following:

WHEN MATCHED 
                AND (COALESCE(TARGET.CUSTOMERNAME, '') <> COALESCE(SOURCE.CUSTOMERNAME, '') 
                  OR COALESCE(TARGET.PLANET, '') <> COALESCE(SOURCE.PLANET, '')
                  OR COALESCE(TARGET.AFFILIATION, '') <> COALESCE(SOURCE.AFFILIATION, ''))

Limitations?


Of course, there are always limitations. The most important limitation is that both data sources need to be on the same SQL Server instance. I suppose you could use linked servers, but I wouldn’t recommend it.

Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.

In the next article we’ll discuss how to use the Output clause with Merge, and how I use it occasionally to perform auditing during data warehouse loads.

Any questions?

Related posts:

7 comments to What is the T-SQL Merge Statement and How do you use it?

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>