For those who don’t know, I’m currently the Data Warehouse Architect for a large-scale, complex multi-tenant data warehousing project. Fortunately, the project is being created in SQL Server 2012, and I’m thrilled to be using the new features. Let me preface this blog post by admitting that my experience with using the SSIS 2012 Catalog, SSISDB, is fairly limited, but then again most people are in the same boat since adoption of 2012 has been fairly slow.
However, I’ve run into some problems which aren’t getting a lot of attention yet, but I’m fairly certain they will in the near future. Let’s break them down.
Initial Catalog Setup Issues
I won’t explain step by step how to create the Integration Services Catalog, because others already have. However, you’ll notice from the following screenshot that very few settings are available when the catalog is created.
I wish that Microsoft would upgrade the Create Catalog screen to let the user specify SSISDB settings upon creation. To see what I mean, right-click the SSISDB under Integration Services Catalogs and select Properties. The following settings can and should be set:
- Whether to Clean Logs Periodically – Defaults to True
- Retention Period in Days – Defaults to 365
- Default Logging Level – Defaults to Basic
- Maximum Number of Old Versions Per Project – Defaults to 10
- Whether to Periodically Remove Old Versions – Defaults to True
I wish we could set those at create time. As I’ll discuss in a minute, the default values can be disastrous for many installations.
SSISDB Database Ignores Model Database Settings
For some reason, when you create the SSISDB Catalog, the corresponding SSISDB database ignores the Model database settings. It appears that no matter what my Model database is set to, SSISDB defaults to fixed file sizes for the data file and log, a 10% growth rate, and the Full Recovery Model. I’ve reached out to Microsoft, but haven’t received an answer as to why.
This has become problematic for us because we’ve deliberately set up our Model Databases so that Simple Recovery is used in our Development and QA Environments while Full Recovery is used in our Production environments. I’ve had to set up periodic checking on all of our servers looking for databases in full recovery mode that aren’t getting transaction log backups to avoid runaway T-Log growth issues.
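That kind of check can be sketched with a query against the backup history in msdb. This is just a minimal sketch; the 24-hour threshold is my assumption and should be tuned to your own log backup schedule:

```sql
-- Find databases in FULL recovery with no log backup in the last 24 hours.
-- The threshold is an assumption; adjust it to match your backup schedule.
SELECT d.name,
       d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
    ON b.database_name = d.name
   AND b.type = 'L' -- log backups only
WHERE d.recovery_model_desc = 'FULL'
GROUP BY d.name, d.recovery_model_desc
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());
```

Schedule something like this as a SQL Agent job and alert on any rows it returns.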
The script I’ve used to re-set the SSISDB database is as follows:
ALTER DATABASE [SSISDB] MODIFY FILE ( NAME = N'data', SIZE = 512000KB , FILEGROWTH = 1048576KB )
ALTER DATABASE [SSISDB] MODIFY FILE ( NAME = N'log', SIZE = 512000KB , FILEGROWTH = 512000KB )
ALTER DATABASE [SSISDB] SET RECOVERY SIMPLE WITH NO_WAIT
SSISDB Performance Issues
I’ve come to the conclusion that the SSISDB database will not scale at all. I became painfully aware of this because of a problem we had in a “pseudo” Production environment. Our first nasty surprise was T-Log growth, because the database incorrectly defaulted to Full Recovery mode, which I fixed by shrinking the log. Before you mentally criticize me, dear reader, please keep in mind this wasn’t an actual Production environment, and the data being kept in SSISDB was inconsequential so far.
Another issue became apparent shortly afterward. We were running with the default settings of Basic level logging, 365 Days Retention, and 10 Versions of the project being kept. Our SSIS project is already fairly large with around 50 packages, but it was only being run periodically in this environment approximately once or twice per week. We are not even hitting this thing very hard. However, the SSISDB database size was growing rapidly anyway. In this particular situation, it had grown to 60GB. While I realize that 60GB is not very large, SSISDB performance slowed to a crawl. In any of our environments, when the SSISDB approaches 50GB the performance degrades dramatically. The first sign of this is that the built in reports take longer and longer to return data and eventually seem to stop returning data at all.
Unfortunately this pseudo Production server was set up with very little hard drive space, 100GB. So, I started receiving free space warnings after the SSISDB database reached 60GB.
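If you want to see where the space is going, a quick sketch like the following breaks SSISDB down by table. In my experience most of it sits in the logging tables in the internal schema (typically internal.event_messages and internal.operation_messages), though you should verify against your own build:

```sql
USE SSISDB;
GO
-- Rough space breakdown of SSISDB tables, largest first.
SELECT s.name AS schema_name,
       t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
```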
I did a little research about reducing the size of, and addressing the performance issues with, the SSISDB. I lowered the Retention Period to 7 days, set the Default Logging Level to None, and reduced the Maximum Number of Old Project Versions to 1. We don’t need to save old versions of our project because we are using TFS source control. You are using source control, right?
Keep in mind that changing your logging level to None is not advisable in most production environments; however, as of this time we are using a 3rd party product to monitor our packages, so this isn’t an issue for us. I suspect that most of you will keep the Basic level, but dial the Retention Period back and run the cleanup job often to avoid performance issues. Anyway, I changed my settings using the following script.
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'MAX_PROJECT_VERSIONS', @property_value=1
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'RETENTION_WINDOW', @property_value=7
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'SERVER_LOGGING_LEVEL', @property_value=0
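You can confirm the changes took effect by reading the settings back from the catalog.catalog_properties view:

```sql
-- Read the current catalog settings back to verify the changes.
SELECT property_name, property_value
FROM SSISDB.catalog.catalog_properties
WHERE property_name IN (N'MAX_PROJECT_VERSIONS',
                        N'RETENTION_WINDOW',
                        N'SERVER_LOGGING_LEVEL');
```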
Then I proceeded to start the “SSIS Server Maintenance Job” that is automatically generated when you create the catalog. Wow, that was a mistake.
Don’t Touch That Button
The job ran for 2 hours and blew up the transaction log even though it was in Simple Recovery Model. I then started receiving emails that the server had completely run out of space. When I checked the error log, I found an error indicating that the Log was out of space and a checkpoint record could not be written.
I immediately started researching and found this article about the cleanup job causing user packages to fail. If you read between the lines, you’ll see that the SSISDB is not properly indexed to handle the cleanup job. In my case, I had to drop the SSISDB, recreate it, and apply the script above to avoid similar issues in the future. Once again, this was not a huge problem for me because it wasn’t truly a Production environment, and we rely on a third party product for our SSIS logging anyway. Don’t forget that even if you do change your settings, you must also schedule the cleanup job to run regularly. If you allow too much history to accumulate, you may suffer the same fate I did when running it.
My take on all of this is that SSISDB is not ready for prime time. I don’t think I’m being unfair with that statement. I do appreciate the fact that the SSIS Catalog is a great addition to SSIS 2012, but I expect to be able to use it in Production, running packages every 5 or 15 minutes, retaining months worth of history, and for it to perform regardless.
After writing this article I ran across this excellent article on SSIS 2012 Catalog Indexing, and time permitting I will implement his suggestions and report back. Are any of you having similar issues with your SSIS 2012 Catalog (SSISDB)?
I’m in for a busy week as I have back to back presentations this Wednesday and Thursday nights.
On Wednesday the 20th, the Fort Worth SQL Server User Group has asked me to present one of my favorites. I’m constantly revising this presentation as I come across new mistakes to share. So, even if you’ve seen it, there will still be new material to shock and amaze you.
Data Warehouse Mistakes You Can’t Afford to Make
Many data professionals understand the basics of Data Warehouse design, including Dimension and Fact Tables, slowly changing Dimensions, and the use of meaningless surrogate keys. However, it isn’t until you’ve created a dimensional model and put it into production that you realize just how much of an impact seemingly trivial mistakes can make. They can hobble performance, allow inaccuracy, and perhaps worst of all, inhibit adoption and usage of the new system.
Learn how to avoid many common mistakes, from someone who’s made them and then found ways to correct them.
On Thursday the 21st, I’ll be continuing my Data Warehouse Series for the North Texas SQL Server User Group.
Data Warehouse Series – Basics and Beyond
In a series of 1 hour sessions, I’ll cover both the basics and gotchas of Data Warehouse Development using the SQL Server 2012 toolset. Some of the things we’ll cover are:
- Basic methodology for evaluating business requirements/needs, obtaining user cooperation, and identifying source data quality issues.
- Data Warehouse basic concepts including Facts, Dimensions, Star and Snowflake Schemas.
- Deep dive into slowly changing Dimension types and how to handle them using T-SQL and SSIS.
- T-SQL Merge Fundamentals and how it can be used to load Dimension and Fact Tables. Yes, I swear the Merge Generation Code will be ready for the presentation.
- Understanding the three basic Fact Table types, how they are used, and loaded with SSIS.
- Date Dimension creation and configuration.
- Taking advantage of SSIS 2012 functionality improvements including Environments, Project Parameters etc.
- Strategies for SSIS and database optimization, ensuring data integrity, etc.
And much, much more. Each session will build upon the previous sessions so they will be recorded and posted to my blog. Come watch this exciting and progressive series as I take you from padawan to Jedi Master.
I can’t wait to see my friends at the meetings.
“Shit!” I said, followed by even more graphic expletives. Lost you? Sorry, I forget that you don’t follow me around. Let me explain.
Like most people, I have a junk room filled with boxes and plastic storage bins. After Thanksgiving I was in the garage looking for our stored Christmas ornaments and came across a very large bin that was unlabeled. When I opened it, I found a ton of brand new clothes including many pairs of Dockers pants and all of them had the tags still on. To understand my anger we need to back up several years.
Clowning around, but not really happy.
Around 15 years ago, on my way to morbid obesity, I was outgrowing my clothes much faster than they wore out. During the rise, I bought a bunch of Dockers Slacks and other clothes for work because I got them on sale. However, because my weight rose so fast, I had pairs of them that were never worn. I never even took the tags off. I could have returned them, but I was embarrassed because of the reason. Those of you who know me know that I’m unfailingly honest. I couldn’t look a clerk in the face and tell them that I had to return the pants because I was too fat to wear them. For all of these years I have kept them in a plastic bin, telling myself that I would lose weight and wear them again someday.
I carted those pants from apartment to apartment. I had garage sales and didn’t sell those pants. Even as I gave away half my stuff when I packed up everything to move across the country years ago, I kept that damn bin. I knew it would feel great to be able to wear them. However, I lost track of it in my latest move.
So, why the expletives? Well, in case anyone doesn’t know I had weight loss surgery back in April. I figured that as my weight dropped I would be able to use the brand new Dockers (and other clothes) from that bin. However, by the time I found it they were already several sizes too large for me anyway. Sometime in December I passed 100 lbs of weight lost. So, my immediate reaction was anger that I had wasted the money, time, and effort carting around those stupid pants only to never get to wear the damn things.
Then, “Bang” I felt like something hit me upside the head.
Who’s this guy? – Photo courtesy of Tim Mitchell
What the heck was I thinking? I have so many things for which I should be grateful. I’ve lost a hundred pounds, 10 inches from my waist, and feel great. Weight Loss Surgery saved my life. I would absolutely do it all over again, and again, and again if necessary. I’ve reached a point in my career where money isn’t a problem; who cares if I had to donate a couple of hundred dollars in clothes to Goodwill? I’ve already donated thousands of dollars of clothes as I’ve gone down through the sizes. There are lots of people in my #SQLFamily who care, and I’m surrounded by great people at the North Texas SQL Server User Group.
I have been working for the past several months for a company in Carrollton, Texas on a fascinating multi-tenant data warehouse project as the Data Warehouse Architect. The project is amazing, we’re using all of the best toys (tools), and I’ve brought in some of the very best consultants to assist with it. I have my SQL Family to thank for that. More on that in a later article.
Anyway, life is too short to dwell on meaningless things like Dockers pants. Life is good.
This is the first year that my primary resolution is NOT to lose weight, though I intend to lose another 20 to 40 pounds this year. I’m back in the gym and have been investigating Aikido dojos in the area. I have Steve Jones (Blog/Twitter) to thank for pushing me back toward the art. I can tell you that if he likes you, the man is relentless. So, I promise to be attending one before the end of January.
I want to start getting certified in the BI Stack. I’m calling myself out right now in front of all (both) of my readers. I will get at least 3 certifications this year and will shoot for more.
Regardless of my difficult work schedule, I’ll take more time to blog and present this year. In fact, I’ve just agreed to present an ongoing series at the North Texas SQL Server User Group meetings called, “Data Warehouse Series – Basics and Beyond” and I hope to see some of you there. I love giving back to my #SQLFamily and need to do as much as I can this year. Mentoring others in our profession makes me feel great.
Having shed almost a full person, I have more energy than ever. Watch out folks, you ain’t seen nothing yet.
Those who read this blog know that I’ve always been into the PASS Community, particularly about attending the PASS Summit. This is my third and I hope never to miss one.
I’ve written articles about why it’s worth paying your own way, how going is like finding a golden ticket and even like going on vacation. Many people, including my “real” family don’t really understand why I love my SQL Family (#sqlfamily) so much.
Changes and The Future
I’m particularly geeked about this one because so much has changed in the past year. I’ve lost almost 100 pounds and feel so much better now. Also, I’ve switched jobs and am now working as a corporate employee on a huge data warehousing project which is unlike anything I’ve ever seen before. Last night I went to dinner with most of the authors of SQL Server 2012 Integration Services Design Patterns, shared a few tidbits of the project, and they agreed that it was something special. More on the book and my current project to come in a future article.
A Scare Brings Focus
Anyway, a few recent things really brought me back to why I am so committed to this community. I thought that I had lost the blog a few weeks ago. One of my readers notified me that my site was down and I had no idea how long it had been down. My provider did not respond immediately to my email and when they did the response was something like, “It’s down? Oh, I’d better check that out.”
Because it’s a hosted site, my backup/restore options are limited, and I had never performed a disaster recovery test of the blog. To be honest, I wouldn’t know how to. I was surprised at my visceral reaction when I thought I had lost it, even though I’ve been neglecting the blog for the past year. It turned out that the problem wasn’t data loss, but rather a Denial of Service Attack which my provider was unaware of. Anyway, everything turned out alright, but it reminded me of how important the blog and SQL Community are to me and I intend to make it a priority in the future.
Another reminder came after the latest SQL Saturday for the North Texas SQL Server User Group. For those who don’t know, I recently gave a precon for SQL Saturday Dallas. It went extremely well because I really love presenting, and I eagerly read all of the Session Evaluations I received. I don’t think that your average attendee realizes how much we care about what they have to say, and I love poring over the comments to find ways to improve. One attendee said something which really touched me and made me re-focus on my community involvement.
“It can be easily seen that David has found his calling in life. He is incredibly knowledgeable and passionate about Data Warehousing. I only wish we had more time for a deeper dive on some topics…”
Well, this person really nailed it. I have found my calling and love presenting, blogging, and volunteering in the PASS Community. I am fascinated by Data Warehousing, Integration, and Analytics. There’s something special about sharing knowledge with your fellow database pros (and friends) which feels so good that it’s worth all of the effort.
Last night at dinner, Tim Mitchell (Blog/Twitter) summed up the feeling very well. He’s a rabid Rangers baseball fan and therefore used a baseball metaphor. I’m paraphrasing, but he likened his precon experience to the feeling when a hush comes over the crowd, the batter makes solid contact with the ball, the “crack of the bat” echoes through the stadium, and it’s a home run. You just know when you hit it out of the park, and it’s one of the best feelings ever.
When I’m at the top of my game, this is exactly how it feels. When you are connected to the audience, you can feel that they are learning, are entertained, and most importantly you’ve made a difference. You know that they’ll take what you’ve taught and go back to their jobs to make things better.
Join the SQL Family #SQLFamily
Well, I hope you’ll forgive my rambling. It’s 4am as I write this, and I don’t adjust well to timezone changes. The official conference starts tomorrow and I can barely contain my excitement. It’s great seeing all of my friends again, some of whom I haven’t seen since last year. If you see me out and about, and manage to recognize me since I’ve lost almost 100 lbs, please stop me even if we’ve never met before.
I’ve already met some folks who I’m sure will become part of my #SQLFamily, and am always looking to connect with more.
If you’re not here, then make plans for next year. You cannot beat the networking and educational experience you get at a PASS Summit.
Every time I attend SQL Saturdays I am re-energized, and SQL Saturday 163 BI Edition was no different. This one was particularly special because of my precon presentation, “Data Warehousing In a Day”. I just love getting up in front of a group and teaching, and it’s obvious to the attendees.
If all of the emails I’ve received are any indication, the attendees enjoyed it as much as I did. I want to thank those who attended the precons (not just mine) and attended SQL Saturday. You folks make the PASS Community worthwhile.
I also want to thank our team of volunteers who put on the event.
As anyone who reads my blog knows, my time has been ridiculously scarce lately and I wasn’t able to do much more than get my presentations ready. It’s nice to know that if I can’t give my usual effort that everyone else picks up the slack. Everything went off great. Congrats guys.
Presenting “Dimensions – It’s All Meaningless Without Them”
Anyway, as promised the following are available for download:
Source Database Backup, Scripts, and SSIS 2012 Packages
Building an ETL Automation Framework for SSIS 2008 – Rushabh Mehta
Once again folks, please accept my sincere apologies that I haven’t made enough time for my community work, especially blogging. I just recently severed ties with a project that was taking all my time and preventing me from blogging. That, and another event I’ll talk about soon, reminded me how important the SQL Community is to me. I will endeavor to make this the last time I need to apologize.
Anyway, coming up in future posts I’ll cover the following:
- More on the Merge Statement including a wrinkle I ran into last week regarding foreign keys.
- The Balanced Data Distributor Transform for SSIS.
- A series of articles regarding Dimensions, SCDs, and how to load them in SSIS 2008 through SSIS 2012.
This is just a quick reminder that I and two of my friends are presenting Precons tomorrow for SQL Saturday Dallas. While I am biased and think that you should sign up for mine, any of these sessions are a huge educational value. If you’re in the DFW area, do yourself a favor and attend a Precon and SQL Saturday. The abstracts are below.
Hope to see you there!
Become an Enterprise DBA by Sean McCown
Speaker Bio: Sean McCown is a Microsoft Certified Master in SQL Server 2008. He is also a SQL Server MVP with over 15 years of experience in databases. Sean is also founder and co-owner of the renowned website MidnightDBA.com, where he records free SQL Server training videos and co-hosts the popular webshow DBAs@Midnight.
In addition to this training, Sean has decided to offer all of the Enterprise DBA precon attendees a FREE copy of the DBA Roadmap webinar! Which means, of course, that the precon pays for itself…or you get both for 50% off, or however you want to look at it. Bottom line: attend Sean’s precon, learn massive amounts of stuff, AND improve your career exponentially. Win-win-win.
Data Warehousing in a Day by Dave Stein
Speaker Bio: David Stein is a Senior Business Intelligence Consultant with WaveTwo in Irving, TX. He’s worked in SQL Server Business Intelligence for almost a decade. He’s designed Data Warehouses from scratch as well as saved projects in serious jeopardy. He is also very active in the community, having spoken at PASS events at all levels, including the PASS Summit, SQL Rally, and at numerous SQL Saturday events and user groups. He’s also given multi-day pre-conference seminars at third party events regarding Microsoft Business Intelligence. You can learn more about Data Warehousing, Business Intelligence, and other technical topics on his blog at Made2Mentor.com or follow him on Twitter @Made2Mentor
Attend this full day pre-con and let David be your guide through this process. He’ll save you time, money, and headaches.
Additional information about my presentation can be found on my blog at: http://www.made2mentor.com/2012/06/sql-saturday-dallas-and-pre-cons/
Real World SSIS: A Survival Guide by Tim Mitchell
Speaker Bio: Tim Mitchell is a business intelligence consultant, database developer, speaker, and trainer. He has been working with SQL Server for over 9 years, working primarily in business intelligence, ETL/SSIS, database development, and reporting. He has earned a number of industry certifications and holds a Bachelor’s Degree in Computer Science from Texas A&M at Commerce, and is a Microsoft SQL Server MVP. Tim is a business intelligence consultant for Artis Consulting in the Dallas, Texas area.
As an active member of the community, Tim has spoken at national, regional, and local venues including the SQL PASS summit, SQL Connections, numerous SQL Saturday events, and various user groups and PASS virtual chapters. He is a board member and speaker at the North Texas SQL Server User Group in Dallas, serves as the co-chair of the PASS BI Virtual Chapter, and is an active volunteer for PASS. Tim is an author and forum contributor on SQLServerCentral.com and has published dozens of SQL Server training videos on SQLShare.com.
Tim is a contributing author on the charity book project MVP Deep Dives 2, and is coauthoring SSIS Design Patterns (available July 16th).
You can visit his website and blog at TimMitchell.net or follow him on Twitter at @Tim_Mitchell.
Let me just start this post with an apology to all of my readers, well, both of you. There have been a lot of changes in my life lately and I’ve been overwhelmed with demands on my time. I won’t go into details, but those demands will be letting up soon, and I will be able to get back to blogging, community involvement, and catching up on sleep.
However, this topic is important enough that I had to take a break from my frantic schedule to jot down a few words about it. As I started to write this, I got a notification of a blog article by my friend and fellow North Texas SQL Server User Group member, Ryan Adams (Blog/Twitter). As I read his article I realized that our views were nearly identical about two candidates, Sri Sridhara (Blog/Twitter) and Allen Kinsel (Blog/Twitter). I urge you to read what he wrote because I echo each and every one of his sentiments.
However, I’d like to say just a few words about each candidate myself.
I met Allen at the first SQL Saturday at which I was an active volunteer. Allen was the first PASS member that I connected with who wasn’t in one of my local groups. The guy is tireless, has a pragmatic point of view about the community, cares about members’ opinions, and I’ve never heard anyone have a negative thing to say about him. When I have something to comment on (whine about) regarding PASS at the national level, Allen is the guy I go to, and I can tell you that he truly cares… or at least is good at acting like he does.
Sri is an interesting PASS leader and volunteer. I’ve worked with him for several years organizing PASS events like SQL Rally, multiple SQL Saturdays, etc. In some ways, he’s a lot like me. Some people take to him, some not so much, but what I’ve found is that people in both camps are reacting to the same thing. The man is highly driven. Nobody can question his dedication, drive, and single minded focus on pushing the community farther. Sri embodies what PASS is and should continue to be, relentlessly driven to spread the adoption of SQL Server and to raise the ability and skill set of DBAs everywhere.
He and I don’t always agree, but he’s a guy you can count on to always follow through on his commitments, to do what he says he will do. Without belaboring the point, what happened last year was shameful, yet to this day Sri hasn’t had a negative thing to say about it, even privately. Let’s not let that happen again, Vote for Sri.
Like Ryan said, I’m not commenting on the other candidates, but not because they aren’t good people. I just happen to know these two exceptionally well, and intend to vote for them. You should too.
This is just a friendly plug for my friends at RedGate regarding SQL in the City, a free series of events traveling around the country. It’s a free day of training and networking led by spectacular presenters such as Steve Jones (Blog/Twitter) and Grant Fritchey (Blog/Twitter), who will be headlining the Austin event next week.
Steve and I at my first Summit
Grant and I at SQL Sat. Chicago
I had originally planned on going myself, but I recently got a new gig (more on that in the near future) and can’t get away at the moment. However, these guys are top notch, and you really shouldn’t miss the chance to learn from them.
So, Register Today!
Hey folks, just a quick message that I’ll be presenting this evening at North Texas SQL Server User Group. I’m presenting an updated version of “Have you got the Urge to Merge?” which I gave last month in Fort Worth. I’ve made good progress with my code which generates T-SQL Merge Statements and will be able to show more of it tonight. I hope to post a beta version of the code next week. The abstract is below.
Have you got the Urge to Merge?
The T-SQL Merge statement, introduced in SQL Server 2008, is powerful, versatile, and performant. However, it is also one of T-SQL’s least understood statements. Most DBAs know they can use it to insert, update, and delete data in a target table in a single transaction, generally outperforming separate statements. Fewer know, however, that it can be used to load slowly changing dimensions, facilitate auditing, and for many other uses. Come to the user group meeting and learn the ins and outs of T-SQL Merge. I’ll even show you how to write complex Merge statements with lightning speed.
I hope to see some of you folks in Irving tonight.
I mentioned in a previous article that many posts on T-SQL Merge read like Books Online. Speaking of Books Online, let’s start by glancing at the syntax portion of the T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.
[ WITH <common_table_expression> [ ,...n ] ]
MERGE
    [ TOP ( expression ) [ PERCENT ] ]
    [ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]
    USING <table_source>
    ON <merge_search_condition>
    [ WHEN MATCHED [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
        THEN <merge_not_matched> ]
    [ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ <output_clause> ]
    [ OPTION ( <query_hint> [ ,...n ] ) ]
;

<target_table> ::=
{
    [ database_name . schema_name . | schema_name . ] target_table
}

<merge_hint> ::=
{
    { [ <table_hint_limited> [ ,...n ] ]
      [ [ , ] INDEX ( index_val [ ,...n ] ) ] }
}
Simple, right? Great, I guess I’m done here… No, seriously, who can readily absorb that? So, what is Merge really, and how do we use it?
T-SQL Merge Basics?
In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in a Table, referred to as the Target, with data from another Table called the Source. The tables are compared on Fields which uniquely identify records in each, a Join if you will. That’s not to say that the Source has to be an actual Table, a view or a Select statement may be used as a source as well.
I think most people learn best from examples, by doing rather than reading, so I’ve provided a short script to create the tables required for the following example.
Slowly Changing Dimension Table Setup Scripts Jedi 1.4
Basically I’ve created two tables called DimCustomer and CustomerStaging with the following data in them:
Now, we’re using a CustomerStaging table for sake of simplicity. However, I could have used a T-SQL query from an OLTP database or other source as the Source object in my merge. For this article, let’s assume there are no Slowly Changing Dimensions and ignore the fields in each table to the right of Affiliation.
There are several differences between Staging and DimCustomer which should be noted:
- Anakin Skywalker is no longer in the source system.
- Yoda’s name was changed.
- Obi-Wan Kenobi changed planets.
- Darth Vader was added.
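If you’d rather not grab the download, here is a minimal sketch of the two tables that reproduces the differences listed above. The column types and the specific planet/affiliation values are my assumptions for illustration; only the four differences themselves come from the article:

```sql
-- Minimal stand-ins for the downloadable setup script (SCD columns omitted).
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT         NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(50) NULL,
    Planet       VARCHAR(50) NULL,
    Affiliation  VARCHAR(50) NULL);

CREATE TABLE dbo.CustomerStaging (
    CustomerKey  INT         NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(50) NULL,
    Planet       VARCHAR(50) NULL,
    Affiliation  VARCHAR(50) NULL);

INSERT INTO dbo.DimCustomer VALUES
    (1, 'Anakin Skywalker', 'Tatooine',   'Jedi'),
    (2, 'Yoda',             'Dagobah',    'Jedi'),
    (3, 'Obi-Wan Kenobi',   'Coruscant',  'Jedi');

INSERT INTO dbo.CustomerStaging VALUES
    (2, 'Master Yoda',      'Dagobah',    'Jedi'),  -- name changed
    (3, 'Obi-Wan Kenobi',   'Tatooine',   'Jedi'),  -- planet changed
    (4, 'Darth Vader',      'Death Star', 'Sith');  -- new record
-- Anakin (key 1) is absent from staging, so he'll be deleted.
```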
Now, I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi-Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder. Anyway, if we used separate T-SQL statements, we’d delete Anakin, update Yoda and Obi-Wan, and insert Darth Vader. With Merge, we can do all of that in one transaction. For me, the easiest way to learn something is to go through an example. We’ll break the statement into sections to make it easy to understand.
First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.
MERGE DimCustomer AS Target
USING CustomerStaging AS Source
ON Target.CustomerKey = Source.CustomerKey
The When Matched clause determines what will happen when records exist in the Source and Destination with the same CustomerKey. Notice the additional conditions I’ve added. Strictly speaking, they aren’t required, but every record in the target would be updated regardless of need, which wastes resources.
WHEN MATCHED
AND ( Target.CustomerName <> Source.CustomerName -- These conditions are optional.
      OR Target.Planet <> Source.Planet
      OR Target.Affiliation <> Source.Affiliation)
THEN UPDATE SET -- Updates the Yoda and Obi-Wan records.
    Target.CustomerName = Source.CustomerName,
    Target.Planet = Source.Planet,
    Target.Affiliation = Source.Affiliation
The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the destination. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.
WHEN NOT MATCHED BY TARGET
THEN INSERT (CustomerKey, CustomerName, Planet, Affiliation) -- Inserts Darth Vader
     VALUES (Source.CustomerKey, Source.CustomerName, Source.Planet, Source.Affiliation)
The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if your Staging table isn’t comprehensive, maybe the result of an incremental extraction, then you’d want to omit this portion of the statement.
WHEN NOT MATCHED BY SOURCE THEN DELETE; -- Deletes Anakin
Also, keep in mind that any Merge statement must be terminated with a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.
MERGE DimCustomer AS Target
USING CustomerStaging AS Source
ON Target.CustomerKey = Source.CustomerKey
WHEN MATCHED
AND ( Target.CustomerName <> Source.CustomerName
      OR Target.Planet <> Source.Planet
      OR Target.Affiliation <> Source.Affiliation)
THEN UPDATE SET
    Target.CustomerName = Source.CustomerName,
    Target.Planet = Source.Planet,
    Target.Affiliation = Source.Affiliation
WHEN NOT MATCHED BY TARGET
THEN INSERT (CustomerKey, CustomerName, Planet, Affiliation)
     VALUES (Source.CustomerKey, Source.CustomerName, Source.Planet, Source.Affiliation)
WHEN NOT MATCHED BY SOURCE THEN DELETE;
What about Null Values?
Ah, very astute of you to notice that, my young padawan. You’ll notice in the When Matched portion that I also check to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value would be updated even if there were no changes. I handle this with Coalesce because it chooses the best datatype between its arguments based on datatype precedence, which means I can use the same basic Coalesce expression for most comparisons without throwing an error. The When Matched portion from above becomes the following:
AND (COALESCE(TARGET.CUSTOMERNAME, '') <> COALESCE(SOURCE.CUSTOMERNAME, '')
OR COALESCE(TARGET.PLANET, '') <> COALESCE(SOURCE.PLANET, '')
OR COALESCE(TARGET.AFFILIATION, '') <> COALESCE(SOURCE.AFFILIATION, ''))
Of course, there are always limitations. The most important limitation is that both data sources need to be on the same SQL Server instance. I suppose you could use linked servers, but I wouldn’t recommend it.
Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.
In the next article we’ll discuss how to use the Output clause with Merge, and how I use it occasionally to perform auditing during data warehouse loads.
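As a small preview, the Output clause exposes a $action pseudo-column along with the inserted and deleted row images, which you can capture into a table for auditing. This is a hedged sketch using the example tables above; the @Audit table variable and its columns are purely illustrative:

```sql
-- Capture what the MERGE actually did into a hypothetical audit table.
DECLARE @Audit TABLE (
    MergeAction NVARCHAR(10),
    CustomerKey INT,
    OldName     VARCHAR(50),
    NewName     VARCHAR(50));

MERGE dbo.DimCustomer AS Target
USING dbo.CustomerStaging AS Source
    ON Target.CustomerKey = Source.CustomerKey
WHEN MATCHED AND Target.CustomerName <> Source.CustomerName
    THEN UPDATE SET Target.CustomerName = Source.CustomerName
WHEN NOT MATCHED BY TARGET
    THEN INSERT (CustomerKey, CustomerName, Planet, Affiliation)
         VALUES (Source.CustomerKey, Source.CustomerName, Source.Planet, Source.Affiliation)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT $action,
       COALESCE(inserted.CustomerKey, deleted.CustomerKey),
       deleted.CustomerName,  -- NULL for inserts
       inserted.CustomerName  -- NULL for deletes
INTO @Audit (MergeAction, CustomerKey, OldName, NewName);

SELECT * FROM @Audit;
```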