EMAG TRESNI and Data Warehouse Implementation

Those who follow this blog know that I’m a huge video game geek. I have a life-sized Mario statue and a fairly large video game collection, which includes about 700 original, complete-in-box NES games.



I also own almost 40 books about video game history. One of the best books I’ve read is The Ultimate History of Video Games: From Pong to Pokemon–The Story Behind the Craze That Touched Our Lives and Changed the World by Steven Kent. Even if you only casually played video games back in the day, it’s a great read.

In the book, Kent discusses the design of the Atari Lynx handheld, which attempted to compete with Nintendo’s Game Boy. I’m sure most of my readers have never heard of the Atari Lynx since it was crushed by the Game Boy. The Lynx received rave reviews, had a much better full color screen, and the ability to link together for multi-player. Even though the Game Boy was antiquated and completely out-classed, Lynx lost the battle because Nintendo was much better at marketing than Atari by this time. There’s an important lesson in there, but we’ll leave that for another day.

Atari Lynx

On page 418, Kent relates a story about the inherent problems when designing complex systems like the Lynx. The engineering teams, despite being meticulous in their planning and documentation, made one minuscule error in the binary coding. When the Lynx was powered up, if there wasn’t a cartridge in the slot, the message “Insert Game” should have been displayed. Because of this one mistake, the screen displayed “EMAG TRESNI” or “Insert Game” backwards. The first version of the Lynx systems actually shipped this way. “EMAG TRESNI” became an inside joke for the team, because I guess it’s cooler than referring to Murphy’s Law.

Data Warehouse Implications

What does this have to do with Data Warehouse Implementation? At the time I read this, I was in the middle of a particularly hairy Data Warehouse implementation. The business leaders demanded a data warehouse as fast as possible, and were unwilling to dedicate the necessary time in the investigative phase of the implementation. I’m referring to steps like:

  • Gathering Business Requirements
  • Determining Technical Requirements and Designing Architecture
  • Data Source Quality Testing

You know… Minor things like that. These steps can take a very long time, but they are absolutely necessary, and rushing through them all but guarantees project failure. Further, implementing an appropriate ETL Framework is essential. For example, if you don’t plan ahead for SQL Server Integration Services (SSIS) package failure, due to data errors and whatnot, you’ve set yourself up for a rude awakening some night at 3 am, scrambling to correct a botched load. While I’ve never designed a piece of hardware like the Lynx, I have worked on some incredibly complex Data Warehouse implementations, and I can confidently say that EMAG TRESNI runs rampant in our efforts as well.
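On the T-SQL side of an ETL framework, one small piece of planning for failure is making sure a failed load step rolls back cleanly and leaves a record behind, instead of a half-loaded table. The sketch below is hypothetical: #LoadErrorLog stands in for a permanent audit table, and SELECT 1/0 simulates a data error. It does not cover SSIS-level error handling.

```sql
-- Hypothetical audit table; a real framework would use a permanent table.
IF OBJECT_ID('tempdb..#LoadErrorLog') IS NOT NULL DROP TABLE #LoadErrorLog;
CREATE TABLE #LoadErrorLog
(  LoadName      VARCHAR(50)    NOT NULL
  ,ErrorNumber   INT            NOT NULL
  ,ErrorMessage  NVARCHAR(4000) NOT NULL
  ,LoggedAt      DATETIME       NOT NULL DEFAULT GETDATE());

BEGIN TRY
    BEGIN TRANSACTION;
    -- The real load statements would go here.
    -- This divide-by-zero simulates a bad row arriving from the source.
    SELECT 1/0;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo any partial work first, then record the failure.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    INSERT INTO #LoadErrorLog (LoadName, ErrorNumber, ErrorMessage)
    VALUES ('DimCustomer load', ERROR_NUMBER(), ERROR_MESSAGE());
END CATCH;

SELECT LoadName, ErrorNumber, ErrorMessage, LoggedAt FROM #LoadErrorLog;
```

Because the CATCH block rolls back before logging, the log row survives while the partial load does not.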

Until fairly recently, making changes to the dimensional model could be disastrous, because making the corresponding changes to the ETL was so time-consuming and painful. How many times have you been in the middle of an implementation, had to make a change (for business or technical reasons), and then had to alter data structures such as stored procedures and views? Even worse, having to open every relevant SSIS package, often just to open specific tasks and refresh their metadata, kills productivity. And let’s not forget that all of that code needs to be checked out of and back into source control, quality assurance testing is necessary, and so on.

Did you notice how I prefaced the previous paragraph with “Until fairly recently?” Well, I’ve been pursuing a mastery of BIML Scripting. I’m using it, along with my existing design patterns, to automate my Data Warehouse Development. Yes, I said automate. Stay tuned, as I’m about to begin sharing with you how BIML Scripting will change everything.

What exactly are Dimensions and Why do They Slowly Change?

As I was writing the articles on handling Slowly Changing Dimensions with T-SQL Merge, I realized that some background on Dimensions may be necessary first. Astute readers may remember that I briefly covered Dimension table basics a few years ago, but we need to go into a bit more detail on how to build them and how their structure supports Slowly Changing Dimensions.

What is a Dimension?

In general, Dimensions are tables in your data warehouse, composed of wide and descriptive columns, which give your data meaning. Some in the industry refer to the columns of a Dimension Table as “Dimensions” as well. The entities described by Dimension Tables are often physical in nature such as Customers, Patients, Products, Stores, etc. Identifying Dimensions and their attributes is easy when you know the trick. When you were in school, did you learn to do Word or Story Problems in Math class? I was taught that an important step in doing so was to translate the question into math operations by recognizing and translating key words.

Luckily spotting Dimensions is even simpler. Consider this story problem that your boss may give you:
“I want to know net sales in dollars by Product, Territory, and Month.”

When you hear this, you should immediately look for words like “by” or “per.” In this case, Product, Territory, and Month are Dimensions or attributes of them. Easy huh?

Remember that the primary driving force behind Data Warehouse initiatives is to simplify data analysis. In general, the fewer tables we have in our Data Warehouse, the easier analysis will be. Therefore, we combine similar attributes into Dimensional tables which tend to be denormalized and flattened. In the Story Problem above, you might have the following Dimension Tables.

  • DimProduct, which may be composed of fields like Product Name, Product Category, etc.
  • DimSalesTerritory, which may have fields like Territory Name, Territory Code, etc.
  • DimCalendar, the most common Dimension, which of course would have fields like Month Number, Month Name, Quarter, Year, Day Of Year, Day Of Week Name, etc.
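To make the story problem concrete, here is a minimal sketch of the star schema it implies and the query that answers it. The fact table #FactSales, its NetSalesAmount column, the YYYYMMDD DateKey convention, and the sample rows are all hypothetical; temp tables keep the example self-contained.

```sql
IF OBJECT_ID('tempdb..#FactSales')         IS NOT NULL DROP TABLE #FactSales;
IF OBJECT_ID('tempdb..#DimProduct')        IS NOT NULL DROP TABLE #DimProduct;
IF OBJECT_ID('tempdb..#DimSalesTerritory') IS NOT NULL DROP TABLE #DimSalesTerritory;
IF OBJECT_ID('tempdb..#DimCalendar')       IS NOT NULL DROP TABLE #DimCalendar;

CREATE TABLE #DimProduct
(  ProductKey      INT         NOT NULL PRIMARY KEY
  ,ProductName     VARCHAR(50) NOT NULL
  ,ProductCategory VARCHAR(50) NOT NULL);
CREATE TABLE #DimSalesTerritory
(  SalesTerritoryKey INT         NOT NULL PRIMARY KEY
  ,TerritoryName     VARCHAR(50) NOT NULL);
CREATE TABLE #DimCalendar
(  DateKey      INT         NOT NULL PRIMARY KEY  -- e.g. 20240115 (assumed convention)
  ,MonthNumber  TINYINT     NOT NULL
  ,MonthName    VARCHAR(10) NOT NULL
  ,CalendarYear SMALLINT    NOT NULL);
CREATE TABLE #FactSales
(  ProductKey        INT           NOT NULL
  ,SalesTerritoryKey INT           NOT NULL
  ,DateKey           INT           NOT NULL
  ,NetSalesAmount    DECIMAL(12,2) NOT NULL);

INSERT INTO #DimProduct        VALUES (1, 'Lightsaber', 'Weapons'), (2, 'Astromech', 'Droids');
INSERT INTO #DimSalesTerritory VALUES (1, 'Core Worlds'), (2, 'Outer Rim');
INSERT INTO #DimCalendar       VALUES (20240115, 1, 'January', 2024), (20240210, 2, 'February', 2024);
INSERT INTO #FactSales         VALUES (1, 1, 20240115, 100.00), (1, 1, 20240115, 50.00), (2, 2, 20240210, 75.00);

-- "Net sales in dollars by Product, Territory, and Month":
-- each "by" becomes a join to a Dimension plus a GROUP BY column.
SELECT p.ProductName, t.TerritoryName, c.CalendarYear, c.MonthName,
       SUM(f.NetSalesAmount) AS NetSales
  FROM #FactSales f
  JOIN #DimProduct        p ON p.ProductKey        = f.ProductKey
  JOIN #DimSalesTerritory t ON t.SalesTerritoryKey = f.SalesTerritoryKey
  JOIN #DimCalendar       c ON c.DateKey           = f.DateKey
 GROUP BY p.ProductName, t.TerritoryName, c.CalendarYear, c.MonthNumber, c.MonthName
 ORDER BY c.CalendarYear, c.MonthNumber, p.ProductName, t.TerritoryName;
```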

Dimension Table Structure

Almost without exception, every Dimension should have a primary key that has no meaning, essentially a numbering column. The users of a Data Warehouse won’t care about these Primary Keys, and in fact may never even see them. In contrast, an attribute like Customer Number has meaning and points back to a specific customer. These meaningless keys, called Surrogate Keys, insulate the data warehouse from the source business systems. They are typically an integer data type and are typically assigned by SQL Server through the Identity property: as each new record is added to a Dimension, the next number is assigned to it. Dimensions join to related Fact tables, where the dollar amounts and other measures are kept, via the Surrogate Keys. These table groups form what is called a Star Schema, which you can see below.

Dimensions in Blue, Fact in Green, joined by Surrogate Keys.

In more detail, Dimensions need Surrogate Keys for the following reasons:

  1. The source database(s) may recycle their keys. If they do, the uniqueness of your dimension’s primary key is violated.
  2. Multiple sources for the same dimensional entity likely have different keys. If you extract Customer information from a Customer Relationship Management system, Shipping Software, and your Billing System, an additional key field is necessary to unify it.
  3. If you acquire another company, which has its own set of keys, how could you incorporate them into your data warehouse without Surrogate Keys?
  4. Suppose one of your attributes is a ship date that isn’t always known the first time a record is loaded. Surrogate Keys allow the row to point to a special value such as “Not Available” rather than a NULL.
  5. Surrogate Keys are compact which saves space, especially with indexes. To that end, you should be using the smallest integer datatype appropriate for these keys.
  6. Surrogate Keys facilitate easier tracking of historical changes. More on this in a bit.
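Reasons 4 and 5 can be sketched together: a compact SMALLINT surrogate key assigned by IDENTITY, plus a reserved “Not Available” member that fact rows point to when the dimension value is unknown at load time. The -1 key, the #StageSales table, and the choice of SMALLINT are illustrative assumptions; the example uses a customer lookup, but a missing ship date would be handled the same way.

```sql
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
IF OBJECT_ID('tempdb..#StageSales')  IS NOT NULL DROP TABLE #StageSales;

CREATE TABLE #DimCustomer
(  CustomerKey   SMALLINT IDENTITY(1,1) NOT NULL PRIMARY KEY  -- surrogate key
  ,CustomerNum   TINYINT      NULL     -- business key (NULL only for the special member)
  ,CustomerName  VARCHAR(25)  NOT NULL);

-- Regular members get their keys from the IDENTITY property.
INSERT INTO #DimCustomer (CustomerNum, CustomerName)
VALUES (2, 'Yoda'), (3, 'Obi-Wan Kenobi');

-- Reserved "Not Available" member with an explicit key of -1.
SET IDENTITY_INSERT #DimCustomer ON;
INSERT INTO #DimCustomer (CustomerKey, CustomerNum, CustomerName)
VALUES (-1, NULL, 'Not Available');
SET IDENTITY_INSERT #DimCustomer OFF;

-- Hypothetical staged fact rows; the second has no known customer yet.
CREATE TABLE #StageSales (CustomerNum TINYINT NULL, SalesAmount DECIMAL(10,2) NOT NULL);
INSERT INTO #StageSales (CustomerNum, SalesAmount) VALUES (2, 100.00), (NULL, 25.00);

-- Unmatched rows fall back to the "Not Available" member instead of a NULL key.
SELECT COALESCE(d.CustomerKey, -1) AS CustomerKey, s.SalesAmount
  FROM #StageSales s
  LEFT JOIN #DimCustomer d ON d.CustomerNum = s.CustomerNum;
```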

In addition to the Surrogate Key, a Dimension will have one or more Business Keys, sometimes also referred to as Alternate Keys. These are fields, usually numeric or alphanumeric codes, from the source system(s) which identify each entity. Some Dimensional Modelers indicate the type of key in the field name. So, in a Customer Dimension you may see fields named Customer_SK (Surrogate Key), Customer_BK (Business Key), or Customer_AK (Alternate Key). Personally, I’m a big proponent of using naming conventions religiously. Unless I’m forced to deviate, my standard method is to name the Dimension DimCustomer; the surrogate key becomes CustomerKey, and the business keys are named exactly as the users understand them. It could be CustomerId, CustomerNumber, etc. One reason I do this is that I am unlikely to run into a source which uses the exact naming convention I use for my surrogates, and I dislike having to answer the same questions over and over again. “Umm… what does Customer_AK mean again?” I would urge you to follow a standard, but as long as you’re consistent, feel free to create your own conventions.

Slowly Changing Dimension Maintenance Columns

Most Dimensions will have columns which handle SCD changes. The columns you’ll typically find are some variation of RowIsCurrent, RowStartDate, and RowEndDate. Not everyone uses the same fields, and some use only a subset of them. Jamie Thomson makes a very well-reasoned argument for using only RowStartDate in his “Debunking Kimball Effective Dates” Part One and Part Two. I’ll weigh in on this in a future article, but for the sake of this discussion, I will be using all three.

  • RowIsCurrent – In my implementation this is a Char(1) field which contains ‘Y’ or ‘N’. It’s also common for modelers to use a bit (1 or 0) for the same purpose. This simply indicates whether a particular record is the current, or active, record for a particular entity.
  • RowStartDate – This is some form of a Date/DateTime column which indicates when the record first became current.
  • RowEndDate – This is some form of a Date/DateTime column which indicates when the record was no longer current or active; the row was expired at this point. For the active records, some modelers leave RowEndDate NULL. I choose to assign it the maximum date for the data type. So, if I use datetime, I may assign this field as ‘12/31/9999’.

Further, it should be noted that some modelers, when expiring a record, prefer to set its RowEndDate to a tiny increment less than the RowStartDate of the next record for this entity. They do this because they prefer to write “Between” statements in their T-SQL. However, I have found that maintaining such a structure is problematic. Therefore, I almost always set the expired record’s RowEndDate and the new record’s RowStartDate to be equal.
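Because the expired row’s RowEndDate equals the next row’s RowStartDate, a point-in-time lookup should use a half-open range (on or after the start, strictly before the end) rather than BETWEEN, which would match both rows at the boundary instant. A minimal sketch, assuming a DimCustomer shaped like the one described in this section; the dates are hypothetical.

```sql
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
CREATE TABLE #DimCustomer
(  CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY
  ,CustomerNum   TINYINT     NOT NULL
  ,CustomerName  VARCHAR(25) NULL
  ,Planet        VARCHAR(25) NULL
  ,RowIsCurrent  CHAR(1)     NOT NULL CHECK (RowIsCurrent IN ('Y', 'N'))
  ,RowStartDate  DATETIME    NOT NULL
  ,RowEndDate    DATETIME    NOT NULL DEFAULT '99991231');  -- "open" rows end at 12/31/9999

-- Yoda's expired row ends at exactly the instant his current row starts.
INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
VALUES (2, 'Yoda', 'Coruscant', 'N', '20000101', '20050601');
INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate)
VALUES (2, 'Yoda', 'Dagobah', 'Y', '20050601');

DECLARE @AsOf DATETIME = '20050601';  -- the boundary instant

-- Half-open range: exactly one row qualifies, even on the boundary.
SELECT CustomerKey, Planet
  FROM #DimCustomer
 WHERE CustomerNum  = 2
   AND RowStartDate <= @AsOf
   AND @AsOf < RowEndDate;
```

With these rows the query returns only the Dagobah version; replacing the range with @AsOf BETWEEN RowStartDate AND RowEndDate would return both.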

General Indexing Strategies

There is a lot of conflicting information on the topic of how to structure and index Dimension Tables. The most basic disagreement is whether the Primary Key of a Dimension should be clustered or non-clustered. In most examples, you’ll find the Primary Key being clustered, as in Microsoft’s own AdventureWorksDW. Further, in their definitive work, The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2, the Kimball Group espouses the same practice.

Also, Microsoft’s own Customer Advisory Team (SQL CAT) indicates that Dimensions should have clustered Primary Keys.

I tend to deviate and create a clustered index on the business key(s), which means the Primary Key must be non-clustered. The following articles available on the web explain why this is a good idea with SQL Server, and my own personal testing supports this practice.

Regardless of which method you choose, be sure to index on the Business Key(s) and Slowly Changing Dimension columns.
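A minimal sketch of the indexing approach described above: the surrogate Primary Key declared NONCLUSTERED, the clustered index on the business key (with RowStartDate added so each entity’s versions sort together), and a supporting index on the SCD columns. The index names and exact key columns are illustrative choices, not a prescription; test against your own workload.

```sql
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
CREATE TABLE #DimCustomer
(  CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED  -- surrogate key
  ,CustomerNum   TINYINT     NOT NULL                                  -- business key
  ,CustomerName  VARCHAR(25) NULL
  ,Planet        VARCHAR(25) NULL
  ,RowIsCurrent  CHAR(1)     NOT NULL
  ,RowStartDate  DATETIME    NOT NULL
  ,RowEndDate    DATETIME    NOT NULL);

-- Clustered on the business key; RowStartDate keeps each entity's history in order.
CREATE CLUSTERED INDEX CIX_DimCustomer_CustomerNum
    ON #DimCustomer (CustomerNum, RowStartDate);

-- Supports "find the current row per business key" lookups during SCD processing.
CREATE NONCLUSTERED INDEX IX_DimCustomer_Current
    ON #DimCustomer (RowIsCurrent, CustomerNum)
    INCLUDE (RowEndDate);
```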

Why do Dimensions Slowly Change?

One of the primary problems with source systems is that many of them don’t track history. For example, if a customer moves, many of these systems simply over-write the old address information in place. If you were reporting from this database, then any sales for this customer would also “move” to the new location, which may not be desirable. Slowly Changing Dimensions allow us to track history in the data warehouse. Since Dimension Tables join Fact Tables via the Surrogate key, we can have many records per business entity (customer in our case) with its history of Dimension Attributes.

It’s important to note that Dimension Tables themselves are rarely a specific type, rather we track the SCD type per Dimensional Attribute (column). Ralph Kimball, who many consider to be the father of Data Warehousing, creatively named the types of SCDs as Type 1, Type 2, etc.

  • Type 0 – Fixed attributes that shouldn’t change like a Customer’s birth date.
  • Type 1 – These columns don’t track history and simply over-write values in place when change is encountered.
  • Type 2 – A change in these columns causes expiration of the current row and a new row to be added with the newly changed value.

Fear not dear reader, if you are new to Dimensional Modeling the handling of these types will make sense with the screen shots provided below. There are additional types of Slowly Changing Dimensions, but they are beyond the scope of this article. The following articles by The Kimball Group explain Slowly Changing Dimensions in more depth:


So, as you can see, our simplified Customer Dimension has only two attributes. CustomerName will be handled as Type 1 and Planet as Type 2. While our example is extremely simple, it is also “realistic.” In many real-world systems, attributes like Customer Name are often Type 1, and Planet is analogous to a location or address, which is often processed as Type 2. Let’s see an example of SCD changes in action. Suppose you had the following data in DimCustomer:

Now suppose that Obi-Wan Kenobi changed his name to Ben Kenobi before he went into hiding. Well, in our system, that’s a Type 1 change. So, after we process this change, we find the following:

Be sure to note the following:

  1. If there had been several records for Obi-Wan, then every one of them would be updated to “Ben Kenobi”.
  2. Notice that the SCD fields, RowIsCurrent, RowStartDate, and RowEndDate, are unaffected.
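In T-SQL, a Type 1 change is a plain UPDATE keyed on the business key rather than the Surrogate Key, which is why every row for the entity changes while the SCD columns are left alone. A minimal sketch with hypothetical rows; Obi-Wan is given two rows here (with a made-up earlier Planet) only to show that both are updated.

```sql
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
CREATE TABLE #DimCustomer
(  CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY
  ,CustomerNum   TINYINT     NOT NULL
  ,CustomerName  VARCHAR(25) NULL
  ,Planet        VARCHAR(25) NULL
  ,RowIsCurrent  CHAR(1)     NOT NULL
  ,RowStartDate  DATETIME    NOT NULL
  ,RowEndDate    DATETIME    NOT NULL);

INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
VALUES (3, 'Obi-Wan Kenobi', 'Stewjon',   'N', '19800101', '20000101')
      ,(3, 'Obi-Wan Kenobi', 'Coruscant', 'Y', '20000101', '99991231');

-- Type 1: overwrite in place for every row of the entity (matched on the business key).
-- RowIsCurrent, RowStartDate, and RowEndDate are deliberately not touched.
UPDATE #DimCustomer
   SET CustomerName = 'Ben Kenobi'
 WHERE CustomerNum = 3;

SELECT * FROM #DimCustomer ORDER BY RowStartDate;
```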

Simple, huh? Well, let’s move to Type 2. In our example, we only have one Type 2 column, which is Planet. For our Type 2 change, let’s say that Yoda escapes to Dagobah after failing to destroy Emperor Palpatine. I have no idea why Obi-Wan and Yoda didn’t attempt to destroy him together later, since Vader was beaten and incapacitated, which left the entire galaxy to live in tyranny for almost two decades. And if Obi-Wan loved Anakin, why on earth did he allow him to burn to death (as far as he knew) rather than mercifully ending him? Maddening. Umm, sorry, I digress. So, how do we process a Type 2 change like this? The basic steps are as follows:

  1. Find the current record for each entity (per business key) in the Dimension. Most often this is done by finding the record where RowIsCurrent = ‘Y’ or some variation. However, in certain situations, you may need to find the current record by comparing the change date to RowStartDate and RowEndDate.
  2. If any of the Type 2 columns for that current row have changed, expire it. This is typically done by setting RowIsCurrent to ‘N’ and assigning RowEndDate equal to the change date. Again, remember what I said previously: there are differing opinions on this.
  3. Insert a record with a new Surrogate Key for that Dimension entity containing all of the new values, with RowIsCurrent set to ‘Y’ and RowStartDate equal to the change date. The RowEndDate will be ‘12/31/9999’, NULL, or some other value you choose.
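The three steps translate fairly directly into T-SQL. This is a minimal sketch for a single entity, assuming RowIsCurrent holds ‘Y’/‘N’, RowEndDate defaults to 12/31/9999, and the incoming change arrives in the hypothetical variables @CustomerNum, @NewPlanet, and @ChangeDate; set-based processing of many rows at once is a separate topic.

```sql
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
CREATE TABLE #DimCustomer
(  CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY  -- surrogate key
  ,CustomerNum   TINYINT     NOT NULL                    -- business key
  ,CustomerName  VARCHAR(25) NULL                        -- Type 1
  ,Planet        VARCHAR(25) NULL                        -- Type 2
  ,RowIsCurrent  CHAR(1)     NOT NULL DEFAULT 'Y'
  ,RowStartDate  DATETIME    NOT NULL
  ,RowEndDate    DATETIME    NOT NULL DEFAULT '99991231');

INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowStartDate)
VALUES (1, 'Anakin Skywalker', 'Tatooine',  '20000101')
      ,(2, 'Yoda',             'Coruscant', '20000101');

-- Hypothetical incoming change for one entity.
DECLARE @CustomerNum TINYINT     = 2
       ,@NewPlanet   VARCHAR(25) = 'Dagobah'
       ,@ChangeDate  DATETIME    = '20050601';

BEGIN TRANSACTION;

-- Steps 1 and 2: locate the current row and expire it only if Planet changed.
-- EXISTS ... EXCEPT acts as a NULL-safe "is different" test.
UPDATE #DimCustomer
   SET RowIsCurrent = 'N'
      ,RowEndDate   = @ChangeDate
 WHERE CustomerNum  = @CustomerNum
   AND RowIsCurrent = 'Y'
   AND EXISTS (SELECT @NewPlanet EXCEPT SELECT Planet);

-- Step 3: if a row was expired, add the new version; IDENTITY assigns a new surrogate key
-- and the column defaults supply RowIsCurrent = 'Y' and RowEndDate = 12/31/9999.
IF @@ROWCOUNT = 1
    INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowStartDate)
    SELECT CustomerNum, CustomerName, @NewPlanet, @ChangeDate
      FROM #DimCustomer
     WHERE CustomerNum  = @CustomerNum
       AND RowIsCurrent = 'N'
       AND RowEndDate   = @ChangeDate;

COMMIT TRANSACTION;

SELECT * FROM #DimCustomer ORDER BY CustomerNum, RowStartDate;
```

Wrapping both statements in one transaction keeps the expire and the insert together, so a failure between them cannot leave the entity without a current row.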

So, what does this look like? Well, we know that the current row for Yoda will have a Type 2 change since he’s moving planets. So, we need to expire that row:
Followed by the insertion of the new record for Yoda:

As you can see, Yoda still has the same CustomerNum (Business Key), but now he has two Surrogate Key values. To illustrate how important this is, consider our Dimension alongside a sample Fact table.
Yoda’s sales are the last three in that Fact table. Those with a Customer Key of 2 will be properly attributed to Coruscant, while the final one will be attributed to Dagobah. That way, even if the source system updated the Planet in place, you still retain history in your data warehouse.
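The join that makes this work is on the Surrogate Key, never the Business Key. A minimal sketch with a hypothetical #FactSales table: the key values (2 for Yoda’s expired Coruscant row, 4 for his current Dagobah row) and the amounts are made up for illustration and may not match the screenshot.

```sql
IF OBJECT_ID('tempdb..#FactSales')   IS NOT NULL DROP TABLE #FactSales;
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;

CREATE TABLE #DimCustomer
(  CustomerKey  INT         NOT NULL PRIMARY KEY
  ,CustomerNum  TINYINT     NOT NULL
  ,CustomerName VARCHAR(25) NULL
  ,Planet       VARCHAR(25) NULL
  ,RowIsCurrent CHAR(1)     NOT NULL);
CREATE TABLE #FactSales
(  CustomerKey  INT           NOT NULL
  ,SalesAmount  DECIMAL(10,2) NOT NULL);

-- Yoda (CustomerNum 2) has two versions: key 2 (expired) and key 4 (current).
INSERT INTO #DimCustomer VALUES
   (1, 1, 'Anakin Skywalker', 'Tatooine',  'Y')
  ,(2, 2, 'Yoda',             'Coruscant', 'N')
  ,(3, 3, 'Obi-Wan Kenobi',   'Coruscant', 'Y')
  ,(4, 2, 'Yoda',             'Dagobah',   'Y');

INSERT INTO #FactSales VALUES (1, 10.00), (3, 20.00), (2, 30.00), (2, 40.00), (4, 50.00);

-- Joining on the surrogate key attributes each sale to the Planet in effect at the time.
SELECT d.CustomerName, d.Planet, SUM(f.SalesAmount) AS Sales
  FROM #FactSales f
  JOIN #DimCustomer d ON d.CustomerKey = f.CustomerKey
 WHERE d.CustomerNum = 2
 GROUP BY d.CustomerName, d.Planet;
```

With this sample data, Yoda’s 70.00 in earlier sales stays with Coruscant and only the 50.00 recorded after the move goes to Dagobah.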

In the next article, I’ll show you how to properly process both Type 1 and Type 2 Dimension changes using T-SQL Merge.

Using the Output Clause with T-SQL Merge

The Output clause, first implemented in SQL Server 2005, can be used to return information for each row modified by an Insert, Update, Delete or Merge statement. This functionality greatly increases the power and usefulness of Merge, and is required in the processing of Slowly Changing Dimensions.

This post is the second in a series called Have You Got the Urge to Merge? and is a follow up to Writing T-SQL Merge Statements the Right Way. If you just happened upon this article, feel free to jump to the beginning and follow along through the entire series.

As usual, I feel the easiest way to learn something is by example. For the sake of simplicity, I’m going to continue using the same tables and code from the first article in the series. To set up, we’ll run the following code:

 USE TempDb;
IF OBJECT_ID ('tempdb..#Customer_Orig') IS NOT NULL DROP TABLE #Customer_Orig;
IF OBJECT_ID ('tempdb..#Customer_New')  IS NOT NULL DROP TABLE #Customer_New;
CREATE TABLE #Customer_Orig
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
CREATE TABLE #Customer_New
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
INSERT INTO #Customer_New (CustomerNum, CustomerName, Planet)
   VALUES (1, 'Anakin Skywalker', 'Tatooine')
         ,(2, 'Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant');  
INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet)
   VALUES (2, 'Master Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant')
         ,(4, 'Darth Vader', 'Death Star');
SELECT * FROM #Customer_Orig Order by CustomerNum;
SELECT * FROM #Customer_New Order by CustomerNum;

When you run the code above, you should have the following tables. Remember that in the previous example, Customer_Orig and Customer_New started off being identical. I’ve skipped ahead to the point where the following changes were made to Customer_Orig in preparation for the Merge demo.

  1. The Darth Vader record was added to Customer_Orig.
  2. Yoda’s name was changed.
  3. Anakin Skywalker was deleted from Customer_Orig. The following screenshot still shows his record, formatted to suggest that it was once there but has been deleted. The effect of the Merge will be to delete Anakin Skywalker from Customer_New.

Merge Changes To Base
Customer New

So, an appropriate Merge statement for these tables can be taken from the previous article as well. However, this time we will add an Output clause in its most basic form. I’ve used the T-SQL comment marks to separate the new section and help it stand out.

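MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND EXISTS
                    (SELECT Source.CustomerName, Source.Planet
                     EXCEPT
                     SELECT Target.CustomerName, Target.Planet)
   THEN UPDATE SET
      Target.CustomerName = Source.CustomerName
     ,Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)
WHEN NOT MATCHED BY SOURCE THEN DELETE
------------------------------------------------------------
OUTPUT $action, inserted.*, deleted.*;
------------------------------------------------------------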

When that code is run, you’ll receive the following:

The results table may look confusing, but it’ll make sense in a minute. First, the word OUTPUT is essentially a substitute for SELECT. Second, the $action column indicates the type of action performed on that row: INSERT, UPDATE, or DELETE. If you’ve ever used Triggers, you’ll know that they work the same way. When a row is modified, SQL Server exposes two virtual tables, inserted and deleted, which hold the new and old values for each affected row. An inserted row appears only in inserted, a deleted row appears only in deleted, and an updated row appears in both.

Knowing this, you can interpret the results and see that our merge statement was effective and did make Customer_New identical to Customer_Orig. So, if you re-run the select statements from above, you see the following:
Result of Merge
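The inserted and deleted pseudo-tables are not specific to Merge; a plain Update exposes them too, which makes them easy to inspect in isolation ($action, by contrast, is available only with Merge). A minimal sketch using a hypothetical #Planets table:

```sql
IF OBJECT_ID('tempdb..#Planets') IS NOT NULL DROP TABLE #Planets;
CREATE TABLE #Planets
(  CustomerNum TINYINT     NOT NULL
  ,Planet      VARCHAR(25) NULL);
INSERT INTO #Planets (CustomerNum, Planet)
VALUES (2, 'Coruscant'), (3, 'Coruscant');

-- For each updated row, deleted holds the old values and inserted holds the new ones.
UPDATE #Planets
   SET Planet = 'Dagobah'
OUTPUT inserted.CustomerNum
      ,deleted.Planet  AS PrevPlanet
      ,inserted.Planet AS NewPlanet
 WHERE CustomerNum = 2;
```

This returns a single row showing CustomerNum 2 moving from Coruscant to Dagobah.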

Keep in mind that you don’t have to simply output the values to the screen. You could insert those records into a physical table, temp table, or table variable as well. Next, let’s insert them into another temp table. Re-run the setup code above, and then run the following code.

IF OBJECT_ID( 'tempdb..#CustomerChanges') IS NOT NULL DROP TABLE #CustomerChanges;
CREATE TABLE #CustomerChanges(
  ChangeType         NVARCHAR(10)
 ,CustomerNum        TINYINT NOT NULL
 ,NewCustomerName    VARCHAR(25) NULL
 ,PrevCustomerName   VARCHAR(25) NULL
 ,NewPlanet          VARCHAR(25) NULL
 ,PrevPlanet         VARCHAR(25) NULL
 ,UserName           NVARCHAR(100) NOT NULL
 ,DateTimeChanged    DateTime NOT NULL);
MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND EXISTS
                    (SELECT Source.CustomerName, Source.Planet
                     EXCEPT
                     SELECT Target.CustomerName, Target.Planet)
   THEN UPDATE SET
      Target.CustomerName = Source.CustomerName,
      Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)
WHEN NOT MATCHED BY SOURCE THEN DELETE
OUTPUT
   $ACTION ChangeType,
   COALESCE(inserted.CustomerNum, deleted.CustomerNum) CustomerNum,
   inserted.CustomerName NewCustomerName,
   deleted.CustomerName PrevCustomerName,
   inserted.Planet NewPlanet,
   deleted.Planet PrevPlanet,
   SUSER_SNAME() UserName,
   GETDATE() DateTimeChanged
    INTO #CustomerChanges;
SELECT * FROM #CustomerChanges;

Output Refined
I added the last two columns because I often use the Output clause with Merge (and other DML statements) for auditing purposes.

Now that we’ve covered the basics of Merge and the Output clause, our next article will cover how to use both to process Slowly Changing Dimensions.

Writing T-SQL Merge Statements the Right Way

In a previous article, I discussed Merge statement basics. However, in extensive testing I’ve come to realize that my article, like most articles I’ve read about Merge, leaves out or mishandles several important aspects. Rather than edit that article, I’ve decided to publish a series of articles which I hope will clear up some of these misconceptions. If you already read the original article, a good portion of this will be review, as I’m using the original as a base for this one.

Let’s start by glancing at the syntax portion of the Books Online T-SQL Merge Page. I’ll take the liberty of re-posting just the first 25% or so below.

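[ WITH <common_table_expression> [,...n] ]
MERGE
    [ TOP ( expression ) [ PERCENT ] ]
    [ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]
    USING <table_source>
    ON <merge_search_condition>
    [ WHEN MATCHED [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
        THEN <merge_not_matched> ]
    [ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
        THEN <merge_matched> ] [ ...n ]
    [ <output_clause> ]
    [ OPTION ( <query_hint> [ ,...n ] ) ]
;

<target_table> ::=
{
    [ database_name . schema_name . | schema_name . ]
    target_table
}

<merge_hint>::=
{
    { [ <table_hint_limited> [ ,...n ] ]
    [ [ , ] INDEX ( index_val [ ,...n ] ) ] }
}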

Simple right? Great, I guess I’m done here…. No seriously, who can easily absorb that? So, what is Merge really and how do we use it?

T-SQL Merge Basics

In a nutshell, the Merge statement allows you to Insert, Update, or Delete data in an entity, referred to as the Target, with data from another entity called the Source. The entities are compared on Fields which uniquely identify records in each, a Join if you will. Notice how I keep using the word entity rather than table, and the reason is that the Target and Source could be many SQL Server objects such as Tables, Temp Tables, Views, Table Variables, or even Common Table Expressions. The Source could also be a complete Select statement as well. In this case, for the sake of simplicity, I’ll use Temp Tables.
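As an illustration of a non-table Source, here is a minimal sketch that merges from a Common Table Expression, which filters the rows before they reach the Merge. The #SrcCustomer and #TgtCustomer tables and the Death Star filter are hypothetical.

```sql
IF OBJECT_ID('tempdb..#SrcCustomer') IS NOT NULL DROP TABLE #SrcCustomer;
IF OBJECT_ID('tempdb..#TgtCustomer') IS NOT NULL DROP TABLE #TgtCustomer;
CREATE TABLE #SrcCustomer
(  CustomerNum  TINYINT     NOT NULL
  ,CustomerName VARCHAR(25) NULL
  ,Planet       VARCHAR(25) NULL);
CREATE TABLE #TgtCustomer
(  CustomerNum  TINYINT     NOT NULL
  ,CustomerName VARCHAR(25) NULL
  ,Planet       VARCHAR(25) NULL);
INSERT INTO #SrcCustomer (CustomerNum, CustomerName, Planet)
VALUES (1, 'Anakin Skywalker', 'Tatooine')
      ,(2, 'Yoda', 'Coruscant')
      ,(4, 'Darth Vader', 'Death Star');

-- The CTE acts as the Source; the statement before WITH must end in a semicolon.
WITH Src AS
(  SELECT CustomerNum, CustomerName, Planet
     FROM #SrcCustomer
    WHERE Planet <> 'Death Star'  -- hypothetical filter, applied before the Merge
)
MERGE #TgtCustomer AS Target
USING Src AS Source
   ON Target.CustomerNum = Source.CustomerNum
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet);

SELECT * FROM #TgtCustomer ORDER BY CustomerNum;
```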

I think most people learn best from examples, by doing rather than reading descriptions of syntax, so I’ve provided a brief script to create the tables required for the following example.

IF OBJECT_ID ('tempdb..#Customer_Orig') IS NOT NULL DROP TABLE #Customer_Orig;
IF OBJECT_ID ('tempdb..#Customer_New')  IS NOT NULL DROP TABLE #Customer_New;
CREATE TABLE #Customer_Orig
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
CREATE TABLE #Customer_New
(  CustomerNum    TINYINT NOT NULL
  ,CustomerName   VARCHAR (25) NULL
  ,Planet         VARCHAR (25) NULL);
INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet)
   VALUES (1, 'Anakin Skywalker', 'Tatooine')
         ,(2, 'Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant');
INSERT INTO #Customer_New (CustomerNum, CustomerName, Planet)
   VALUES (1, 'Anakin Skywalker', 'Tatooine')
         ,(2, 'Yoda', 'Coruscant')
         ,(3, 'Obi-Wan Kenobi', 'Coruscant');

So, I’ve created two temporary tables called Customer_New and Customer_Orig with identical data in each. In this case, Customer_New will be the Target and Customer_Orig will be the Source.
Merge Base
Now, we’re going to make the following changes to Customer_Orig. Notice that I’ve performed a single Insert, Update, and Delete.

-- Update Yoda's Name
UPDATE #Customer_Orig
   SET CustomerName = 'Master Yoda'
 WHERE CustomerNum = 2;
-- Delete Anakin
DELETE #Customer_Orig
 WHERE CustomerNum = 1;
--Add Darth
INSERT INTO #Customer_Orig (CustomerNum, CustomerName, Planet)
VALUES (4, 'Darth Vader', 'Death Star');

Merge Changes To Base

Now, being the geek that I am, I realize that Anakin became Darth Vader, which could have been seen as a change in name. However, Obi-Wan clearly states that Darth Vader betrayed and murdered Anakin, effectively becoming a new person. If that bothers you, then you’re a scruffy looking nerf herder.

Old School CRUD

Prior to SQL Server 2008, we could have accomplished this merge with the following code. Please note that I have used Joins in my example, which will align with my Merge code later.

--Process Updates  
Update Tgt
Set    Tgt.CustomerName = Src.CustomerName, Tgt.Planet = Src.Planet
FROM   #Customer_New Tgt Inner JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum
Where  Tgt.CustomerName <> Src.CustomerName Or Tgt.Planet <> Src.Planet -- Eliminates needless updates.
--Process Inserts
Insert Into #Customer_New
  SELECT Src.CustomerNum, Src.CustomerName, Src.Planet
  FROM   #Customer_Orig Src LEFT JOIN #Customer_New Tgt ON Tgt.CustomerNum = Src.CustomerNum
  Where  Tgt.CustomerNum is null;
--Process Deletes
Delete FROM Tgt
from        #Customer_New as Tgt LEFT JOIN #Customer_Orig Src ON Tgt.CustomerNum = Src.CustomerNum
Where       Src.CustomerNum is null;

This works, but it’s less than optimal for a few reasons. First, writing those statements can be tedious, especially if this had been a typical table with 20+ fields to deal with. Second, this represents three separate units of work, and SQL Server processes them that way. Once you understand it, the T-SQL Merge statement is easier to write and accomplishes all of this in a single, atomic statement. It’s basically a win-win.

Components of Merge Statements

So, let’s break a Merge statement into its component parts to make it easy to understand. First, the Target and Source tables are specified along with the business key which identifies each record. This is the field that one would use in a join.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum

The When Matched clause determines what will happen when records exist in both the Source and Target with the same CustomerNum. Notice the additional conditions I’ve added, which limit the updates to records where a value has changed. Strictly speaking, this isn’t required, but without it every matched record in the target would be updated regardless of need, which wastes resources.

WHEN MATCHED AND (Target.CustomerName <> Source.CustomerName 
                  OR Target.Planet <> Source.Planet)
  THEN UPDATE SET --Updates Yoda's Name
      Target.CustomerName = Source.CustomerName,
      Target.Planet = Source.Planet

The When Not Matched by Target clause specifies what should be done with records in the Source that aren’t in the Target. The typical scenario is to insert records which are new. I could have added additional conditions or only added certain new records as well.

WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet) -- Inserts Darth
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)

The When Not Matched by Source clause specifies what should be done with records in the Target that aren’t in the Source. Keep in mind that if this was a Staging table which wasn’t comprehensive, perhaps the result of an incremental extraction, then you’d want to omit this portion of the statement.

WHEN NOT MATCHED BY SOURCE THEN DELETE -- Deletes Anakin

Also, keep in mind that any Merge statement must be terminated with a semicolon. So, when you put your script together, it looks like the following. Go ahead and run it on your test data.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND (Target.CustomerName <> Source.CustomerName
                  OR Target.Planet <> Source.Planet)
   THEN UPDATE SET
      Target.CustomerName = Source.CustomerName
     ,Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Result of Merge

What about Null Values?

Ah, very astute of you to notice that, my young padawan. This is the first mistake that many people make. You’ll notice in the When Matched portion above that I also check whether a value changed before I run my update. However, under SQL Server’s default (ANSI) NULL semantics, a comparison involving NULL, such as NULL <> 'Coruscant', evaluates to UNKNOWN rather than TRUE. Therefore, a record will not be updated when one of the compared values is NULL, even though the values differ. A great explanation of NULL handling can be found here. First, let’s set up a NULL field comparison issue. Run the following update statement against the Customer_Orig table.

Update CO set Planet = NULL
  FROM #Customer_Orig CO
 WHERE CustomerNum = 3 -- Obi-Wan Kenobi

If you re-run the original Merge statement and compare the tables, you’ll find the following:
Result of Merge Non NULL Handling
Notice that Obi-Wan’s record in Customer_New has not been updated. There are a few ways to deal with this which I’ve seen online. The first two methods each have their problems, which I’ll briefly explain. The third method, using Select and Except, is the one I recommend; feel free to skip ahead to it.
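Before looking at each method, it can help to see how a plain comparison and the three approaches below behave on the four possible combinations of NULL and non-NULL values. A minimal sketch; the #Pairs table is hypothetical, and ‘Y’ means the row would be treated as changed and therefore updated.

```sql
IF OBJECT_ID('tempdb..#Pairs') IS NOT NULL DROP TABLE #Pairs;
CREATE TABLE #Pairs
(  PairId        TINYINT     NOT NULL
  ,TargetPlanet  VARCHAR(25) NULL
  ,SourcePlanet  VARCHAR(25) NULL);
INSERT INTO #Pairs (PairId, TargetPlanet, SourcePlanet)
VALUES (1, 'Coruscant', 'Coruscant')  -- no change
      ,(2, 'Coruscant', NULL)         -- changed to NULL
      ,(3, NULL,        NULL)         -- no change
      ,(4, 'Coruscant', 'Dagobah');   -- changed

SELECT PairId
      ,CASE WHEN TargetPlanet <> SourcePlanet
            THEN 'Y' ELSE 'N' END AS PlainNotEqual
      ,CASE WHEN TargetPlanet <> SourcePlanet
                 OR TargetPlanet IS NULL OR SourcePlanet IS NULL
            THEN 'Y' ELSE 'N' END AS IsNullChecks
      ,CASE WHEN COALESCE(TargetPlanet, '') <> COALESCE(SourcePlanet, '')
            THEN 'Y' ELSE 'N' END AS CoalesceCheck
      ,CASE WHEN EXISTS (SELECT SourcePlanet EXCEPT SELECT TargetPlanet)
            THEN 'Y' ELSE 'N' END AS ExceptCheck
  FROM #Pairs
 ORDER BY PairId;
```

With these four rows, the plain comparison misses pair 2, the IS NULL version wrongly flags pair 3, and both COALESCE and EXCEPT flag exactly pairs 2 and 4 (COALESCE only because an empty string never appears in this sample).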

Adding “Is NULL” Statements
We could re-write the When Matched clause like this:

WHEN MATCHED AND (   Target.CustomerName <> Source.CustomerName
                  OR Target.CustomerName IS NULL
                  OR Source.CustomerName IS NULL
                  OR Target.Planet <> Source.Planet
                  OR Target.Planet IS NULL
                  OR Source.Planet IS NULL)

Technically, this method works in that it will update if one of your values is NULL. However, it’s really cumbersome to write; imagine writing a statement like that for 50 columns. Also, the resulting query plan is complex. Perhaps the biggest problem is that if both of your values are NULL, an update will happen anyway. Go ahead and substitute the code above and re-run the Merge statement where one of the values is NULL. You should receive the message “1 row(s) affected.” However, if you run the Merge statement repeatedly, you will ALWAYS receive “1 row(s) affected,” because Obi-Wan’s Planet is now NULL in both tables and the IS NULL checks remain true. For these reasons, this method isn’t recommended.

Using ISNULL or Coalesce
Coalesce can also be used, and it has an advantage over ISNULL here: Coalesce determines its result datatype from the datatype precedence of all of its arguments, while ISNULL uses the datatype of its first argument. Therefore, we could use the same basic Coalesce statement for most comparisons without throwing an error, and the When Matched portion from above becomes the following:

                AND (COALESCE(Target.CustomerName, '') <> COALESCE(Source.CustomerName, '') 
                  OR COALESCE(Target.Planet, '') <> COALESCE(Source.Planet, ''))

This works as well, with some caveats. First, the second value in each Coalesce statement has to be something that should NEVER occur in that column, so an empty string ('') is most likely not a good choice. Second, although Coalesce implicitly converts the replacement value for most data types, some data types will fail and require you to use something other than a default string value. The nice thing about this method is that the T-SQL code is short and easy to understand.
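To make the first caveat concrete, here is a minimal sketch that uses two hypothetical variables in place of the Target and Source columns. A target value of NULL and a source value that is a genuine empty string are different data, yet the empty-string sentinel makes them compare as equal, so the change is silently missed.

```sql
-- Hypothetical values standing in for Target.CustomerName and Source.CustomerName.
-- NULL and '' are different values, so this row really has changed.
DECLARE @TargetName varchar(50) = NULL,
        @SourceName varchar(50) = '';

SELECT CASE WHEN COALESCE(@TargetName, '') <> COALESCE(@SourceName, '')
            THEN 'Change detected' ELSE 'Change missed' END AS EmptyStringSentinel;
-- Returns 'Change missed': once the NULL is replaced, both sides are ''.

-- The sentinel must also convert to the column's type. '' cannot be converted
-- to a uniqueidentifier, so the following line raises a conversion error:
-- SELECT COALESCE(CAST(NULL AS uniqueidentifier), '');
```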

Using Select and Except to Handle Nulls (Recommended)

I used the Coalesce method successfully for some time until I came across this excellent article by Paul White. In it he correctly points out that when using Except or Intersect, NULL values are considered equal. Therefore, two fields that are both NULL are treated as equivalent, and a NULL value compared to some other actual value is treated as Not Equal. He seems to favor using “Not Exists” and Intersect. However, using Exists and Except just makes more sense to my brain. Therefore, I’d write the query like the following. If you take this code and run it on the original tables, as well as the NULL test, you’ll see that it works perfectly.

MERGE  #Customer_New AS Target
 USING #Customer_Orig AS Source
    ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND EXISTS
                    (SELECT Source.CustomerName, Source.Planet
                     EXCEPT
                     SELECT Target.CustomerName, Target.Planet)
THEN
   UPDATE SET
      Target.CustomerName = Source.CustomerName
     ,Target.Planet = Source.Planet
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerNum, CustomerName, Planet)
   VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet);
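To see why the Exists/Except test behaves differently from a plain <> comparison, here is a minimal standalone sketch that evaluates both tests side by side. The two variables stand in for Source.Planet and Target.Planet, the planet value is hypothetical, and the results noted in the comments assume the default SET ANSI_NULLS ON.

```sql
-- Hypothetical values: after the earlier UPDATE, Obi-Wan's Planet is NULL
-- in the source while the target still holds its old (hypothetical) value.
DECLARE @SourcePlanet varchar(50) = NULL,
        @TargetPlanet varchar(50) = 'Tatooine';

SELECT CASE WHEN @TargetPlanet <> @SourcePlanet
            THEN 'Update' ELSE 'Skip' END AS NotEqualCheck,  -- 'Skip': <> with a NULL is UNKNOWN
       CASE WHEN EXISTS (SELECT @SourcePlanet EXCEPT SELECT @TargetPlanet)
            THEN 'Update' ELSE 'Skip' END AS ExceptCheck;    -- 'Update': EXCEPT returns the NULL row
```

Setting both variables to NULL returns 'Skip' in both columns: the EXCEPT finds nothing to return because it treats the two NULLs as equal, which is exactly what keeps the merge from updating unchanged rows over and over.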

Limitations of Merge?

Of course, there are always limitations. The most important limitation is that both data sources should be on the same SQL Server instance. I’ve seen people use Merge with linked servers, but I wouldn’t recommend it.

Another factor that might give you some pause is that these statements are fairly complex and wordy. A long merge statement, such as merging two tables with 25 fields each, is tedious to write and it’s very easy to make a simple mistake. Well, stay tuned because later in this series I’ll share some code which will practically write the statements for you.
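Until then, here is a rough, hypothetical sketch of the idea, not the code promised for this series. It reads a target table’s non-key columns from sys.columns and prints the Exists/Except change check and the UPDATE SET list. It assumes a permanent target table (the hypothetical dbo.Customer_New) whose non-key columns have the same names in the source; @TargetTable and @KeyColumn are placeholders.

```sql
-- Hypothetical generator sketch: builds MERGE fragments from table metadata.
-- Does not filter out identity, computed, or rowversion columns.
DECLARE @TargetTable sysname = N'dbo.Customer_New',
        @KeyColumn   sysname = N'CustomerNum',
        @SourceCols  nvarchar(max),
        @TargetCols  nvarchar(max),
        @SetCols     nvarchar(max);

IF OBJECT_ID(@TargetTable) IS NULL
    THROW 50000, N'Target table not found.', 1;

-- Non-key columns in table order, already bracket-quoted.
DECLARE @Cols TABLE (column_id int PRIMARY KEY, name nvarchar(258));

INSERT @Cols (column_id, name)
SELECT column_id, QUOTENAME(name)
  FROM sys.columns
 WHERE object_id = OBJECT_ID(@TargetTable)
   AND name <> @KeyColumn;

-- FOR XML PATH concatenation works on SQL Server 2012 (STRING_AGG needs 2017+).
SELECT @SourceCols = STUFF((SELECT N', Source.' + name FROM @Cols ORDER BY column_id
                            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N''),
       @TargetCols = STUFF((SELECT N', Target.' + name FROM @Cols ORDER BY column_id
                            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N''),
       @SetCols    = STUFF((SELECT N', Target.' + name + N' = Source.' + name
                              FROM @Cols ORDER BY column_id
                            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'');

-- PRINT truncates very long strings; SELECT the variables instead for wide tables.
PRINT N'WHEN MATCHED AND EXISTS (SELECT ' + @SourceCols
    + N' EXCEPT SELECT ' + @TargetCols + N')';
PRINT N'THEN UPDATE SET ' + @SetCols;
```

For a table shaped like the Customer examples (CustomerNum, CustomerName, Planet), the first printed line would be WHEN MATCHED AND EXISTS (SELECT Source.[CustomerName], Source.[Planet] EXCEPT SELECT Target.[CustomerName], Target.[Planet]).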

In the next article in this series we’ll discuss how to use the Merge statement with the Output clause as it’s required to load Slowly Changing Dimensions.

Changes in Priorities and Direction

Buy In

Everyone who knows me knows that I absolutely love Data Warehousing. I’m one of those fortunate few who finds their work fascinating, and I tend to dive head first into new projects.

I was recruited to join a team on a multi-terabyte data warehousing project late last summer. I was leery of taking the position because the project had already been attempted by numerous other teams, which had failed and subsequently no longer worked for the company. However, the Architect in charge was very persuasive, and the lure of a groundbreaking project, using some of the best toys out there, was something that I couldn’t resist. I knew going in that this might be a short-term situation, but I calculated that the benefits outweighed the negatives.

After I started, it became obvious that the team and project were in a state of flux. The Architect, whom I mentioned previously, left shortly after I started, leaving the team to push forward anyway. The executive in charge of the project was fond of saying, “Teams pick their own captains,” and that’s what happened here: I was soon placed in charge of the project as the new Architect.

I was running a team of a dozen in-house data professionals, as well as directing the efforts of world-class consultants in SSIS and Dimensional Modeling. Though I already knew these consultants personally, working directly with them was one of the highlights of my career so far. While I’ve already performed almost every individual role in the complete data warehouse life cycle, being in the position of Architect (and project leader) was particularly enlightening since it gave me a deeper appreciation of the entire process.

It was an amazing, rewarding, and exhausting experience. I got the chance to prove my skills to some of the industry’s best; finding solutions to problems that stymied some of the biggest names in our industry. However, it was not all rainbows and butterflies.

Stress and Burnout

In many data warehouse projects, the technical challenges (though formidable) can pale in comparison to the political and social challenges. As many of us have experienced, business decisions aren’t always based on technical reality. I worked 70 to 80 hours a week for more than 6 months trying to meet very aggressive, even insane, deadlines with no end in sight. I’ve always been a bit of a workaholic, but I had completely lost my work/life balance. Because I’m a geek and was learning so much, I didn’t see the situation clearly. It was taking a toll on every other aspect of my life. I wasn’t sleeping or eating properly, and I fell way behind in my personal and professional correspondence.

Moments of Clarity


My sister, Regan, and I in 2011

My sister became critically ill the third week of March, and I caught the first flight back home to Michigan to be with family. I stayed a week, but the doctors said the crisis would be long term, and they wouldn’t be able to provide a prognosis for weeks or months. I did not want to leave, but I had a multi-million dollar project with a large team in DFW that was counting on me. This incident put a spotlight on what I was doing to myself and my family due to overwork and lack of balance. After a good deal of quiet reflection, I gave two weeks’ notice to my previous employer near the end of March. Tragically, my sister died early Easter morning. She was only 41, had young children, and her loss devastated my entire family.

Lessons Learned

The death of my sister really put an exclamation point on the whole mess. I’ve learned a lot through this whole ordeal, and I’ll share some of them with you in the future. However, the most important thing I learned was that while career is important, and what we do affects a lot of people, it isn’t as important as your physical and emotional well-being. Family commitments absolutely have to come first. No matter how fascinating the project is, I cannot neglect my personal life for very long. It’s not fair to myself, my family, or my clients.

New Beginnings

It’s always important to focus on what is good in our lives. My career continues to go well. As I was deciding to leave my previous gig, one of my independent consulting clients came to me needing some long-term help with their data warehouse project. It’s a fascinating project with terabytes of data and challenging text file imports, and I’m learning C# to boot. This gig should last for several months and has been perfect for me as I transition to the next phase of my career. This good fortune can be directly attributed to my work in the PASS Community. I want to sincerely thank my #SQLFamily for all of their help and support.

More to come in future posts.

Setup and Performance Issues with the Integration Services (SSIS) 2012 Catalog

For those who don’t know, I’m currently the Data Warehouse Architect for a large-scale and complex multi-tenant data warehousing project. Fortunately, the project is being created in SQL Server 2012, and I’m thrilled to be using the new features. Let me preface this blog post by admitting that my experience with the SSIS 2012 Catalog, SSISDB, is fairly limited, but then again most people are in the same boat since adoption of 2012 has been fairly slow.

However, I’ve run into some problems which aren’t getting a lot of attention yet, but I’m fairly certain they will in the near future. Let’s break them down.

Initial Catalog Setup Issues

I won’t explain step by step how to create the Integration Services Catalog, because others already have. However, you’ll notice from the following screenshot that very few settings are available when the catalog is created.

I wish that Microsoft would upgrade the Create Catalog screen to enable the user to specify SSISDB settings upon creation. To find them, right-click SSISDB under Integration Services Catalogs and select Properties. The following settings can and should be set:

  • Whether to Clean Logs Periodically – Defaults to True
  • Retention Period in Days – Defaults to 365
  • Default Logging Level – Defaults to Basic
  • Maximum Number of Old Versions Per Project – Defaults to 10
  • Whether to Periodically Remove Old Versions – Defaults to True
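Until that happens, these settings can at least be inspected, and later changed, with T-SQL once the catalog exists. A minimal sketch that reads them from the [catalog].[catalog_properties] view is below; the mapping from the labels above to property names in the comments is my own reading and worth double-checking on your build.

```sql
-- Setting (as listed above)                  -> property name
--   Whether to Clean Logs Periodically       -> OPERATION_CLEANUP_ENABLED
--   Retention Period in Days                 -> RETENTION_WINDOW
--   Default Logging Level                    -> SERVER_LOGGING_LEVEL (0 None, 1 Basic, 2 Performance, 3 Verbose)
--   Maximum Number of Old Versions Per Project -> MAX_PROJECT_VERSIONS
--   Whether to Periodically Remove Old Versions -> VERSION_CLEANUP_ENABLED
SELECT property_name, property_value
  FROM [SSISDB].[catalog].[catalog_properties]
 WHERE property_name IN (N'OPERATION_CLEANUP_ENABLED', N'RETENTION_WINDOW',
                         N'SERVER_LOGGING_LEVEL', N'MAX_PROJECT_VERSIONS',
                         N'VERSION_CLEANUP_ENABLED');
```

Each of these values can then be changed with [catalog].[configure_catalog], passing the property name and the new value.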


I wish we could set those at create time. As I’ll discuss in a minute, the default values can be disastrous for many installations.

SSISDB Database Ignores Model Database Settings

For some reason, when you create the SSISDB Catalog, the corresponding SSISDB database ignores the Model database settings. It appears that no matter what my Model database is set to, SSISDB defaults to fixed initial sizes for the data and log files, a 10% growth rate, and the Full Recovery Model. I’ve reached out to Microsoft, but haven’t received an answer as to why.

This has become problematic for us because we’ve deliberately set up our Model Databases so that Simple Recovery is used in our Development and QA Environments while Full Recovery is used in our Production environments. I’ve had to set up periodic checking on all of our servers looking for databases in full recovery mode that aren’t getting transaction log backups to avoid runaway T-Log growth issues.
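A minimal sketch of that kind of periodic check is below. It lists user databases that are not in Simple recovery and whose most recent log backup recorded in msdb is missing or older than 24 hours; the 24-hour threshold is an assumption to tune for your environment.

```sql
-- User databases not in SIMPLE recovery whose latest log backup in msdb
-- is missing or more than 24 hours old (threshold is an assumption).
SELECT d.name,
       d.recovery_model_desc,
       MAX(b.backup_finish_date) AS last_log_backup
  FROM sys.databases AS d
  LEFT JOIN msdb.dbo.backupset AS b
    ON b.database_name = d.name
   AND b.type = 'L'                 -- 'L' = transaction log backup
 WHERE d.recovery_model_desc <> 'SIMPLE'
   AND d.database_id > 4            -- skip master, tempdb, model, msdb
   AND d.source_database_id IS NULL -- skip database snapshots
 GROUP BY d.name, d.recovery_model_desc
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());
```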

The script I’ve used to reset the SSISDB database is as follows:
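A minimal sketch of that kind of reset follows; it is not necessarily the exact script referred to here. It assumes the SSISDB logical file names are data and log (confirm them with the first query before running the ALTER statements), and the growth increments are placeholders.

```sql
-- Confirm the logical file names and current growth settings first.
SELECT name, type_desc, physical_name,
       size * 8 / 1024 AS size_mb, growth, is_percent_growth
  FROM SSISDB.sys.database_files;

-- Dev/QA: match the Model database convention of SIMPLE recovery.
-- In Production, leave FULL and make sure log backups are scheduled.
ALTER DATABASE SSISDB SET RECOVERY SIMPLE;

-- Replace the 10% growth with fixed increments (names and sizes are assumptions).
ALTER DATABASE SSISDB MODIFY FILE (NAME = N'data', FILEGROWTH = 512MB);
ALTER DATABASE SSISDB MODIFY FILE (NAME = N'log',  FILEGROWTH = 256MB);
```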


SSISDB Performance Issues

I’ve come to the conclusion that the SSISDB database will not scale at all. I became painfully aware of this because of a problem we had in a “pseudo” Production environment. Our first nasty surprise was T-Log growth, because the database incorrectly defaulted to Full Recovery mode, which I fixed by shrinking the log. Before you mentally criticize me, dear reader, please keep in mind this wasn’t an actual Production environment and the data being kept in SSISDB was inconsequential so far.

Another issue became apparent shortly afterward. We were running with the default settings of Basic level logging, 365 Days Retention, and 10 Versions of the project being kept. Our SSIS project is already fairly large with around 50 packages, but it was only being run periodically in this environment approximately once or twice per week. We are not even hitting this thing very hard. However, the SSISDB database size was growing rapidly anyway. In this particular situation, it had grown to 60GB. While I realize that 60GB is not very large, SSISDB performance slowed to a crawl. In any of our environments, when the SSISDB approaches 50GB the performance degrades dramatically. The first sign of this is that the built in reports take longer and longer to return data and eventually seem to stop returning data at all.
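To see which tables are driving that growth, a minimal sketch like the one below lists the largest tables in SSISDB by reserved space using sys.dm_db_partition_stats. I’d expect the internal message tables (for example internal.operation_messages and internal.event_messages) near the top when logging is enabled, but that is an assumption to confirm against your own output.

```sql
USE SSISDB;

-- Top 10 tables by reserved space (data + indexes), with row counts
-- taken from the heap or clustered index only.
SELECT TOP (10)
       s.name + N'.' + t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count
  FROM sys.dm_db_partition_stats AS ps
  JOIN sys.tables  AS t ON t.object_id = ps.object_id
  JOIN sys.schemas AS s ON s.schema_id = t.schema_id
 GROUP BY s.name, t.name
 ORDER BY reserved_mb DESC;
```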

Unfortunately this pseudo Production server was set up with very little hard drive space, 100GB. So, I started receiving free space warnings after the SSISDB database reached 60GB.

I did a little research about reducing the size and performance issues with the SSISDB. I lowered the Retention Period to 7 days, set the Default Logging Level to None, and reduced the Maximum Number of Old Project Versions to 1. We don’t need to save old versions of our project as we are using TFS source control. You are using source control right?

Keep in mind that changing your logging level to None is not advisable in most production environments; however, we currently use a 3rd party product to monitor our packages, so this isn’t an issue for us. I suspect that most of you will use the Basic level, but shorten the retention period and run the cleanup job often to avoid performance issues. Anyway, I changed my settings using the following script.

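-- Keep only the latest deployed version of each project (TFS holds our history)
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'MAX_PROJECT_VERSIONS', @property_value=1
-- Retain 7 days of operation history
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'RETENTION_WINDOW', @property_value=7
-- Server-wide default logging level: 0 = None (1 = Basic, the default)
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'SERVER_LOGGING_LEVEL', @property_value=0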

Then I proceeded to start the “SSIS Server Maintenance Job” automatically generated when you create the catalog. Wow, that was a mistake.

Don't Touch That Button


The job ran for 2 hours and blew up the transaction log even though it was in Simple Recovery Model. I then started receiving emails that the server had completely run out of space. When I checked the error log, I found an error indicating that the Log was out of space and a checkpoint record could not be written.

I immediately started researching and found this article about the cleanup job causing user packages to fail. If you read between the lines, you’ll see that the SSISDB is not properly indexed to handle the cleanup job. In my case, I had to drop the SSISDB, recreate it, and apply the script above to avoid similar issues in the future. Once again, this was not a huge problem for me because it wasn’t truly a Production environment and we rely on a third party product for our SSIS logging anyway. Don’t forget that even if you do change your settings, you must also schedule the cleanup job to run regularly. If you allow too much history to accumulate, you may suffer the same fate I did when running it.
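One way to avoid a single enormous cleanup pass is to lower the retention window in steps and run the cleanup after each step, so each pass deletes a smaller slice of history. The sketch below does that under several assumptions: that the SSIS Server Maintenance Job’s retention step calls [internal].[cleanup_server_retention_window] (check the job step text on your server), that operation cleanup is enabled, and that the database is in Simple recovery. The target and step sizes are placeholders.

```sql
USE SSISDB;

DECLARE @Retention  int,
        @TargetDays int = 7,    -- final retention window (placeholder)
        @StepDays   int = 15,   -- days removed per pass (placeholder)
        @Value      nvarchar(255);

SELECT @Retention = CAST(property_value AS int)
  FROM [catalog].[catalog_properties]
 WHERE property_name = N'RETENTION_WINDOW';

WHILE @Retention > @TargetDays
BEGIN
    -- Shrink the window by one step, but never below the target.
    SET @Retention = CASE WHEN @Retention - @StepDays < @TargetDays
                          THEN @TargetDays ELSE @Retention - @StepDays END;
    SET @Value = CAST(@Retention AS nvarchar(255));

    EXEC [catalog].[configure_catalog] @property_name = N'RETENTION_WINDOW',
                                       @property_value = @Value;

    -- Assumed to be the procedure the maintenance job's retention step runs.
    EXEC [internal].[cleanup_server_retention_window];

    -- Under SIMPLE recovery, a checkpoint lets this pass's log space be reused.
    CHECKPOINT;
END;
```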

My take on all of this is that SSISDB is not ready for prime time. I don’t think I’m being unfair with that statement. I do appreciate the fact that the SSIS Catalog is a great addition to SSIS 2012, but I expect to be able to use it in Production, running packages every 5 or 15 minutes and retaining months’ worth of history, and for it to perform regardless.

After writing this article I ran across this excellent article on SSIS 2012 Catalog Indexing, and time permitting I will implement his suggestions and report back. Are any of you having similar issues with your SSIS 2012 Catalog (SSISDB)?

Upcoming Presentations This Week

I’m in for a busy week as I have back-to-back presentations this Wednesday and Thursday nights.

On Wednesday the 20th, the Fort Worth SQL Server User Group has asked me to present one of my favorites. I’m constantly revising this presentation as I come across new mistakes to share. So, even if you’ve seen it, there will still be new material to shock and amaze you.

Data Warehouse Mistakes You Can’t Afford to Make

Many data professionals understand the basics of Data Warehouse design, including Dimension and Fact Tables, slowly changing Dimensions, and the use of meaningless surrogate keys. However, it isn’t until you’ve created a dimensional model and put it into production that you realize just how much of an impact seemingly trivial mistakes can make. They can hobble performance, allow inaccuracy, and perhaps worst of all, inhibit adoption and usage of the new system.

Learn how to avoid many common mistakes, from someone who’s made them and then found ways to correct them.

On Thursday the 21st, I’ll be continuing my Data Warehouse Series for the North Texas SQL Server User Group.

Data Warehouse Series – Basics and Beyond

In a series of 1-hour sessions, I’ll cover both the basics and gotchas of Data Warehouse Development using the SQL Server 2012 toolset. Some of the things we’ll cover are:

  • Basic methodology for evaluating business requirements/needs, obtaining user cooperation, and identifying source data quality issues.
  • Data Warehouse basic concepts including Facts, Dimensions, Star and Snowflake Schemas.
  • Deep dive into slowly changing Dimension types and how to handle them using T-SQL and SSIS.
  • T-SQL Merge Fundamentals and how it can be used to load Dimension and Fact Tables. Yes, I swear the Merge Generation Code will be ready for the presentation. :)
  • Understanding the three basic Fact Table types, how they are used, and loaded with SSIS.
  • Date Dimension creation and configuration.
  • Taking advantage of SSIS 2012 functionality improvements including Environments, Project Parameters etc.
  • Strategies for SSIS and database optimization, ensuring data integrity, etc.

And much, much more. Each session will build upon the previous sessions so they will be recorded and posted to my blog. Come watch this exciting and progressive series as I take you from padawan to Jedi Master.

I can’t wait to see my friends at the meetings.

Gratitude, Resolutions, and Shit

“Shit!” I said, followed by even more graphic expletives. Lost you? Sorry, I forget that you don’t follow me around. Let me explain.

Like most people, I have a junk room filled with boxes and plastic storage bins. After Thanksgiving I was in the garage looking for our stored Christmas ornaments and came across a very large bin that was unlabeled. When I opened it, I found a ton of brand new clothes including many pairs of Dockers pants and all of them had the tags still on. To understand my anger we need to back up several years.

Clowning around, but not really happy.


Around 15 years ago, on my way to morbid obesity, I was outgrowing my clothes much faster than they wore out. During the rise, I bought a bunch of Dockers slacks and other clothes for work because I got them on sale. However, because my weight rose so fast, I had pairs of them that were never worn. I never even took the tags off. I could have returned them, but I was embarrassed because of the reason. Those of you who know me know that I’m unfailingly honest. I couldn’t look a clerk in the face and tell them that I had to return the pants because I was too fat to wear them. For all of these years I have kept them in a plastic bin, telling myself that I would lose weight and wear them someday.

I carted those pants from apartment to apartment. I had garage sales and didn’t sell those pants. Even as I gave away half my stuff when I packed up everything to move across the country years ago, I kept that damn bin. I knew it would feel great to be able to wear them. However, I lost track of it in my latest move.

So, why the expletives? Well, in case anyone doesn’t know I had weight loss surgery back in April. I figured that as my weight dropped I would be able to use the brand new Dockers (and other clothes) from that bin. However, by the time I found it they were already several sizes too large for me anyway. Sometime in December I passed 100 lbs of weight lost. So, my immediate reaction was anger that I had wasted the money, time, and effort carting around those stupid pants only to never get to wear the damn things.

Then, “Bang” I felt like something hit me upside the head.


Who's this guy? - Photo courtesy of Tim Mitchell


What the heck was I thinking? I have so many things for which I should be grateful. I’ve lost a hundred pounds, 10 inches from my waist, and feel great. Weight Loss Surgery saved my life. I would absolutely do it all over again, and again, and again if necessary. I’ve reached a point in my career where money isn’t a problem, who cares if I had to donate a couple of hundred dollars in clothes to Goodwill? I’ve already donated thousands of dollars of clothes as I’ve gone down through the sizes. There are lots of people in my #SQLFamily who care and I’m surrounded by great people at the North Texas SQL Server User Group as well.

I have been working for the past several months for a company in Carrollton, Texas on a fascinating multi-tenant data warehouse project as the Data Warehouse Architect. The project is amazing, we’re using all of the best toys (tools), and I’ve brought in some of the very best consultants to assist with it. I have my SQL Family to thank for that. More on that in a later article.

Anyway, life is too short to dwell on meaningless things like Dockers pants. Life is good.


This is the first year that my primary resolution is NOT to lose weight, though I intend to lose another 20 to 40 pounds this year. I’m back in the gym and have been investigating Aikido dojos in the area. I have Steve Jones (Blog/Twitter) to thank for pushing me back toward the art. I can tell you that if he likes you, the man is relentless. So, I promise to attend one before the end of January.

I want to start getting certified in the BI Stack. I’m calling myself out right now in front of all (both) of my readers. I will get at least 3 certifications this year and will shoot for more.

Regardless of my difficult work schedule, I’ll take more time to blog and present this year. In fact, I’ve just agreed to present an ongoing series at the North Texas SQL Server User Group meetings called, “Data Warehouse Series – Basics and Beyond” and I hope to see some of you there. I love giving back to my #SQLFamily and need to do as much as I can this year. Mentoring others in our profession makes me feel great.

Having shed almost a full person, I have more energy than ever. Watch out folks, you ain’t seen nothing yet.

PASS Summit and My SQL Family

Those who read this blog know that I’ve always been into the PASS Community, particularly attending the PASS Summit. This is my third Summit, and I hope never to miss one.

I’ve written articles about why it’s worth paying your own way, how going is like finding a golden ticket and even like going on vacation. Many people, including my “real” family don’t really understand why I love my SQL Family (#sqlfamily) so much.

Changes and The Future

I’m particularly geeked about this one because so much has changed in the past year. I’ve lost almost 100 pounds and feel so much better now. I’ve also switched jobs and am now working as a corporate employee on a huge data warehousing project which is unlike anything I’ve ever seen before. Last night I went to dinner with most of the authors of SQL Server 2012 Integration Services Design Patterns, shared a few tidbits of the project, and they agreed that it was something special. More on the book and my current project to come in a future article.

A Scare Brings Focus

Anyway, a few recent things really brought me back to why I am so committed to this community. I thought that I had lost the blog a few weeks ago. One of my readers notified me that my site was down and I had no idea how long it had been down. My provider did not respond immediately to my email and when they did the response was something like, “It’s down? Oh, I’d better check that out.”

Because it’s a hosted site, my backup/restore options are limited, and I had never performed a disaster recovery test of the blog. To be honest, I wouldn’t know how to. I was surprised at my visceral reaction when I thought I had lost it, even though I’ve been neglecting the blog for the past year. It turned out that the problem wasn’t data loss, but rather a Denial of Service Attack which my provider was unaware of. Anyway, everything turned out alright, but it reminded me of how important the blog and SQL Community are to me and I intend to make it a priority in the future.

My Calling

Another reminder came after the latest SQL Saturday for the North Texas SQL Server User Group. For those who don’t know, I recently gave a precon for SQL Saturday Dallas. It went extremely well because I really love presenting, and I eagerly read all of the Session Evaluations I received. I don’t think that your average attendee realizes how much we care about what they have to say, and I love poring over the comments to find ways to improve. One attendee said something which really touched me and made me re-focus on my community involvement.

“It can be easily seen that David has found his calling in life. He is incredibly knowledgeable and passionate about Data Warehousing. I only wish we had more time for a deeper dive on some topics…”

Well, this person really nailed it. I have found my calling and love presenting, blogging, and volunteering in the PASS Community. I am fascinated by Data Warehousing, Integration, and Analytics. There’s something special about sharing knowledge with your fellow database pros (and friends) which feels so good that it’s worth all of the effort.

Last night at dinner, Tim Mitchell (Blog/Twitter) summed up the feeling very well. He’s a rabid Rangers baseball fan and therefore used a baseball metaphor. I’m paraphrasing, but he likened his precon experience to the feeling when a hush comes over a crowd, the batter makes solid contact with the ball, the “crack of the bat” echoes through the stadium, and it’s a home run. You just know when you hit it out of the park, and it’s one of the best feelings ever.

When I’m at the top of my game, this is exactly how it feels. When you are connected to the audience, you can feel that they are learning, are entertained, and most importantly you’ve made a difference. You know that they’ll take what you’ve taught and go back to their jobs to make things better.

Join the SQL Family #SQLFamily

Well, I hope you’ll forgive my rambling. It’s 4am as I write this, and I don’t adjust well to timezone changes. The official conference starts tomorrow and I can barely contain my excitement. It’s great seeing all of my friends again, some of whom I haven’t seen since last year. If you see me out and about, and manage to recognize me since I’ve lost almost 100 lbs, please stop me even if we’ve never met before.

I’ve already met some folks who I’m sure will become part of my #SQLFamily, and am always looking to connect with more.

If you’re not here, then make plans for next year. You cannot beat the networking and educational experience you get at a PASS Summit.

SQL Saturday Dallas Reflections and Presentation Downloads

Every time I attend SQL Saturdays I am re-energized, and SQL Saturday 163 BI Edition was no different. This one was particularly special because of my precon presentation, “Data Warehousing In a Day”. I just love getting up in front of a group and teaching, and it’s obvious to the attendees.

If all of the emails I’ve received are any indication, the attendees enjoyed it as much as I did. I want to thank those who attended the precons (not just mine) and attended SQL Saturday. You folks make the PASS Community worthwhile.

I also want to thank our team of volunteers who put on the event.

As anyone who reads my blog knows, my time has been ridiculously scarce lately and I wasn’t able to do much more than get my presentations ready. It’s nice to know that if I can’t give my usual effort that everyone else picks up the slack. Everything went off great. Congrats guys.

Presenting “Dimension – It’s All Meaningless Without Them”

Anyway, as promised the following are available for download:

Source Database Backup, Scripts, and SSIS 2012 Packages
Building an ETL Automation Framework for SSIS 2008 – Rushabh Mehta

Once again folks, please accept my sincere apologies that I haven’t made enough time for my community work, especially blogging. I just recently severed ties with a project that was taking all my time and preventing me from blogging. That, and another event I’ll talk about soon, reminded me how important the SQL Community is to me. I will endeavor to make this the last time I need to apologize.

Anyway, coming up in future posts I’ll cover the following:

  • More on the Merge Statement including a wrinkle I ran into last week regarding foreign keys.
  • The Balanced Data Distributor Transform for SSIS.
  • A series of articles regarding Dimensions, SCDs, and how to load them in SSIS 2008 through SSIS 2012.

Stay Tuned!
