Thank You to My NTSSUG SQL Family

I’m happy to announce that I’ve just been elected to the North Texas SQL Server User Group Board of Directors. I owe so much to the PASS Community in general, but even more to our local group. In case you aren’t aware, the NTSSUG is one of the most active and successful groups in the entire country, and I’m very honored to join the board.

I want to just take a moment to acknowledge what our group has already achieved.

  • We’ve held five successful SQL Saturday events; our most recent, SQL Saturday #255, had more than 500 attendees.
  • We hosted SQL Rally in Dallas.
  • We routinely draw 60 to 80 participants to our monthly meetings.

I’d like to take a moment to thank the two people who also ran for the board this round, Miyaka Fusi (Twitter) and Chris Utter. Being a board member is a serious commitment, and I hope to continue to see you folks at meetings and events.

Also, I’d like to thank our two outgoing board members. Erin Welker (Blog/Twitter) will be retiring as our Past President; she has served since the group’s inception more than 10 years ago. Tim Mitchell (Blog/Twitter) has been our Director of Finance for the past four years, and his contributions have been vital to the growth of the group.

The current board is a very cohesive team made up of talented and dedicated professionals. Great guys one and all.

  1. Sri Sridharan (Blog/Twitter) – President
  2. Ryan Adams (Blog/Twitter) – Director of Programs
  3. Ganesh Gopalakrishnan (Twitter) – Director of Logistics and Communication

I officially become a board member on January 1st, and I promise to do my very best to measure up to the standards set by my fellow board members as well as those that served before me.

Wish me luck!

Biml Makes You THE ONE

The Matrix is one of my favorite movies. The story, special effects, and theme are so compelling that I’ve watched the movie at least a dozen times. I referenced it in the introduction for my SQL Saturday 255 presentation, Biml – The Next Step in Agile Data Warehousing, and it fits in perfectly with my ongoing series, Why I Choose Biml Scripting, as well. How does Biml relate to the Matrix?

The Story

What is the Matrix really about? It’s about this really smart guy who’s disillusioned with his surroundings and life. Everything seems flat and boring, and he can sense that something fundamental is being hidden from him. He goes to work every day, doing the same things, but spends his nights on his computer searching for the secrets of The Matrix.
This is a person who seeks more. He seeks to gain a true understanding of his environment, his role in it, and eventually to gain mastery of it and his fate. After enduring the process of searching for a guide, learning what the Matrix is, and lots of Kung Fu and gun battles, he has an epiphany.

Neo Sees the Source Code

Neo is The One

He realizes that he is in fact “The One.” He can see the underlying code of The Matrix and can use his gift to make his world into what he wants it to be. He doesn’t have to toil in tedium for the rest of his life.

Can I Learn to Dodge Bullets?

Since Biml compiles into SQL Server Objects like views, tables, SSIS packages, and SSAS cubes, you can learn to automate their creation and functionality with BimlScript. Essentially, you gain the ability to “see the code,” and all of the resulting power that conveys. As I demonstrated on Saturday, you can create BimlScripts which will automatically create and load a data warehouse based upon metadata.
Can you learn to dodge or stop bullets? Well, no. However, if you spend time mastering BimlScript, you can build and load a data warehouse nearly as fast as you can draw the boxes and lines. You can save hundreds of hours of tedium, deliver data warehouses in rapid iterations, and save companies untold thousands of dollars.

Will You Take the Blue or Red Pill?

So, will you take the Blue pill and go on about your everyday life? Are you content to build and repeatedly edit the same old SSIS packages again and again? Or will you take the Red Pill and learn BimlScript instead? Will you learn how to automate your data warehouse loads, create new SSIS Packages, SSAS Cubes, and other SQL Server objects at will? The choice is yours.

As I closed my Biml presentation, I mentioned that I would be posting the BimlScripts I wrote for it on the blog for download. I’m also going to include articles detailing how the code works and how to write similar code yourself. Stay tuned in the coming days and weeks for those articles. Also, I am working on scheduling some full-day BimlScript Workshops. I’m currently trying to find a good venue in Dallas/Fort Worth, and may have already secured one in Austin, TX. Finding the right venues will be critical as I’d like to make the training as affordable as possible. If you would like to attend a workshop, know of a good venue, etc., feel free to contact me.

Continue to stay tuned to the blog. I’ll be posting more Biml articles, announcing workshops, and other cool stuff in the coming weeks.

Upcoming Presentations at NTSSUG and SQL Saturday Dallas

Just wanted to give my readers, both of you, a quick heads up on my upcoming presentations, both of which deal with BimlScripting.

“Biml Basics” Tuesday Night at NTSSUG

Tuesday night (10/29) I’m presenting the final session (for now) of my Data Warehousing Basics and Beyond series for the North Texas SQL Server User Group. We’ve covered the basics of how to create both Dimension and Fact Tables, as well as how to load them with SSIS. This final presentation will be on Biml Basics. We’ll discuss what Biml is, the basics of how to extend it with script, and why you should adopt it. I think everyone will find it interesting, and this session will serve as a good prelude to my upcoming presentation at SQL Saturday 255.

SQL Saturday #255 – Arlington, TX

Our next SQL Saturday is this week (11/2/2013) and I’ll be volunteering as usual. The regular event is free to everyone and we still have room for more attendees, so I urge everyone in the area to attend. I’ll be presenting the following at the event:

Biml – The Next Step in Agile Data Warehousing

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive, demo-focused presentation, you’ll see how you can rapidly iterate through the Data Warehouse development cycle and provide ROI almost immediately. Learn how you can use BimlScript to automatically create staging tables, SSIS packages, and other necessary objects during the modeling process.

Also, we are offering three pre-conference sessions on Friday (11/1) from well known experts in their field, which you umm… may have heard of. 🙂 They are an excellent value at $120 for a full day of training. The speakers and session titles are as follows:

A Day of SSIS 2012 – Andy Leonard
Query Performance Tuning – Grant Fritchey
Data Science Crash Course – Drew Minkin

More details for the SQL Saturday event and Precons can be found here. I hope to see a lot of you there!

Biml - Because I'm a Cheater

I’ll admit it, I’m a cheater. Now, before you click away in disgust, give me a minute to explain. Almost 15 years ago, when I felt stuck in my career as “The Made2Manage Guy,” I’d routinely run out of things to do. I mean, once you’ve learned all the pitfalls and gained a mastery of the program and job, what more was there to do? I can almost hear you saying, “Umm… you could learn something else and grow.” Well I hear ya, and nowadays I can’t imagine wasting idle time like that. However, this was a long time ago, and like so many people in the early days of the internet, I found entertainment on the interwebs.

Don’t Hate the Player

I found a site that was, and still is, a free online games destination catering to soccer moms everywhere. I was pretty good at one game in particular, Word Whomp, a fun vocabulary-based game. It’s kind of a cross between Whac-A-Mole and Scrabble. For each new round, a string of “random” letters is presented along with clusters of blank tiles representing all of the English words one can make from those letters.

Anyway, as I said, I was pretty good at it, but there were people on the leader boards who were inhumanly good at it. Well, I’m a naturally competitive person and was rapidly becoming frustrated with the repetitiveness of the game and my inability to compete with the leaders. So, what did I do? I cheated. Before I describe how, let’s just have a disclaimer here. While I was cheating, there was no payoff for it other than personal satisfaction. I consider it “harmless cheating.”

Cheaters Never Win?

I fired up Visual Basic, I forget which version now, and created a program which was able to screen scrape Word Whomp for that random letter set. It then scanned through its own local Access database (Lame… I know) for applicable words which could be made. Each viable word was then submitted. At the end of every round, Word Whomp helpfully lists all the words you missed, and my program scraped those too and added them to the database. Since my database started “empty,” it took a bit to gain momentum. However, the longest words are only six characters, so as time went on, my program got more and more efficient at its task and rarely missed a word. Needless to say, it wasn’t long before my program was near the top of the leader boards. However, as time went on, it also became obvious that I wasn’t the only cheater, as I never did end up in first place.
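The dictionary-lookup half of a program like that is simple enough to sketch. My original used Visual Basic and Access, but here’s a rough T-SQL equivalent; the dbo.Words dictionary table and its contents are hypothetical:

```sql
-- Hypothetical dictionary table: dbo.Words(Word varchar(6)).
DECLARE @Letters varchar(6) = 'whompd';  -- the scraped letter set

SELECT w.Word
FROM dbo.Words AS w
WHERE LEN(w.Word) <= LEN(@Letters)
  AND NOT EXISTS (
        -- disqualify the word if any of its letters is used more
        -- times than that letter appears in the available set
        SELECT 1
        FROM (VALUES (1),(2),(3),(4),(5),(6)) AS n(n)
        WHERE n.n <= LEN(w.Word)
          AND LEN(w.Word) - LEN(REPLACE(w.Word, SUBSTRING(w.Word, n.n, 1), ''))
            > LEN(@Letters) - LEN(REPLACE(@Letters, SUBSTRING(w.Word, n.n, 1), ''))
      );
```

The LEN/REPLACE trick counts occurrences of a character in a string, which is all you need to decide whether a word can be built from the letter set.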

What does this have to do with Biml?

A couple of things occurred to me as I finished the cheating program. I had a lot more fun writing the program to cheat at Word Whomp than actually playing the game itself. I get bored when performing repetitive tasks, even if they’re supposed to be fun. This is one reason why I write so much code that writes other code. I’ve never understood why some folks are content to re-write the same basic code repeatedly.

As I’ve mentioned on the blog and in my recent videos, I adopted BimlScript for a recent project and saved enormous amounts of development time as a result. Granted, there is a learning curve to gaining an understanding of BimlScript and deciding how you will leverage it. However, as soon as I passed that stage, my productivity just took off. Although I am used to working myself out of a job, even I wasn’t prepared for just how quickly I finished my last project. Working on my BimlScript framework is a lot like the Word Whomp cheating program. It’s fun and liberating to design new frameworks and methodologies to solve age-old problems.

Incidentally, shortly after I finished my program, I grew bored again watching it play Word Whomp. This was about the time that I decided to work my way out of that boring job and started reaching out to members of the PASS Community like Brent Ozar (Blog/Twitter). Since then I’ve put my energy, ingenuity, and drive into becoming a world class Data Warehouse Architect and am now “cheating” with Biml. I urge you to do the same.

HELLO to the WORLD of Biml

A “Hello World” program is the age-old, traditional way to teach the basics of a programming language’s syntax to a beginner. Since so many of us are just starting to use BimlScript, I thought it would be fun to start there. Also, I’ve made the entire solution available for download.


Before we begin, download and install BidsHelper. Then open Business Intelligence Development Studio (BIDS) or SQL Server Data Tools (SSDT) and create a new project. We’ll call ours “HelloWorld”.
Right click on your project name and select “Add New Biml File”.
Open the Biml file and you’ll find this:

<Biml xmlns="">

This is the root Biml element; its xmlns attribute references the XML Schema Definition (XSD). The XSD file defines “the rules” for any Biml document and is used to validate it against those rules. Now, before we start getting into Biml code, I want everyone to read about the Copy and Paste Issues in Visual Studio. Otherwise, any code you copy from my blog (or really any source) and then paste into Visual Studio will not work.

Create a Basic SSIS Package

Since the whole point of “Hello World” is to create the most basic example in a programming language that produces output, the simplest step we can take is to create a package. So, let’s add the bare minimum necessary to create a package called “Hello World”. Begin by hitting Enter at the end of the first line and typing “<”. IntelliSense should kick in with a list of options. You can either continue typing “Packages” or select it from the list.
As soon as you close the Packages node, you’ll notice that the closing Packages tag appears and the cursor is placed between those tags. Type “<” again and you’ll notice that IntelliSense offers fewer options this time. Either type or choose “Package” (singular) and then type “>” to close that as well. You should see a blue squiggly line below “Package”, which indicates that something is missing. If you hover over it, you’ll see the following:

Our code is missing a ConstraintMode (and also a Package Name), which is required. So, if we edit our code to add a Name of “Hello World” to our package and then type “ConstraintMode”, IntelliSense offers two options: Linear and Parallel. Linear causes each Control Flow Task to run in order, while Parallel allows them to run concurrently and also lets you specify the precedence constraints individually per task. For our simple example, just select Linear. Your code should now look like this:

<Biml xmlns="">
	<Packages>
		<Package Name="Hello World" ConstraintMode="Linear">
		</Package>
	</Packages>
</Biml>

Right click on your Biml Script file, and select “Generate SSIS Packages.” A generic package named “Hello World.dtsx” should be created. If you attempt to open it, you’ll see that the package is empty.

Creating Tasks

So, we’ve achieved the basic premise of a “Hello World” application, but lets dig further. Let’s add the Biml code for a couple of Tasks. First, we’ll add an Execute SQL Task. Now, if you think about a simple Execute SQL Task, what elements would be required?

  • Task Name
  • Connection
  • SQL Statement if using direct input.

So, we need to have a Connection. A Connection requires a name and a connection string, but it also requires an existing database. I’ve created an empty database called “HelloWorldDemo” on my instance. There are a number of ways to generate connection strings, but I prefer the following way because it is so simple. For your convenience I’ve recorded this quick video, but you can also follow the instructions below it.

  1. Create a text file somewhere. I’m lazy so I created mine on my desktop and named it Connection Strings.txt. You may name yours whatever you wish.
  2. Make sure that file extensions are not hidden on your system. Highlight the txt extension and replace it with udl. Windows will warn you that changing the extension might cause problems. Select Yes and change it anyway.
  3. Double Click the file to open it. You’ll be presented with the standard Data Link interface that we’re all familiar with.
  4. Configure that screen, but please select “Use Windows NT Integrated security.” You’ll see why in a moment.
  5. Select the Database, in my case it’s called HelloWorldDemo, and click Test Connection. After that succeeds, select OK to close the screen.
  6. Right Click the udl file and select Open With… Notepad (or some similar program). The last line of text will have your connection string.
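For reference, that last line should look something like the following (the server and database names here match this walkthrough; yours will differ):

```
Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=HelloWorldDemo;Data Source=DAVESDEV\SQL2012
```

Since we chose Windows integrated security, no password is embedded in the string, which is exactly why I asked you to select that option.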

Return to your Biml Script and create a Node called Connections just below the first line. Then place the code into it like so:

<Biml xmlns="">
	<Connections>
		<Connection Name="HelloWorldDemo" ConnectionString="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=HelloWorldDemo;Data Source=DAVESDEV\SQL2012" />
	</Connections>
</Biml>

Now that we have a connection, we can create Packages and Tasks which use it. Let’s create the Execute SQL Task mentioned above. First, we have to create a Tasks Node. Our Execute SQL Task will connect to our Connection, but won’t actually do anything.

<Biml xmlns="">
	<Connections>
		<Connection Name="HelloWorldDemo" ConnectionString="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=HelloWorldDemo;Data Source=DAVESDEV\SQL2012" />
	</Connections>
	<Packages>
		<Package Name="Hello World" ConstraintMode="Linear">
			<Tasks>
				<ExecuteSQL Name="SQL Hello World" ConnectionName="HelloWorldDemo">
					<DirectInput>Select 'Hello World' as GreetingMessage</DirectInput>
				</ExecuteSQL>
			</Tasks>
		</Package>
	</Packages>
</Biml>

Right click on your Biml Script and click Generate SSIS Packages. This time, since the Hello World.dtsx package already exists, you’ll receive the following window, which allows you to selectively overwrite packages. Obviously, in this case we want to overwrite the package, so select Commit. Now, if you open your package you will find a single Execute SQL Task, and if you double click to open it, you will find the following:

While the package will open, those of us working in SSIS 2012 will receive an error while running it.

You will learn to loathe this.

The default ProtectionLevel for an SSIS project is EncryptSensitiveWithUserKey, which most of the time is not what you want. Basically, any user other than the original creator who attempts to use the package must provide all of the credentials. I prefer to use DontSaveSensitive whenever possible and let Active Directory handle permissions. Right click on the project name and select Properties. Change the ProtectionLevel to DontSaveSensitive. If your project still contains the default Package.dtsx, you will need to open it and change its ProtectionLevel setting, or delete it, to continue.

After changing the project ProtectionLevel, go ahead and run Hello World.dtsx. The SQL Hello World task will turn green, and there is peace in the valley. Next, let’s create some tables in Biml.

Creating Tables

Step 1 – Create Definitions

Before we begin creating tables in Biml and Biml Script, let’s just define what is necessary to create the same tables using T-SQL. We’re going to create two tables so I can demonstrate how Biml Script will iterate through them. The two tables will be called “HelloWorld” and “FromMade2Mentor.” Let’s look at the T-SQL that I would write to define these.

USE HelloWorldDemo;
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'dbo')
    EXEC sp_executesql N'CREATE SCHEMA dbo AUTHORIZATION dbo';
IF EXISTS (SELECT 1 FROM sys.tables WHERE SCHEMA_NAME(schema_id) = 'dbo' AND name = 'HelloWorld')
    DROP TABLE dbo.HelloWorld;
CREATE TABLE dbo.HelloWorld (
    [HwId] smallint IDENTITY(1, 1) NOT NULL,
    [HWVarChar30Field] varchar(30) NULL
);
IF EXISTS (SELECT 1 FROM sys.tables WHERE SCHEMA_NAME(schema_id) = 'dbo' AND name = 'FromMade2Mentor')
    DROP TABLE dbo.FromMade2Mentor;
CREATE TABLE dbo.FromMade2Mentor (
    [MentorId] smallint IDENTITY(1, 1) NOT NULL,
    [MentorDate] date NULL
);

While I realize that both of these are in the schema dbo and the create schema statement isn’t really necessary, I’ve left it in for two reasons. First, you’d need a statement like that to verify that a named schema like ‘Staging’ already existed before creating tables in it. Second, it illustrates how Biml script is similar to the above statement.
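For example, to guard the creation of tables in a non-default schema, the same pattern applies; the ‘Staging’ schema name here is just illustrative:

```sql
-- Create the schema only if it doesn't already exist
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'Staging')
    EXEC sp_executesql N'CREATE SCHEMA Staging AUTHORIZATION dbo';
```

The dynamic SQL is required because CREATE SCHEMA must be the first statement in a batch.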

Creating tables in Biml is fairly easy. The important thing to remember is that you must first define the table(s) in a Biml file, and then use another Biml Script file to build the package(s) which you would then run to create them. The end result is a SSIS package with Execute SQL Tasks which create the necessary tables. So, let’s start with the Biml file which will contain the table definitions and we’ll name it Destination Table Defs.biml. As you can see in the following code the Biml follows the same basic path.

  1. The Connection is declared (which isn’t necessary in SSMS but is implied).
  2. The Database(s) is specified.
  3. The Schema(s) is specified.
  4. The Table Name is specified. (as in Create Table dbo.HelloWorld).
  5. The Columns are named and defined for data type and other attributes such as Nullability and Identity.
<Biml xmlns="">
	<Connections>
		<Connection Name="HelloWorldDemo" ConnectionString="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=HelloWorldDemo;Data Source=DAVESDEV\SQL2012" />
	</Connections>
	<Databases>
		<Database Name="HelloWorldDemo" ConnectionName="HelloWorldDemo" />
	</Databases>
	<Schemas>
		<Schema Name="dbo" DatabaseName="HelloWorldDemo" />
	</Schemas>
	<Tables>
		<Table Name="HelloWorld" SchemaName="HelloWorldDemo.dbo">
			<Columns>
				<Column Name="HwId" DataType="Int16" IdentityIncrement="1" />
				<Column Name="HWVarChar30Field" DataType="AnsiString" Length="30" IsNullable="true" />
			</Columns>
		</Table>
		<Table Name="FromMade2Mentor" SchemaName="HelloWorldDemo.dbo">
			<Columns>
				<Column Name="MentorId" DataType="Int16" IdentityIncrement="1" />
				<Column Name="MentorDate" DataType="Date" IsNullable="true" />
			</Columns>
		</Table>
	</Tables>
</Biml>

Now, save your project, right click on Destination Table Defs.biml and select Check Biml for Errors. Assuming that your script passes validation right click again and select Generate SSIS Packages. What happens? Well, nothing. Why? You have provided the table definitions, but haven’t provided the mechanism to actually create them. That is our next step.

Step 2 – Generate Package(s)

Knowing what we learned earlier, we could create a static Biml file with Execute SQL Tasks to create these tables, but that’s not very efficient. Wouldn’t it be great if we could just dynamically create that package and it would create every table we’ve specified? This is your first taste, small though it is, of the power of Biml Script. Let’s go through the steps to create such a package as if we were going to do it in SSIS. We’d most likely:

  1. Create a new package and name it. I’ll name it “Create Destination Tables.dtsx”.
  2. We’d create and specify a Connection.
  3. For each table we needed to create we’d drag out an Execute SQL Task, rename it, specify the connection, and provide the SQL query to use for Direct Input.
  4. Close and save the package.

Of course, this process is tedious. It’s the same set of clicks and drags over and over again. Also, you’d need to open this package and add another Execute SQL Task every time you added a new table to the destination. Let’s create a new Biml file and name it “Create Dest Tables.biml”. Read through the following BimlScript code and you’ll see that it follows along with the process we’d take in SSIS. Paste this script into our new Biml file.

<#@ template language="C#" hostspecific="true" #>
<Biml xmlns="">
	<Packages>
		<Package Name="Create Destination Tables" AutoCreateConfigurationsType="None" ConstraintMode="Linear">
			<Tasks>
				<# foreach(var table in RootNode.Tables) { #>
				<ExecuteSQL Name="SQL Create <#=table.Schema#> <#=table.Name#>" ConnectionName="<#=table.Connection.Name#>">
					<DirectInput><#=table.GetTableSql()#></DirectInput>
				</ExecuteSQL>
				<# } #>
			</Tasks>
		</Package>
	</Packages>
</Biml>

You may notice that Visual Studio shows red squiggles under your C# code. That’s because it expects XML only, so you’ll have to learn to ignore them. Notice how all of the C# code is held inside <# and #> delimiters. The foreach statement loops through every table defined and assigns it to the table variable I’ve declared. An embedded expression such as <#=table.Name#> returns the Name property of the Table object. Therefore, I am dynamically naming my Execute SQL Tasks and correctly specifying them for each iteration. GetTableSql is a function provided by Varigence which does just what you’d think. Now, right click Create Dest Tables.biml and check for errors. You’ll receive the following:
Basically, the problem is that the code is trying to iterate through the tables, but since we only selected the script file and not the definitions file, there were no tables to iterate and no Tasks were created under the Tasks node. If you select both of the files and check for errors, this should be remedied. After passing that check, make sure that both are selected, right click, and generate the package. Upon opening the package, you should see the following:

Now, double click SQL Create dbo HelloWorld. After the window opens, click the ellipses to expand the SQL Statement. After expanding the window to show the entire statement, you’ll notice that the resulting SQL query looks almost identical to the one we created above.

If you run the resulting package, the two tables we defined will now exist in the database.

Just for Fun

Typically a “Hello World” program would pop up that message to the user. While we have mimicked that with Packages and Tables “popping up,” I thought it would be fun to show you a little trick to finish this article. I borrowed this with permission from a friend of mine who just recently started blogging over on his new Biml Geek Blog.

Open our Project again, create a new Biml file, and name it. I’ve chosen to name it Hello World Message.biml. Copy and paste the following code into it.

<#@ template language="C#" #>
<#@ assembly name="%WINDIR%\Microsoft.NET\Framework\v2.0.50727\System.Windows.Forms.dll" #>
<#@ import namespace="System.Windows.Forms" #>
<Biml xmlns="">
	<# MessageBox.Show("Hello World"); #>
</Biml>

Remember that the scripting in this article is simply C#. Therefore, we can leverage the language to do a lot more than just iterate with For Each loops. So, one last time, right click the Biml file and select “Generate SSIS Packages” and you should see the following:

Keep on Biml-ing!

Data Warehouse Initial (Historical) Dimension Loading With T-SQL Merge

If T-SQL Merge can only handle one changed record at a time, how can we use it for our initial/historical load? Further, how can we use Merge in those situations where the incremental load may produce more than 1 changed record per entity? Well, I’m glad you asked.

Welcome to the next article in the Urge to Merge Series. In this series I’ve tried to clear up many misunderstandings about how to use T-SQL Merge effectively, with a focus on Data Warehousing.

The Setup

We will be using the same code we used in Extracting Historical Dimension Records using T-SQL, which is available here: Jedi Merge Historical Load Setup.

For the sake of simplicity, we will use the first query provided in that article which extracts the entire historical load. The query is as follows:

INSERT INTO Staging.DimCustomer (CustomerNum, CustomerName, Planet, ChangeDate, RankNo)
   SELECT CustomerNum
         ,CustomerName
         ,Planet
         ,ChangeDate
         ,Row_Number ()
             OVER (PARTITION BY CustomerNum ORDER BY ChangeDate DESC) AS RankNo
     FROM Source.CustomerHistory
   --Where CustomerNum = 102
    ORDER BY CustomerHistoryId;

The most important part of this query is the RankNo column. Notice that I am assigning a Row_Number per entity (Business Key) by descending ChangeDate. This means the oldest records per CustomerNum get the highest RankNo value. My looping Merge is set to loop from highest rank down to 1, which may seem counter-intuitive to some. The reason I do this will become clear in a minute. Anyway, the above query will create a data set like the following in Staging.DimCustomer:

Notice that I’ve filtered for Yoda only for simplicity.
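If you want to see the ranking behavior in isolation, here is a tiny self-contained version of the same ROW_NUMBER logic; the ChangeDate values are made up purely for illustration:

```sql
-- Illustration of RankNo assignment for one entity (CustomerNum 102)
SELECT CustomerNum,
       ChangeDate,
       ROW_NUMBER() OVER (PARTITION BY CustomerNum ORDER BY ChangeDate DESC) AS RankNo
FROM (VALUES (102, '2012-01-15'),
             (102, '2012-06-30'),
             (102, '2013-02-01')) AS h (CustomerNum, ChangeDate)
ORDER BY ChangeDate;
-- The oldest change (2012-01-15) gets RankNo 3; the most recent gets RankNo 1.
```

That inversion is the whole point: the loop counts down from MAX(RankNo), so history is applied oldest-first, and the final (current) version is always the RankNo 1 row.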

If you’ve been following along in this series, I’ve repeatedly mentioned a limitation of T-SQL Merge. On each execution of the Merge statement, there will only be 1 record per entity to Merge. Well, we’re about to deal with that issue now. Just because we can only Merge one change record per entity at a time, doesn’t mean we can’t loop through Merge statements to accomplish an Initial (Historical) Dimension Load. The Merge code should look familiar as it builds on my previous articles.

Looping Merge Code

IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL
    DROP TABLE #DimCustomer;
CREATE TABLE #DimCustomer(
  CustomerNum int NOT NULL,
  CustomerName varchar(25) NOT NULL,
  Planet varchar(25) NOT NULL,
  RowIsCurrent char(1) NOT NULL DEFAULT 'Y',
  RowStartDate datetime NOT NULL DEFAULT getdate(),
  RowEndDate datetime NOT NULL DEFAULT '12/31/9999');

DECLARE @Counter smallint;
SET @Counter = (SELECT MAX(RankNo) FROM Staging.DimCustomer);

-- Type 2 Loop
WHILE (@Counter >= 1)
BEGIN
    INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
    SELECT CustomerNum, CustomerName, Planet, 'Y', RowStartDate, '12/31/9999'
    FROM (
        MERGE dbo.DimCustomer AS Target
        USING (SELECT CustomerNum, CustomerName, Planet, ChangeDate, RankNo
               FROM Staging.DimCustomer
               WHERE RankNo = @Counter) AS Source
           ON Target.CustomerNum = Source.CustomerNum
              AND Target.RowIsCurrent = 'Y'
         WHEN MATCHED AND EXISTS (SELECT Source.CustomerNum, Source.Planet
                                  EXCEPT
                                  SELECT Target.CustomerNum, Target.Planet)
         THEN UPDATE SET Target.RowIsCurrent = 'N',
                         Target.RowEndDate = Source.ChangeDate
         WHEN NOT MATCHED BY TARGET
         THEN INSERT (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
              VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet, 'Y', Source.ChangeDate, '12/31/9999')
        OUTPUT $ACTION ActionOut, Source.CustomerNum, Source.CustomerName,
               Source.Planet, Source.ChangeDate RowStartDate
    ) AS MergeOut
    WHERE MergeOut.ActionOut = 'UPDATE';

    -- Insert the new "current" version of each updated entity so the next
    -- iteration's Merge can find (and potentially expire) it.
    INSERT dbo.DimCustomer
    SELECT CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
    FROM #DimCustomer;

    TRUNCATE TABLE #DimCustomer; -- clear this pass's rows so they aren't re-inserted

    SET @Counter = @Counter - 1;
END;

-- Type 1
MERGE dbo.DimCustomer AS Target
USING (SELECT CustomerNum, CustomerName, Planet, ChangeDate, RankNo
       FROM Staging.DimCustomer
       WHERE RankNo = 1) AS Source
   ON Target.CustomerNum = Source.CustomerNum
 WHEN MATCHED AND EXISTS (SELECT Source.CustomerNum, Source.CustomerName
                          EXCEPT
                          SELECT Target.CustomerNum, Target.CustomerName)
 THEN UPDATE SET Target.CustomerName = Source.CustomerName;

There are just a few things to note:

  1. As I mentioned in a previous article, the Type 1 Merge statement is performed last. Obviously, it has to be.
  2. Notice that I’m using a simple While statement to loop.
  3. The RankNo column was calculated by descending ChangeDate because this ensures that the last record processed for each entity has a RankNo equal to 1. This makes the Type 1 Merge statement easier to write.

What about Transactions and Error Handling?

I’ve intentionally left those aspects out of this code for a couple of reasons. First, it’s easier to understand what’s happening without the extra code in the way. Second, you may not actually want explicit transactions in your Initial Data Warehouse load. It doesn’t take a Rocket Surgeon to figure out that if you load a huge Dimension historically, and you only use one transaction, your transaction log may grow out of control. This is particularly true if you are loading multiple Dimensions simultaneously. This is definitely an “It Depends” situation.
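If you do decide to add them, one reasonable sketch (assuming SQL Server 2012 or later for THROW) is a transaction per loop iteration rather than one giant transaction, which keeps log growth in check:

```sql
WHILE (@Counter >= 1)
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- the Type 2 Insert/Merge for the current RankNo goes here
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;  -- re-raise so the failure isn't silently swallowed
    END CATCH;
    SET @Counter = @Counter - 1;
END;
```

On older instances you would substitute RAISERROR for THROW; either way the point is that each rank's worth of changes commits independently.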

As I mentioned in the previous article, Extracting Historical Dimension Records using T-SQL, the Merge statement above should work in either Method 1 or 2. Also, because of the RankNo column, incremental Dimension loads which may contain more than 1 changed record per entity (CustomerNum) will work correctly as well.

Any questions?

SQL Saturday Oklahoma City Wrap Up

This past weekend I made the trek to Oklahoma City to participate in their latest SQL Saturday. It was the first time I had spoken there, but it didn’t feel that way. Kristin Ferrier (Blog/Twitter), Jeremy Marx (Blog/Twitter), Matt Brimer (Twitter), Allen Smith (Blog/Twitter), and the rest of the team were so welcoming, just what you’d expect from our #SQLFamily.

The event went off without a hitch. It was very well run, the volunteers knew exactly what they were doing, and the venue was great as well. I gave two presentations, which were fairly well attended. I didn’t stop to count, but I think both had more than 40 people. Judging from the feedback evaluations, both groups had a good time and learned a lot. As a bonus, both of my sessions were recorded. User Group TV was kind enough to record my Merge presentation, and I recorded the Data Warehouse Mistakes presentation myself using Camtasia. I’m currently editing the second presentation and hope to post a link to it tomorrow.

The recordings are below, along with links to download the presentation materials.

Data Warehouse Mistakes You Can’t Afford To Make

Have You Got the Urge to (T-SQL) Merge?

If you have never been to a SQL Saturday, I urge you to attend, get involved, volunteer, speak, or whatever you’re comfortable with. It’s so rewarding in so many ways.

For those of you who had questions about Biml Script, stay tuned as more blog articles are coming. Also, please check back for those video links.

Extracting Historical Dimension Records using T-SQL

Just because you can extract every single historical record for a dimension doesn’t mean you should. In this article, we’ll cover three basic extraction “methods” and when each of them may be preferred.

As with many of my Urge to Merge Series articles, this one will be presented in layers. In this case, each layer is actually a fully functional method and will work with the T-SQL Merge code I’ll publish in the next article. However, each step reduces the number of records which must be processed.

Let’s Get to the Code!

First download the following file which will create a Jedi Merge Database and the necessary schema, tables, and seed data. The Excel Spreadsheet I used to create the screenshots is included as well. The script builds on the same basic code I’ve been using for the entire series. In this case, I’ve added a source history table, placed in a schema called source. While I realize that your source will most likely not be inside your Data Warehouse, I’ve done this for simplicity and so I can use T-SQL to load data into the staging table without having to actually use SSIS. In a real world scenario, the Select statement against the source would be placed into an OLE DB Source in an SSIS Data Flow instead.

The seed data contains 27 records, covering 4 different customers. The most important one is Yoda, Customer Number 102. I tried to strike a balance between “realism” and simplicity by mixing the other customer records with Yoda’s, but if you want to simplify this as much as possible, you can either delete the records with a Customer Number other than 102 or filter them out.
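If you’d rather follow along without the download, the history table the examples rely on can be approximated as follows. This is a sketch with column definitions inferred from the queries later in this article; the actual script in the download is the authoritative version.

```sql
-- Hedged approximation of the source history table used in the examples.
-- Data types are assumptions; the download script is authoritative.
CREATE SCHEMA source;
GO
CREATE TABLE source.CustomerHistory
(
    CustomerHistoryId INT IDENTITY (1, 1) PRIMARY KEY,
    CustomerNum       INT          NOT NULL, -- business key (e.g. 102 = Yoda)
    CustomerName      VARCHAR (50) NOT NULL, -- Type 1 attribute
    Planet            VARCHAR (50) NOT NULL, -- Type 2 attribute
    ChangeDate        DATETIME     NOT NULL  -- when the source row changed
);
```

The important structural points are the surrogate CustomerHistoryId, the CustomerNum business key, and the ChangeDate used for all of the window function ordering below.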

Source Data Set



As I mentioned in a previous article, in order to use the code I propose, the following stipulations must be met.

  1. Each entity in our Dimension will have 1 active record which will be designated by a RowIsCurrent (or another variation) field and/or RowStartDate and RowEndDate.
  2. Entities in our Dimension will be continuous. So, for our DimCustomer, we’ll assume that we don’t have customers cancelling their accounts and then re-starting them. There will be no gaps in time.
  3. On each execution of the Merge statement, there will only be 1 record per entity to Merge. When we historically load Dimensions, this is clearly not the case. This is why I’ve included a column called RankNo which has been unused up to this point. In the next article, we will loop our Merge executions based on this column.

Method 1: Extract Entire History

We could simply extract all of the data as in the following query. Notice that I’ve used the Row_Number function to obtain my Rank. I’m ranking them in reverse order of ChangeDate. The oldest records per entity (Customer) have the highest rank. This may seem counter-intuitive but it will make more sense when you see the looping Merge statements in the next article.

   SELECT CustomerNum
         ,CustomerName
         ,Planet
         ,ChangeDate
         ,Row_Number ()
             OVER (PARTITION BY CustomerNum ORDER BY ChangeDate DESC) AS RankNo
     FROM Source.CustomerHistory
   --WHERE CustomerNum = 102

Please excuse the date inconsistencies (Yoda only).

For the sake of this demo I’ll simply add an Insert Into to the above statement to load the staging table.

INSERT INTO Staging.DimCustomer (CustomerNum, CustomerName, Planet, ChangeDate, RankNo)
   SELECT CustomerNum
         ,CustomerName
         ,Planet
         ,ChangeDate
         ,Row_Number ()
             OVER (PARTITION BY CustomerNum ORDER BY ChangeDate DESC) AS RankNo
     FROM Source.CustomerHistory
   --WHERE CustomerNum = 102
   ORDER BY CustomerHistoryId;

This will work with the Merge code provided in the next article, but even in this small sample, the Merge will loop 20 times, which is excessive. Keep in mind that this may NOT actually matter to us since we should only be performing the initial/historical load one time anyway. If the dataset is relatively small, or we can afford to be less efficient, we may not want to pursue the other methods defined below.

Method 2: Extract Only History Which Matters

Unless the mechanism which records history in the source only adds new records for the exact same set of attributes designated as Type 2 for the Data Warehouse, it would be helpful to limit our extract. So, how do we determine which records actually matter? We need to extract any record which meets the following criteria:

  • The first instance of each entity (Customer) encountered so we can set the original RowStartDate (or whatever you choose to call it).
  • Each record that contains one or more Type 2 changes.
  • We need to be able to capture the final value(s) for every Type 1 column. One way is to extract the last record for each entity regardless of whether it included a Type 2 change.

What’s an effective way of detecting Type 2 changes? I prefer to use SQL Server 2012 Window Functions. You can learn more about them at the link I provided, but basically I’m using LAG to compare the current value of our only Type 2 column (Planet) to the previous record for that entity. I’m also using LEAD to detect the final record for each entity. I’m having a little contest to find the best way to accomplish methods 2 and 3 for SQL 2005 through 2008R2. See below for details.

SELECT CustomerHistoryId
      ,LAG(Planet) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) AS PrevPlanet
      ,CASE WHEN LAG(Planet)  OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) IS NULL   -- First Record
              OR LAG(Planet)  OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) <> Planet -- Type 2 Field Value Changed
              OR LEAD(Planet) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) IS NULL   -- Last Value per Entity
            THEN 'Y'
            ELSE 'N' END AS Matters
  FROM Source.CustomerHistory
 ORDER BY CustomerHistoryId;

Notice that our Case When statement checks for a NULL value (first record for an entity), any record for that entity where the Planet of the previous record does not equal the current value, and using LEAD to find the last record for that entity.
Using the Excel Spreadsheet I included in the download link above, you can easily auto-filter for only the records which matter.

Filtered for records which Matter.


How could we filter out the rows which don’t matter? Well, one way is to use the query as our data flow task OLE DB Source. We can then use a Conditional Split and only stage the records where Matters = ‘Y’. After that, we could run an update statement to set the RankNo of the remaining records in staging. Finally, we’d run the looping Merge in the next article and receive the exact same results as the first method, but with a lot fewer loops.
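The RankNo update mentioned above might look something like the following. This is a sketch under the assumption that the staging table keeps the ChangeDate column and follows the naming used elsewhere in this series; adjust it to your own schema.

```sql
-- Hedged sketch: after the Conditional Split stages only the rows where
-- Matters = 'Y', re-rank the surviving rows per entity in descending
-- ChangeDate order, matching the ranking scheme from Method 1.
WITH Ranked AS
(
    SELECT RankNo
          ,ROW_NUMBER() OVER (PARTITION BY CustomerNum
                              ORDER BY ChangeDate DESC) AS NewRankNo
      FROM Staging.DimCustomer
)
UPDATE Ranked
   SET RankNo = NewRankNo;
```

Updating through the CTE keeps this a single set-based statement, and the looping Merge can then proceed exactly as it would for Method 1, just with far fewer layers.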

Method 3: Skip Staging and Merge Entirely

Wouldn’t it be nice if we could perform the Initial/Historical load without actually using Merge? That way we wouldn’t have to write every single relevant record to staging and re-write it (with some logical manipulation) again in the Merge statements. I recently worked on a Data Warehouse implementation where the data volume involved necessitated an alternative solution. In the following example, I’ve chosen to use a Temp Table, but could have used a Common Table Expression or a Table Variable instead. The thought process goes like this. If we can filter for only the records which matter, while at the same time retrieving the last value per entity (customer) for each Type 1 column, then we have no need to stage and Merge. We begin by Selecting the minimum number of fields necessary into the Temp Table, which includes the ID field to uniquely identify each source row, any Type 1 columns, and our “Matters” calculated column. Notice that I’m using the SQL Server 2012 LAST_VALUE function to derive the last name used by each Customer.

IF OBJECT_ID('tempdb..#JediTemp') IS NOT NULL
    DROP TABLE #JediTemp;

SELECT CustomerHistoryId AS TempCustomerHistoryId
      ,CASE WHEN LAG(Planet) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) IS NULL   -- First Record
              OR LAG(Planet) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) <> Planet -- Type 2 Field Value Changed
            THEN 'Y' ELSE 'N' END AS Matters
      ,LAST_VALUE(CustomerName) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastCustomerName
  INTO #JediTemp
  FROM Source.CustomerHistory;


So, now it’s easy to visualize that if we filter out the records which don’t matter, and then join to the original data set to get the Type 2 column(s), we can actually create our Initial load without resorting to staging and T-SQL Merge. In this case, I’ve appended an Insert Into the Dimension directly from the source. However, in a real world scenario, you’d use the Select query as the source of your Data Flow Task and designate the Dimension as the destination.

INSERT INTO dbo.DimCustomer
(CustomerNum, CustomerName, Planet, RowStartDate, RowEndDate, RowIsCurrent)
SELECT CustomerNum
      ,LastCustomerName AS CustomerName
      ,Planet
      ,ChangeDate AS RowStartDate
      ,COALESCE(LEAD(ChangeDate) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate), '12/31/9999') AS RowEndDate
      ,CASE WHEN LEAD(ChangeDate) OVER (PARTITION BY CustomerNum ORDER BY ChangeDate) IS NULL THEN 'Y'
            ELSE 'N' END AS RowIsCurrent
  FROM Source.CustomerHistory
  JOIN #JediTemp ON CustomerHistoryId = TempCustomerHistoryId
               AND Matters = 'Y'
 ORDER BY CustomerNum;

In the next article, I’ll share T-SQL Merge code which will process the Initial/Historical Load of a Data Warehouse. The code will also handle an incremental load situation where more than one record per entity may be extracted per load.

A Contest with a Prize

As I mentioned above, I wrote this code using SQL 2012 Window Functions. I’d like to challenge my readers to submit code that will work in SQL Server 2005 through 2008R2. The winning submission will be chosen based on the following criteria:

  1. The code must be reasonably performant in most circumstances. I realize my example contains only 27 records but think about working with millions.
  2. Stage as little of the data as possible. Ideally, nothing would be written to disk.
  3. Needs to be a logical and repeatable pattern as I will be converting it to Biml Script.

Please submit your code to no later than Friday, August 30th and I will pick a winner over the weekend. The winner will receive a brand new copy of:

SQL Server 2012 Integration Services Design Patterns
I was one of the technical reviewers on that fine book so I recommend it highly. If the winner is a US Resident, I will mail them a physical copy of the book. For those outside the US I will provide a copy of the eBook. I’m sorry if that is inconvenient, but otherwise the international shipping costs could get out of hand.

Are there any questions?

Loading Dimensions Historically

There isn’t a lot of information on the inter webs about designing the historical load of a Data Warehouse. Consequently I see a lot of confusion on the subject. Before we go farther down the Urge to Merge Series, I thought it would be beneficial to define the concept with some visual examples.

The Initial/Historical Load

First, let me clarify what I mean. When I mention historical loading, what often comes to mind is processing Slowly Changing Dimensions going forward. While that’s related to this discussion, they aren’t really synonymous. What I’m talking about is loading the dimension records, with full slowly changing dimension support if possible, going backwards in time. I’m referring to the Initial load, rather than subsequent incremental loads. A great deal of development time can be dedicated to this one-time load because getting the history is so important. After all, who wants to invest so many resources into a Data Warehouse that won’t yield much value for months or years until “new” history can be accumulated? So, where can we find this history?

History or Audit Tables


Jedi Demo – Customer History Table

Well, the first and most obvious place to look is in Historical or Audit tables. Often someone has already created a method to track history on important source tables, perhaps with SQL triggers, because this need has already been identified. If we’re fortunate, they are tracking changes on every column designated as Type 2 and we can simply query those tables. I’ll go into more depth in the next article on how to query the source table(s) using T-SQL.

Derived from Source Transaction Tables

Often, history has to be derived from related transaction tables. For example, let’s assume that our source system keeps no customer history at all, but you do want to be able to track sales (and other Facts) as accurately as possible. In our simple example, our first customer table may simply look like this:
Notice that all we have is the last version of the Customer Name and Planet. If we want to be able to historically track the address (Planet in our example), where might we find this? Obvious places to look would be in invoice or shipping tables. We could simply join these two tables, possibly with a left join, to get a data set similar to our history table.
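A sketch of that derivation might look like the following. The table and column names here (Source.Customer, Source.Shipment, ShipToPlanet, ShipDate) are hypothetical, invented purely for illustration; your source system will differ.

```sql
-- Hedged illustration: reconstructing a Planet history from a shipping table.
-- Source.Customer and Source.Shipment are hypothetical names.
-- Each distinct planet a customer shipped to becomes a history row,
-- dated by the first shipment to that planet.
SELECT c.CustomerNum
      ,c.CustomerName
      ,s.ShipToPlanet    AS Planet
      ,MIN(s.ShipDate)   AS ChangeDate  -- first shipment to that planet
  FROM Source.Customer c
  LEFT JOIN Source.Shipment s
         ON s.CustomerNum = c.CustomerNum
 GROUP BY c.CustomerNum, c.CustomerName, s.ShipToPlanet
 ORDER BY c.CustomerNum, ChangeDate;
```

The left join preserves customers with no shipments at all; those rows would simply carry the current Planet value forward with no derived history.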

Remember, in the grand scheme it’s important to create this history in your dimensions and not rely on your FactShipments table for the information. For example, let’s say that this Jedi Demo Data Warehouse is for a retail business. You may want to track your vendor purchasing details (for resale), Profit Amounts (from Invoicing), etc., per customer Planet. If you’ve loaded DimCustomer historically, then this history can be applied to other areas of the enterprise.

Periodic Restores

This method should be considered a last resort. If your client really needs certain history, and there just isn’t any way to get it from the source system, you may be able to restore previous versions of the database(s) and perform an ETL load after each. I worked on one system where we loaded an end of month database backup for the previous two years. These 24 restores were done from tape, and were extremely tedious. However, we got the job done and the client was satisfied.

Considerations and Complications

What happens if only certain columns designated as Type 2 can be obtained historically? Suppose you simply can’t get historical information for Planet, but you want to track it going forward. The implications of this must be explained in detail to the client team, and ultimately they make the decision. The issue tends to become less important over time, because you are accurately keeping history going forward and those “faulty” records may be queried less and less.

Also, many Data Warehouse implementations are rushed due to business demands and such. Invariably this urgency causes corners to be cut and it’s possible that some dimensional model changes will require a reload of certain sections of your warehouse. Or sometime after the Data Warehouse is in production, the client decides that additional dimension attributes need to be tracked historically. If the Data Warehouse kept certain information historically as of the “go live” date, and you have to completely reload certain tables, this creates a difficult situation. For more information, I’ll refer you to Warren Thornthwaite’s excellent article Creating Historical Dimension Rows.

Next Steps

In the next two articles I’ll cover the following:

  1. How to write queries to selectively extract records on the Initial/Historical Load.
  2. How to use T-SQL Merge to process these loads.

Any questions?

Upcoming Presentations - NTSSUG and SQL Saturday OKC

Tonight at the North Texas SQL Server User Group

This is just a quick reminder that I’m presenting another hour in my continuing series Data Warehousing for the North Texas SQL Server User Group. We’ve already covered Dimension Tables in depth, so tonight I’ll be presenting, “Fact Tables – Show Me The Money!” I’ll explain the fundamentals of Fact Table design and demonstrate how to load them with SSIS.

The PowerPoint Slides, SSIS Project, and T-SQL Setup code can be downloaded here.

Also presenting tonight will be Devin Knight (Blog/Twitter). Devin will be presenting, “Getting to Know Power Query.” Devin’s presentations are always great so I urge everyone in the area to attend.

SQL Saturday 223 – Oklahoma City

I’ll be giving the following presentations in OKC on Saturday August 24th. I’ve given both of these before, but I am always improving them.

Have You Got the Urge to (T-SQL) Merge?

You’ve got data to load into an already existing table, perhaps a slowly changing dimension, and need a method that is simple, supportable, and performs well. The T-SQL Merge statement, introduced in SQL 2008, can do all of this and more. Many DBAs investigate it, but find it unintuitive or time consuming, or can’t get it to perform well. This doesn’t need to be difficult; after attending this session you’ll thoroughly understand the fundamentals of the Merge statement and how to write one. You will also leave with scripts which automatically write well formed Merge statements for you in less than 1 minute, guaranteed. Come see how.

Data Warehouse Mistakes You Can’t Afford to Make

Many data professionals understand the basics of Data Warehouse design, including Dimension and Fact Tables, slowly changing Dimensions, and the use of meaningless surrogate keys. However, it isn’t until you’ve created a dimensional model and put it into production, that you realize just how much of an impact seemingly trivial mistakes can make. They can hobble performance, allow inaccuracy, and perhaps worst of all, inhibit adoption and usage of the new system. Learn how to avoid many common mistakes, from someone who’s made them and then found ways to correct them.

I’ve received high marks on both of these presentations for good content and funny delivery. Nobody talks about Data Warehousing the way I do, and I’m sure a good time will be had by all.

If you haven’t signed up for SQL Saturday OKC you can find more information about it on their website.

Hope to see some of you there!