Agile development is good, Waterfall is bad. We hear this all the time, along with undocumented statistics claiming that approximately 80% of Data Warehouse projects fail. So, we as Business Intelligence Professionals are told that we must be Agile. Whenever I taught Dimensional Modeling and Data Warehouse classes, I could count on at least one student asking me how to apply Agile principles to this process. My answer, up until several months ago, was that true Agile Data Warehouse development was a pipe dream.

The reason is fairly obvious if you follow the Kimball method. In a nutshell, when building Star or Snowflake Schemas, one should design, build, and load the related Dimensions before the Fact Tables. Dimension tables require far more time and resources to develop than Facts, and each Fact Table, which contains your metrics, typically relates to a majority of the Dimensions. Consequently, the first Fact Table load requires a great deal of time from the Data Warehouse Development Team and business stakeholders. After that first “wave,” if you’ll pardon the pun, additional Fact Tables and metrics can be delivered quite rapidly. The natural consequence of this is a form of Waterfall development.
We’ve always tried to ameliorate this problem by prioritizing business processes by their “Business Value” and feasibility. In other words, we typically chose to create our first Fact Table(s) for business processes that were worth a lot of money and were the easiest to build and load. We traded a true Waterfall, delivering the entire solution at the end, for a partial Waterfall. Less painful? Sure, but that doesn’t equate to Agile development.
Agile Data Warehouse Design
Several months ago Tim Costello (Blog/Twitter), a good friend of mine from the community, urged me to read the following book:
I have a great deal of respect for Tim, so I read it and the book is excellent. It offers comprehensive tools and techniques for improving the entire process of designing a data warehouse. The book emphasizes quick iterations, each of which involves:
ModelStorming – Their new method of conducting business process interviews.
JIT (Just In Time) Dimensional Modeling
Visual BI Delivery
These ideas and concepts are great and they address several issues with the traditional Data Warehouse Lifecycle. However, up until recently it was impossible to build a data warehouse this way.
Agile Data Warehouse Design Development?
In an ideal world, building a data warehouse would be an Agile process. The first sprint would deliver one or more Fact Tables, and possibly a couple of prototype or primitive Dimensions. In each successive sprint, more Fact Tables could be deployed, new Dimension tables could be added, and existing Dimensions could have additional attributes created. Existing tables would easily “adapt” to the changes. For example, existing Fact Tables would automatically be related to new Dimension tables, if appropriate. Ideally, this whole process would be fluid and quick. You know umm… Agile. However, anyone who develops ETL processes knows this is not the case.
Let’s consider the work involved in making just one small change, such as adding a new attribute to an already existing Dimension. The process may involve the following steps.
Alter the Dimensional Model if using a dedicated tool.
Alter the Source Query(ies).
Alter any dependent objects, such as staging tables, stored procedures, user views, etc.
Write or generate the T-SQL needed to make these changes.
Edit the SSIS Package(s).
Edit the SSAS Cubes, if they are being used.
In traditional ETL development, none of the objects above react dynamically to the change; they all have to be handled separately. This is only scratching the surface. Best practices suggest using Source Control on all of the objects above, which is very time consuming. Also, all of these changes should be fully tested in a separate environment before being deployed to production. So, this single small incremental improvement could require several days (perhaps weeks) of development work. So, why did I preface this section with “until recently?”
Biml Facilitates Agile Data Warehousing
“BimlScript enables developers to build and load a Data Warehouse nearly as fast as they can draw the boxes and lines.”
Using a quality BimlScript framework eliminates a lot of the pain, tedium, and costs involved in Data Warehouse Development and ETL. How would the same change from above be accomplished using a good BimlScript framework?
Alter the Dimensional Model if using a dedicated tool.
Alter the Source Query(ies).
Automatically import the changes into Biml.
Automatically regenerate every single object listed above including staging tables, user views, SSIS Packages, SSAS Cubes, etc.
Well, what about Source Control? If the BimlScript framework automatically generates all of those objects above from the Dimensional Model and its metadata, what really needs to be checked into Source Control? The Dimensional Model and the BimlScripts. Since the only object that was altered during this “sprint” was the Dimensional Model itself, that’s the only object which needs to be checked in and out of Source Control.
What about testing? If the results of your BimlScripts have been tested in prior sprints, and you didn’t change any of that code, how much QA testing is really required? I would argue that only a cursory amount of testing would be required to validate your metadata.
In short, BimlScript is the perfect companion to the Agile design methods proposed in that book. Stay tuned to the blog for more details.
Biml is a human readable XML language, created by a company called Varigence, which compiles into SQL Server objects such as tables, views, SSIS packages, and SSAS cubes. In this post, we’ll cover the basics of the Biml language so you can start coding as fast as possible. Since I’ve always found that it’s easier to learn new things by example, I will use one of the scripts from my Biml – The Next Step in Agile Data Warehousing presentation.
Biml is not proprietary and anyone can create software which generates and/or consumes it. There are many ways to generate Biml, but most people currently use one of two methods. For those who don’t want to pay for a dedicated product, Biml (and BimlScript) can be written in the Business Intelligence Development Studio (BIDS) or SQL Server Data Tools (SSDT) and compiled with a free program called BidsHelper. Most ETL Developers have at least heard of BidsHelper, and I’ve been using it for years. Instructions on installing BidsHelper and creating your first simple Biml file can be found here.
The other popular method is to use Mist, which is a product of Varigence. Let me preface what I’m about to say with a disclaimer.
I do not work for Varigence and am not financially compensated by them. I don’t benefit in any way if you purchase Mist or choose to use BidsHelper instead.
That being said, I find Mist to be invaluable for serious BimlScript development. Most of my scripts are created using Mist, but they can all be edited, compiled, and executed with BidsHelper. In the future, if I provide a script which cannot be run without Mist, I will clearly point that out. The basic development process when using BidsHelper varies from Mist in several ways. One way is file naming and grouping. If using Mist to create or import tables, connections, and other objects, a separate file is usually created for each. In Mist, BimlScripts can be manually executed or they can be “live.” If they are live, the objects are sort of created virtually and updated in real time. Since Mist keeps track of these live objects, it doesn’t matter if each Connection, Database, Schema, etc. is in a separate file.
In BidsHelper, we tend to group similar Biml objects together because there is no live option when using that program. Therefore, any files containing objects must be pre-selected before execution. So, if your connections, databases, and schemas were spread across 8 files, you’d have to constantly pre-select them whenever you ran other BimlScripts. Obviously, the fewer Biml files we need to highlight, the faster we can work. To that end, I (like many others) typically start by creating a standard file called Environment.biml. This file contains my Connections, Databases, and Schemas.
Note that while I use the same Environment file in both BidsHelper and Mist, when using Mist the Project View shows one file, while the Logical View properly segregates them.
Let’s break down the basic parts and rules of a Biml XML file. The first line is called the Root element and references the XSD file. This is the XML Schema Definition file which contains the rules of the Biml declarative XML language. Under the root elements are plural collections of elements/objects such as Connections, Databases, Schemas, Packages, etc. Each plural collection will have one or more individual objects nested inside, which you can see in the above script. Also, each plural collection and each nested Biml object must have a closing tag.
Each type of Biml object has different properties that pertain to it, only some of which are required. In my example, you’ll notice that I’ve defined two databases, each of which has two sources, along with several schemas. One of the schemas I’ve defined is named “star.” That’s a little inside joke because I recently worked on a project where all of their dimension and fact tables were placed inside a “star” schema.
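As a hedged illustration (the server, database, and connection names here are hypothetical, but the shape matches the description above), an Environment.biml file might look like this:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <!-- One connection per source or target database -->
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=SourceDb;Integrated Security=SSPI;" />
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=TargetDb;Integrated Security=SSPI;" />
    </Connections>
    <Databases>
        <Database Name="SourceDb" ConnectionName="Source" />
        <Database Name="TargetDb" ConnectionName="Target" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="TargetDb" />
        <Schema Name="star" DatabaseName="TargetDb" />
    </Schemas>
</Biml>
```

Notice the nesting: each plural collection (Connections, Databases, Schemas) contains one or more singular objects, and every tag is explicitly closed.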
All Biml objects including the plural collections are case sensitive. Varigence uses Pascal Case in which words are concatenated and each new word begins with a capital letter, like OleDbConnection. Biml attribute values (properties) must be placed inside quotes and are case sensitive as well. Therefore, a schema named “Star” is not the same as one named “star”.
Biml Comment Syntax
Biml comments use standard XML comment syntax: they begin with <!-- and end with -->. Keep in mind that XML comments can span more than one line. The following is perfectly valid.

<!-- You can find more Biml articles
at Made2Mentor.com -->
Object Naming and Scope
Notice the way that objects are named in the above example. Obviously, schema names cannot be required to be globally unique; otherwise, you could only have one database in your Biml project containing the schema “dbo.” However, objects must be unique within their scope. In this case, my schema names must be unique within the scope of their Database.
Illegal XML Characters and the use of CDATA
Remember that Biml is simply XML, and as such, certain characters have special meaning and are considered illegal in element content. Some common characters which can cause trouble are the comparison characters, which are often part of T-SQL or SSIS Expression statements. If you’d like, you can replace the following characters with their corresponding pre-defined Entity Reference:

< becomes &lt;
> becomes &gt;
& becomes &amp;
" becomes &quot;
' becomes &apos;
For example, an Execute SQL Task with a Direct Input query like the following will result in an error.

Select Fields From SomeTable Where SomeField < 5

This can be re-written as:

Select Fields From SomeTable Where SomeField &lt; 5
However, I find that this makes code difficult to read and follow, so I typically enclose this kind of code inside a CDATA section which tells XML not to parse it. CDATA can span multiple lines as well. Since most of my Biml is generated, including the CDATA section is trivial.
<![CDATA[ Select Fields From SomeTable Where SomeField < 5 ]]>
White-space can be inserted to make your code easier to read, and the compiler ignores it. However, white-space is NOT ignored in your property values.
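To illustrate with a hypothetical schema element: the first declaration below creates a schema named "star", while the second creates one named " star" with a leading space, because white-space inside a quoted property value is preserved.

```xml
<!-- Indentation around the elements is ignored by the compiler... -->
    <Schema Name="star" DatabaseName="TargetDb" />
<!-- ...but the leading space inside the Name value is NOT ignored -->
    <Schema Name=" star" DatabaseName="TargetDb" />
```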
My enthusiasm for Biml must be catching because I’ve been asked to give two presentations next week.
I’ll be giving a brand new presentation on Tuesday, November 12th, at 4:30pm for the MSBIC group at the Microsoft Building in Irving, TX. This presentation is targeted towards Business Analysts, management, and other members of the data warehouse team. If your boss has been on the fence about adopting Biml, try to get him/her to see this presentation. Anyway, details can be found on the MSBIC site and the abstract is as follows:
The Business Case For Biml
Business Intelligence Markup Language is literally changing the way we think about Data Warehouse Design and ETL Development, but what impact will this have on the rest of the business? In this presentation we’ll discuss how Biml adoption may affect things like the Data Warehouse Development Lifecycle, Team Dynamics and Roles, Quality Assurance Testing, and Business Process Interviews. We will also discuss how to maximize your return on investment (ROI) by utilizing BimlScript in your data warehouse projects.
Next Thursday, November 14th, at 6pm CT I’ll be giving a remote presentation to the Tulsa SQL Server User Group. I’m going to present a re-factored and more detailed version of the Biml presentation I gave at SQL Saturday #255 last week. I’m hoping that it will be available for live streaming and I’ll follow up with more details once everything is straightened out. I’m thrilled to be giving this presentation again because I will have more time and can show more of the code. Once again, the abstract follows.
BIML – The Next Step in Agile Data Warehousing
Business Intelligence Markup Language (BIML) is literally changing the way we think about Data Warehouse design and ETL. In this interactive demo-focused presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use BIML Script to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.
But wait, there’s more….
I mentioned my intention to present a full day Biml Workshop in a previous article, and a local firm has approached me and would like to sponsor it. I know a lot of my NTSSUG SQL Family are really eager for some Biml training and I promise to post any developments and updates right here on the blog.
I’m happy to announce that I’ve just been elected to the North Texas SQL Server User Group Board of Directors. I owe so much to the PASS Community in general, but even more to our local group. In case you aren’t aware, the NTSSUG is one of the most active and successful groups in the entire country, and I’m very honored to join the board.
I want to just take a moment to acknowledge what our group has already achieved.
We’ve had five successful SQL Saturday events. Our last, SQL Saturday #255 had more than 500 attendees.
We hosted SQL Rally in Dallas.
We routinely draw 60 to 80 participants to our monthly meetings.
I’d like to take a moment to thank the two people who also ran for the board this round, Miyaka Fusi (Twitter) and Chris Utter. Being a board member is a serious commitment, and I hope to continue to see you folks at meetings and events.
Also, I’d like to take a moment to thank our two outgoing board members as well. Erin Welker (Blog/Twitter) will be retiring as our Past President and she’s served since the group’s inception more than 10 years ago. Tim Mitchell (Blog/Twitter) has been our Director of Finance for the past four years and his contributions have been vital to the growth of the group.
The current board is a very cohesive team made up of talented and dedicated professionals. Great guys one and all.
What is the Matrix really about? It’s about this really smart guy, who’s disillusioned with his surroundings and life. Everything seems flat and boring, and he can sense that there’s something fundamental being hidden from him. He goes to work everyday, doing the same things, but is spending his nights on his computer searching for the secrets of The Matrix.
This is a person who seeks more. He seeks to gain a true understanding of his environment, his role in it, and eventually to gain mastery of it and his fate. After enduring the process of searching for a guide, learning what the Matrix is, and lots of Kung Fu and gun battles, he has an epiphany.
Neo Sees the Source Code
Neo is The One
He realizes that he is in fact “The One.” He can see the underlying code of The Matrix and can use his gift to make his world into what he wants it to be. He doesn’t have to toil in tedium for the rest of his life.
Can I Learn to Dodge Bullets?
Since Biml compiles into SQL Server Objects like views, tables, SSIS packages, and SSAS cubes, you can learn to automate their creation and functionality with BimlScript. Essentially, you gain the ability to “see the code,” and all of the resulting power that conveys. As I demonstrated on Saturday, you can create BimlScripts which will automatically create and load a data warehouse based upon metadata.
Can you learn to dodge or stop bullets? Well, no. However, if you spend time mastering BimlScript, you can build and load a data warehouse nearly as fast as you can draw the boxes and lines. You can save hundreds of hours of tedium, deliver data warehouses in rapid iterations, and save companies untold thousands of dollars.
Will You Take the Blue or Red Pill?
So, will you take the Blue pill and go on about your everyday life? Are you content to build and repeatedly edit the same old SSIS packages again and again? Or will you take the Red Pill and learn BimlScript instead? Will you learn how to automate your data warehouse loads, create new SSIS Packages, SSAS Cubes, and other SQL Server objects at will? The choice is yours.
As I closed my Biml presentation, I mentioned that I would be posting the BimlScripts I wrote for it on the blog for download. I’m also going to include articles detailing how the code works and how to write similar code yourself. Stay tuned in the coming days and weeks for those articles. Also, I am working on scheduling some full-day BimlScript Workshops. I’m currently trying to find a good venue in Dallas/Fort Worth, and may have already secured one in Austin, TX. Finding the right venues will be critical as I’d like to make the training as affordable as possible. If you would like to attend a workshop, know of a good venue, etc. feel free to contact me at David@Made2Mentor.com.
Continue to stay tuned to the blog. I’ll be posting more Biml articles, announcing workshops, and other cool stuff in the coming weeks.
Just wanted to give my readers, both of you, a quick heads up on my upcoming presentations, both of which deal with BimlScripting.
“Biml Basics” Tuesday Night at NTSSUG
Tuesday night (10/29) I’m presenting the final session (for now) of my Data Warehousing Basics and Beyond series for the North Texas SQL Server User Group. We’ve covered the basics on how to create both Dimension and Fact Tables, as well as how to load them with SSIS. This final presentation will be on Biml Basics. We’ll discuss what Biml is, the basics of how to extend it with script, and why you should adopt it. I think everyone will find it interesting and this session will serve as a good prelude to my upcoming presentation at SQL Saturday #255.
SQL Saturday #255 – Arlington, TX
Our next SQL Saturday is this week (11/2/2013) and I’ll be volunteering as usual. The regular event is free to everyone and we still have room for more attendees, so I urge everyone in the area to attend. I’ll be presenting the following at the event:
Biml – The Next Step in Agile Data Warehousing
Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive, demo focused presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use Biml Script to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.
Also, we are offering three pre-conference sessions on Friday (11/1) from well-known experts in their field, which you umm… may have heard of. 🙂 They are an excellent value at $120 for a full day of training. The speakers and session titles are as follows:
A Day of SSIS 2012 – Andy Leonard
Query Performance Tuning – Grant Fritchey
Data Science Crash Course – Drew Minkin
More details for the SQL Saturday event and Precons can be found here. I hope to see a lot of you there!
I’ll admit it, I’m a cheater. Now, before you click away in disgust, give me a minute to explain. Almost 15 years ago, when I felt stuck in my career as “The Made2Manage Guy,” I’d routinely run out of things to do. I mean, once you’ve learned all the pitfalls and gained a mastery of the program and job, what more was there to do? I can almost hear you saying, “Umm… you could learn something else and grow.” Well I hear ya, and nowadays I can’t imagine wasting idle time like that. However, this was a long time ago, and like so many people in the early days of the internet, I found entertainment on the interwebs.
Don’t Hate the Player
I found Pogo.com, which was and is a free online games site that caters to soccer moms everywhere. I was pretty good at one game in particular, Word Whomp, which is a fun vocabulary-based game. It’s kind of a cross between Whac-A-Mole and Scrabble. For each new round, a string of “random” letters is presented along with clusters of blank tiles representing all of the English words one can make from those letters.
Anyway, as I said, I was pretty good at it, but there were people on the leader boards who were inhumanly good at it. Well, I’m a naturally competitive person and was rapidly becoming frustrated with the repetitiveness of the game and my inability to compete with the leaders. So, what did I do? I cheated. Before I describe how, let’s just have a disclaimer here. While I was cheating, there was no payoff for it other than personal satisfaction. I consider it “harmless cheating.”
Cheaters Never Win?
I fired up Visual Basic, I forget which version now, and created a program which was able to screen scrape Word Whomp for that random letter set. It then scanned through its own local Access Database (Lame… I know) for applicable words which could be made. Each viable word was then submitted. At the end of every round, Word Whomp helpfully lists all the words you missed, and my program scraped those too then added them to the database. Since my database started “empty” it took a bit to gain momentum. However, the longest words are only six characters, so as time went on, my program got more and more efficient at its task and rarely missed a word. Needless to say, it wasn’t long before my program was near the top of the leader boards. However, as time went on, it also became obvious that I wasn’t the only cheater either as I never did end up in first place.
What does this have to do with Biml?
A couple of things occurred to me as I finished the cheating program. I had a lot more fun writing the program to cheat at Word Whomp than actually playing the game itself. I get bored when performing repetitive tasks, even if they’re supposed to be fun. This is one reason why I write so much code that writes other code. I’ve never understood why some folks are content to re-write the same basic code repeatedly.
As I’ve mentioned on the blog and in my recent videos, I adopted BimlScript for a recent project and saved enormous amounts of development time as a result. Granted, there is a learning curve to gain an understanding of BimlScript and deciding how you will leverage it. However, as soon as I passed that stage, my productivity just took off. Although I am used to working myself out of a job, even I wasn’t prepared for just how quickly I finished my last project. Working on my BimlScript framework is a lot like the Word Whomp cheating program. It’s fun and liberating to design new frameworks and methodologies to solve age-old problems.
Incidentally, shortly after I finished my program, I grew bored again watching it play Word Whomp. This was about the time that I decided to work my way out of that boring job and started reaching out to members of the PASS Community like Brent Ozar (Blog/Twitter). Since then I’ve put my energy, ingenuity, and drive into becoming a world class Data Warehouse Architect and am now “cheating” with Biml. I urge you to do the same.
A “Hello World” program is an age-old, traditional way to teach the basics of syntax for a programming language to a beginner. Since so many of us are just starting to use BimlScript, I thought it would be fun to start out with it. Also, I’ve made the entire solution available for download.
Before we begin, download and install BidsHelper from Codeplex.com. Then open Business Intelligence Development Studio (BIDS) or SQL Server Data Tools (SSDT) and create a new project. We’ll call ours “HelloWorld”.
Right click on your project name and select “Add New Biml File”.
Open the Biml file and you’ll find this:
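BidsHelper creates a file containing just the empty Biml root element:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
</Biml>
```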
This is an XML declaration that references the XML Schema Definition (XSD). The XSD file defines “the rules” of the Biml language, and any XML document you create is validated against those rules. Now, before we start getting into Biml code, I want everyone to read about the Copy and Paste Issues in Visual Studio. Otherwise, any code you copy from my blog (or really any source) and then paste into Visual Studio will not work.
Create a Basic SSIS Package
Since the whole point of “Hello World” is to create the most basic example in a programming language that produces output, the simplest step we can take is to create a package. So, let’s add the bare minimum necessary to create a package called “Hello World”. Begin by hitting enter at the end of the first line and typing “<". Intellisense should kick in with a list of options. You can either continue typing "Packages" or select it from the list.
As soon as you close the Packages Node, you’ll notice that the closed Packages tag appears and the cursor is placed between those tags. Type “<" again and you'll notice that Intellisense offers fewer options this time. Either type or choose "Package" (singular) and then type ">” to close that as well. You should see that your code has a blue squiggly line below “Package” which indicates that something is missing. If you hover over that, you’ll see the following:
Our code is missing a Constraint Mode (and also a Package Name) which is required. So, if we edit our code to add a Name of “Hello World” to our package and then type “ConstraintMode” the Intellisense offers the two options of Linear and Parallel. Linear will cause each Control Flow Task to be run in order, while Parallel allows them to be run concurrently and also allows you to specify the precedent constraints individually per task. For our simple example, just select Linear. Your code should now look like this:
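Assuming default formatting, the file at this point is simply:

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>
        <!-- Name and ConstraintMode are both required -->
        <Package Name="Hello World" ConstraintMode="Linear">
        </Package>
    </Packages>
</Biml>
```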
Right click on your Biml Script file, and select “Generate SSIS Packages.” A generic package named “Hello World.dtsx” should be created. If you attempt to open it, you’ll see that the package is empty.
So, we’ve achieved the basic premise of a “Hello World” application, but let’s dig further. Let’s add the Biml code for a couple of Tasks. First, we’ll add an Execute SQL Task. Now, if you think about a simple Execute SQL Task, what elements would be required?

A Connection.
A SQL Statement if using direct input.
So, we need to have a Connection. A Connection requires a name and a connection string, but it also requires an existing database. I’ve created an empty database called “HelloWorldDemo” on my instance. There are a number of ways to generate connection strings, but I prefer the following way because it is so simple. For your convenience I’ve recorded this quick video, but you can also follow the instructions below it.
Create a text file somewhere. I’m lazy so I created mine on my desktop and named it Connection Strings.txt. You may name yours whatever you wish.
Make sure that file extensions are not hidden on your system. Highlight the txt extension and replace it with udl. Windows will warn you that changing the extension might cause problems. Select Yes and change it anyway.
Double Click the file to open it. You’ll be presented with the standard Data Link interface that we’re all familiar with.
Configure that screen, but please select “Use Windows NT Integrated security.” You’ll see why in a moment.
Select the Database, in my case it’s called HelloWorldDemo, and click Test Connection. After that succeeds, select OK to close the screen.
Right Click the udl file and select Open With… Notepad (or some similar program). The last line of text will have your connection string.
Return to your Biml Script and create a Node called Connections just below the first line. Then place the code into it like so:
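Here is a sketch of that Connections node (the Data Source and Initial Catalog values come from the connection string in your udl file and will differ on your machine):

```xml
<Connections>
    <!-- Paste the connection string copied from the udl file -->
    <OleDbConnection Name="HelloWorldDemo"
        ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=HelloWorldDemo;Integrated Security=SSPI;" />
</Connections>
```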
Now that we have a connection, we can create Packages and Tasks which use it. Let’s create the Execute SQL Task mentioned above. First, we have to create a Tasks Node. Our Execute SQL Task will connect to our Connection, but won’t actually do anything.
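One way to write it (the query is a harmless placeholder so the task succeeds without changing anything):

```xml
<Packages>
    <Package Name="Hello World" ConstraintMode="Linear">
        <Tasks>
            <!-- ConnectionName must match the Name of a declared Connection -->
            <ExecuteSQL Name="SQL Hello World" ConnectionName="HelloWorldDemo">
                <DirectInput>Select 1;</DirectInput>
            </ExecuteSQL>
        </Tasks>
    </Package>
</Packages>
```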
Right click on your Biml Script and click Generate SSIS Packages. This time, since the Hello World.dtsx package already exists, you’ll receive the following window which allows you to selectively overwrite packages. Obviously, in this case we want to overwrite the package, so select Commit. Now, if you open your package you will find a single Execute SQL Task and if you double click to open it, you will find the following:
While the package will open, those of us working in SSIS 2012 will receive an error while running it.
You will learn to loathe this.
The default ProtectionLevel for a SSIS project is EncryptSensitiveWithUserKey, which most of the time is not what you want. Basically, any user other than the original creator who attempts to use this package must provide all of the credentials. I prefer to use DontSaveSensitive whenever possible and let Active Directory handle permissions. Right click on the project name and select Properties. Change the ProtectionLevel to DontSaveSensitive. If your project still contains the default Package.dtsx, you will need to open that and change its ProtectionLevel setting or delete it to continue.
After changing the project ProtectionLevel, go ahead and run Hello World.dtsx. The SQL Hello World task will turn green, and there is peace in the valley. Next, let’s create some tables in Biml.
Step 1 – Create Definitions
Before we begin creating tables in Biml and Biml Script, let’s just define what is necessary to create the same tables using T-SQL. We’re going to create two tables so I can demonstrate how Biml Script will iterate through them. The two tables will be called “HelloWorld” and “FromMade2Mentor.” Let’s look at the T-SQL that I would write to define these.
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = 'dbo')
BEGIN
    EXEC sp_executesql N'CREATE SCHEMA dbo AUTHORIZATION dbo';
END
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE SCHEMA_NAME(schema_id) = 'dbo' AND name = 'HelloWorld')
BEGIN
    Drop Table dbo.HelloWorld;
END
GO
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
Create Table dbo.HelloWorld
(
    [HwId] smallint IDENTITY(1, 1) NOT NULL,
    [HWChar10Field] varchar(30) NULL
);
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE SCHEMA_NAME(schema_id) = 'dbo' AND name = 'FromMade2Mentor')
BEGIN
    Drop Table dbo.FromMade2Mentor;
END
GO
Create Table dbo.FromMade2Mentor
(
    [MentorId] smallint IDENTITY(1, 1) NOT NULL,
    [MentorDate] date NULL
);
While I realize that both of these are in the schema dbo and the create schema statement isn’t really necessary, I’ve left it in for two reasons. First, you’d need a statement like that to verify that a named schema like ‘Staging’ already existed before creating tables in it. Second, it illustrates how Biml script is similar to the above statement.
Creating tables in Biml is fairly easy. The important thing to remember is that you must first define the table(s) in a Biml file, and then use another Biml Script file to build the package(s), which you would then run to create them. The end result is an SSIS package with Execute SQL Tasks which create the necessary tables. So, let’s start with the Biml file which will contain the table definitions, and we’ll name it Destination Table Defs.biml. As you can see in the following code, the Biml follows the same basic path.
The Connection is declared (which isn’t necessary in SSMS but is implied).
The Database(s) is specified.
The Schema(s) is specified.
The Table Name is specified (as in Create Table dbo.HelloWorld).
The Columns are named and defined for data type and other attributes such as Nullability and Identity.
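The definitions file itself isn’t reproduced here, but a minimal sketch of what Destination Table Defs.biml might look like follows. The connection name, connection string, and database name are illustrative assumptions; the column definitions mirror the T-SQL above.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <!-- Connection name and string are assumptions for this sketch. -->
    <OleDbConnection Name="Dest" ConnectionString="Provider=SQLNCLI11;Server=localhost;Initial Catalog=Made2Mentor;Integrated Security=SSPI;" />
  </Connections>
  <Databases>
    <Database Name="Made2Mentor" ConnectionName="Dest" />
  </Databases>
  <Schemas>
    <Schema Name="dbo" DatabaseName="Made2Mentor" />
  </Schemas>
  <Tables>
    <Table Name="HelloWorld" SchemaName="Made2Mentor.dbo">
      <Columns>
        <Column Name="HwId" DataType="Int16" IdentitySeed="1" IdentityIncrement="1" />
        <Column Name="HWChar10Field" DataType="AnsiString" Length="30" IsNullable="true" />
      </Columns>
    </Table>
    <Table Name="FromMade2Mentor" SchemaName="Made2Mentor.dbo">
      <Columns>
        <Column Name="MentorId" DataType="Int16" IdentitySeed="1" IdentityIncrement="1" />
        <Column Name="MentorDate" DataType="Date" IsNullable="true" />
      </Columns>
    </Table>
  </Tables>
</Biml>
```

Notice that the nesting (Connections, Databases, Schemas, Tables, Columns) follows the same path as the T-SQL above.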
Now, save your project, right click on Destination Table Defs.biml and select Check Biml for Errors. Assuming that your script passes validation, right click again and select Generate SSIS Packages. What happens? Well, nothing. Why? You have provided the table definitions, but haven’t provided the mechanism to actually create them. That is our next step.
Step 2 – Generate Package(s)
Knowing what we learned earlier, we could create a static Biml file with Execute SQL Tasks to create these tables, but that’s not very efficient. Wouldn’t it be great if we could just dynamically create that package and it would create every table we’ve specified? This is your first taste, small though it is, of the power of Biml Script. Let’s go through the steps to create such a package as if we were going to do it in SSIS. We’d most likely:
Create a new package and name it. I’ll name it “Create Destination Tables.dtsx”.
We’d create and specify a Connection.
For each table we needed to create we’d drag out an Execute SQL Task, rename it, specify the connection, and provide the SQL query to use for Direct Input.
Close and save the package.
Of course, this process is tedious. It’s the same set of clicks and drags over and over again. Also, you’d need to open this package and add another Execute SQL Task every time you added a new table to the destination. Let’s create a new Biml file and name it “Create Dest Tables.biml”. Read through the following Biml Script code and you’ll see that it follows along with the process we’d take in SSIS. Paste this script into our new Biml file.
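The script isn’t reproduced here; assuming the table definitions live in Destination Table Defs.biml with a connection named Dest, the Biml Script might look roughly like this (the package, task, and connection names are assumptions):

```xml
<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="Create Destination Tables" ConstraintMode="Linear">
      <Tasks>
        <# foreach (var table in RootNode.Tables) { #>
        <!-- One Execute SQL Task per table defined in the definitions file. -->
        <ExecuteSQL Name="SQL Create <#=table.Schema.Name#> <#=table.Name#>" ConnectionName="Dest">
          <DirectInput><#=table.GetTableSql()#></DirectInput>
        </ExecuteSQL>
        <# } #>
      </Tasks>
    </Package>
  </Packages>
</Biml>
```

Generating this file together with the definitions file should produce one Execute SQL Task per defined table.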
You may notice that Visual Studio shows you red squiggles with your C# code. That’s because it is expecting XML only, so you’ll have to learn to ignore them. Notice how all of the C# code is held inside of the pound signs. The foreach statement loops through every table defined and assigns it to the table variable I’ve declared. The embedded C# such as <#=table.Name#> returns the Name property of the Table object. Therefore, I am dynamically naming my Execute SQL Tasks and correctly specifying them for each iteration. GetTableSql is a function provided by Varigence which does just what you’d think. Now, right click Create Dest Tables.biml and check for errors. You’ll receive the following:
Basically, the problem is that the code is trying to iterate through the tables, but since we only selected the script file and not the definitions file, there were no Tasks created under the Tasks node. If you select both of the files and check for errors, this should be remedied. After passing that check, make sure that both are selected, right click and generate the package. Upon opening the package, you should see the following:
Now, double click SQL Create dbo HelloWorld. After the window opens, click the ellipses to expand the SQL Statement. After expanding the window to show the entire statement, you’ll notice that the resulting SQL query looks almost identical to the one we created above.
If you run the resulting package, the two tables we defined will now exist in the database.
Just for Fun
Typically a “Hello World” program would pop up that message to the user. While we have mimicked that with Packages and Tables “popping up,” I thought it would be fun to show you a little trick to finish this article. I borrowed this with permission from a friend of mine who just recently started blogging over on his new Biml Geek Blog.
Open our Project again, create a new Biml file, and name it. I’ve chosen to name it Hello World Message.biml. Copy and paste the following code into it.
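The original snippet isn’t reproduced here, but the trick boils down to calling the .NET MessageBox API from Biml Script, so that a message literally pops up while the packages are being generated. A sketch, assuming the System.Windows.Forms assembly is available to the script engine:

```xml
<#@ template language="C#" #>
<#@ assembly name="System.Windows.Forms.dll" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <#
  /* Runs at package-generation time, not when any package is executed. */
  System.Windows.Forms.MessageBox.Show("Hello World!");
  #>
</Biml>
```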
Remember that the scripting in this article is simply C#. Therefore, we can leverage the language to do a lot more than just iterate with For Each loops. So, one last time, right click the Biml file and select “Generate SSIS Packages” and you should see the following:
If T-SQL Merge can only handle one changed record at a time, how can we use it for our initial/historical load? Further, how can we use Merge in those situations where the incremental load may produce more than 1 changed record per entity? Well, I’m glad you asked.
Welcome to the next article in the Urge to Merge Series. In this series I’ve tried to clear up many misunderstandings about how to use T-SQL Merge effectively, with a focus on Data Warehousing.
INSERT INTO Staging.DimCustomer (CustomerNum, CustomerName, Planet, ChangeDate, RankNo)
SELECT CustomerNum, CustomerName, Planet, ChangeDate,
       ROW_NUMBER() OVER (PARTITION BY CustomerNum ORDER BY ChangeDate DESC) AS RankNo
FROM dbo.CustomerHistory -- Source history table name assumed; it carries a CustomerHistoryId identity column.
--Where CustomerNum = 102
ORDER BY CustomerHistoryId;
The most important part of this query is the RankNo column. Notice that I am assigning a Row_Number per entity (Business Key) by descending ChangeDate. This means the oldest records per CustomerNum get the highest RankNo value. My looping Merge is set to loop from highest rank down to 1, which may seem counter-intuitive to some. The reason I do this will become clear in a minute. Anyway, the above query will create a data set like the following in Staging.DimCustomer:
Notice that I’ve filtered for Yoda only for simplicity.
If you’ve been following along in this series, I’ve repeatedly mentioned a limitation of T-SQL Merge: on each execution of the Merge statement, the source can contain only one record per entity. Well, we’re about to deal with that issue now. Just because we can only Merge one change record per entity at a time doesn’t mean we can’t loop through Merge statements to accomplish an Initial (Historical) Dimension Load. The Merge code should look familiar, as it builds on my previous articles.
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL
    DROP TABLE #DimCustomer;

CREATE TABLE #DimCustomer (
    CustomerNum int NOT NULL,
    CustomerName varchar(25) PRIMARY KEY CLUSTERED,
    Planet varchar(25) NOT NULL,
    RowIsCurrent char(1) NOT NULL DEFAULT 'Y',
    RowStartDate datetime NOT NULL DEFAULT GETDATE(),
    RowEndDate datetime NOT NULL DEFAULT '12/31/9999');

DECLARE @Counter smallint;
SET @Counter = (SELECT MAX(RankNo) FROM Staging.DimCustomer);

WHILE (@Counter >= 1)
BEGIN
    -- Type 2 Loop
    INSERT INTO #DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
    SELECT CustomerNum, CustomerName, Planet, 'Y', RowStartDate, '12/31/9999'
    FROM
       (MERGE dbo.DimCustomer AS Target
        USING (SELECT CustomerNum, CustomerName, Planet, ChangeDate, RankNo
               FROM Staging.DimCustomer
               WHERE RankNo = @Counter) AS Source
           ON Target.CustomerNum = Source.CustomerNum
          AND Target.RowIsCurrent = 'Y'
        WHEN MATCHED AND EXISTS (SELECT Source.CustomerNum, Source.Planet
                                 EXCEPT
                                 SELECT Target.CustomerNum, Target.Planet)
            THEN UPDATE SET RowIsCurrent = 'N', Target.RowEndDate = Source.ChangeDate
        WHEN NOT MATCHED BY TARGET
            THEN INSERT (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
                 VALUES (Source.CustomerNum, Source.CustomerName, Source.Planet, 'Y', Source.ChangeDate, '12/31/9999')
        OUTPUT $ACTION ActionOut, Source.CustomerNum, Source.CustomerName, Source.Planet, Source.ChangeDate RowStartDate
       ) AS MergeOut
    WHERE MergeOut.ActionOut = 'UPDATE';

    -- Insert the new "current" row for each entity expired above.
    INSERT INTO dbo.DimCustomer (CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate)
    SELECT CustomerNum, CustomerName, Planet, RowIsCurrent, RowStartDate, RowEndDate
    FROM #DimCustomer;

    TRUNCATE TABLE #DimCustomer;

    SELECT @Counter Counter; -- Optional: displays loop progress.
    SET @Counter = @Counter - 1;
END;

-- Type 1
MERGE dbo.DimCustomer AS Target
USING (SELECT CustomerNum, CustomerName, Planet, ChangeDate, RankNo
       FROM Staging.DimCustomer
       WHERE RankNo = 1) AS Source
   ON Target.CustomerNum = Source.CustomerNum
WHEN MATCHED AND EXISTS
    (SELECT Source.CustomerNum, Source.CustomerName
     EXCEPT
     SELECT Target.CustomerNum, Target.CustomerName)
    THEN UPDATE SET Target.CustomerName = Source.CustomerName;
There are just a few things to note:
As I mentioned in a previous article, the Type 1 Merge statement is performed last. Obviously, it has to be.
Notice that I’m using a simple While statement to loop.
The RankNo column was calculated by descending ChangeDate because this ensures that the last record processed for each entity has a RankNo equal to 1. This makes the Type 1 Merge statement easier to write.
What about Transactions and Error Handling?
I’ve intentionally left those aspects out of this code for a couple of reasons. First, it’s easier to understand what’s happening without the extra code in the way. Second, you may not actually want explicit transactions in your Initial Data Warehouse load. It doesn’t take a Rocket Surgeon to figure out that if you load a huge Dimension historically, and you only use one transaction, your transaction log may grow out of control. This is particularly true if you are loading multiple Dimensions simultaneously. This is definitely an “It Depends” situation.
As I mentioned in the previous article, Extracting Historical Dimension Records using T-SQL, the Merge statement above should work in either Method 1 or 2. Also, because of the RankNo column, incremental Dimension loads which may contain more than one changed record per entity (CustomerNum) will work correctly as well.
This past weekend I made the trek to Oklahoma City to participate in their latest SQL Saturday. It was the first time I had spoken there, but it didn’t feel that way. Kristin Ferrier (Blog/Twitter), Jeremy Marx (Blog/Twitter), Matt Brimer (Twitter), Allen Smith (Blog/Twitter), and the rest of the team were so welcoming, just what you’d expect from our #SQLFamily.
The event went off without a hitch. It was very well run, the volunteers knew exactly what they were doing, and the venue was great as well. I gave two presentations, which were fairly well attended. I didn’t stop to count, but I think both had more than 40 people. Judging from the feedback evaluations, both groups had a good time and learned a lot. As a bonus, both of my sessions were recorded. User Group TV was kind enough to record my Merge presentation, and I recorded the Data Warehouse Mistakes presentation myself using Camtasia. I’m currently editing the second presentation and hope to post a link to it tomorrow.
The recordings are below, along with links to download the presentation materials.