Archives

Presenting Tonight for the SSIG in Greenville, SC

Bimlprecon
The fine folks at Varigence were kind enough to invite me to come and spend a couple of days with them at their headquarters in Greenville, NC. I’m excited to discuss Biml strategy with them, and how to best leverage and extend my BimlScript frameworks. To be honest, I can’t wait to spend some quality time with them to compare notes and pick up some new ideas.

I’m also going to present something new tonight for the SQL Server Innovators Guild (SSIG).

A CRUD-dy Alternative to Slowly Changing Dimension Processing

Slowly Changing Dimensions (SCD) are the foundation of Data Warehousing. Many of us use T-SQL Merge to process SCDs because the pattern is well known and it performs reasonably well. However, many experts in the SQL Server Community recommend that we stop using it, due to many outstanding bugs, but have not recommended a suitable T-SQL alternative.

In this session, I’ll show you how I use standard CRUD, mostly Inserts and Updates, to process Slowly Changing Dimensions. My current design pattern yields high performance, great flexibility, and easily maintainable code. As an added bonus, you’ll leave this presentation with the design pattern and code samples that you can quickly put to work in your environments.

Wednesday night I’m flying back home to Michigan for a few days to spend time with friends and family. It’s been too long, and I need to recharge my batteries.

Upcoming Biml Precon at SQL Saturday Phoenix

Bimlprecon
The good folks putting on SQL Saturday Phoenix have asked me to give a full day Precon seminar on Friday March 28, 2014. This will be my first Precon focused on Biml (Business Intelligence Markup Language), and I’m really pumped about it.

Fast Track your Business Intelligence Projects with Biml – Precon

You’re a business intelligence developer or manager trying to provide quality data in a timely, cost-effective manner. Perhaps you’ve considered Agile methodology, but found that the details of ETL, the plumbing if you will, makes iterative development all but impossible. In just one day, you’ll learn how to leverage Business Intelligence Markup Language (Biml) to super charge your ETL efforts, alleviating the pain and tedium of data warehouse development.
You will Learn:

  • How Biml makes Agile data warehouse development a reality.
  • Understand the basic syntax and rules of Biml, the important role of meta data, and how to choose the right scripting language for you.
  • Using BimlScript to dynamically create tables, views, and the SSIS packages to load them.
  • How to change your development mindset and spot re-usable patterns which are ideal for using Biml.
  • How to leverage BimlScript for ETL performance tuning.

As an added bonus, you’ll leave the precon with many BimlScripts and code samples you can adapt to use in your own environments. The entire seminar is geared toward getting you up to speed on BimlScript so you can stop wasting your time repeating the same old patterns. Come see why Biml is changing the way we build data warehouses.

Tickets can be purchased here. If you order by 2/28 the price is only $99 and is $125 after that. Keep in mind that there is a small processing fee as well.

SQL Saturday Phoenix Presentations

If you can’t attend the Precon, I’ll also be giving two presentations on Saturday which are free of charge (of course). If you’re in the area, come see me present:

Biml – The Next Step in Agile Data Warehousing

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive, demo focused presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use Biml Script to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.

Data Warehouse Mistakes You Can’t Afford to Make

Many data professionals understand the basics of Data Warehouse design, including Dimension and Fact Tables, slowly changing Dimensions, and the use of meaningless surrogate keys. However, it isn’t until you’ve created a dimensional model and put it into production, that you realize just how much of an impact seemingly trivial mistakes can make. They can hobble performance, allow inaccuracy, and perhaps worst of all, inhibit adoption and usage of the new system. Learn how to avoid many common mistakes, from someone who’s made them and then found ways to correct them.

More details about SQL Saturday Phoenix #279 can be found here. Hope to see a bunch of you there!

My Biml Metadata Strategy

Now that we’ve discussed various common metadata strategies used in creating BimlScript, I’m going to explain exactly how the metadata in my sample BimlScript framework functions. Basically, I’m going to start to pull back the curtain and explain some of the “magic” involved in my Biml – The Next Step in Agile Data Warehousing presentation.

As I mentioned in a previous article, I use two different forms of metadata in this example, SQL Server Extended Properties and Biml Annotation Tags. This metadata applies at both the table and column levels. While many examples and demonstrations work from the source forward, I’m a dimensional modeler so I tend to work from the data warehouse backward. My basic metadata process consists of the following steps.

  1. Metadata is defined and kept in a dimension model created with modeling software. Much of this metadata is specified automatically and requires very little effort from the architect/modeler.
  2. The modeling software generates the DDL necessary to convert the metadata into SQL Server extended properties.
  3. BimlScripts, which I’ll share in an upcoming article, convert the extended properties into Biml Annotation Tags. The other BimlScripts in the framework works from those tags. The import scripts will work fine with BidsHelper and do not require a Mist license.

There are several reasons I am choosing this multiple step process.

  1. Complex data warehouse projects typically require modeling software.
  2. I needed to decouple that software from my framework wherever possible so I wasn’t locked into using one product.
  3. Extended properties are already widely used for documentation.
  4. Biml Annotation Tags make writing BimlScript a lot easier, and the code is easier understand and maintain as well.
  5. If more people standardize on Biml Annotation Tags for their metadata, shared BimlScript snippets become much easier to adapt to multiple projects.
  6. This solution provides maximum flexibility since the rest of my code uses Biml Tags. For example, I could forego the use of a dedicated modeling and handle simpler projects entirely in Mist, or even manually with BidsHelper.

The modeling software I’m using for this demo is ModelRight. I don’t work for them, and don’t profit in any way if you purchase their software. That being said, I think it’s a good value compared to the other modeling products on the market. Anyway, the specific modeling software used isn’t integral to this solution. Any modeling tool which allows you to create custom metadata and generate the corresponding SQL Server extended properties DDL code will work. In fact, I have a friend who is using my framework the same way with ERStudio right now. Before we dive into the metadata details, let’s briefly discuss what we are going to load.

ModelRight Dimensional Model

ModelRight Dimensional Model

In the simple BimlScript framework example that I’m sharing, we’ll load the 4 tables shown above.

  • DimCustomer - This will be a Dimension with only Type 1 columns.
  • DimProduct - This Dimension has both Type 1 and Type 2 columns.
  • DimProductCategory - This is a Junk Dimension and also a Role Playing Dimension. In order to keep this example simple, I’ve created a contrived example and used DimProductCategory as PrimaryProductCategory and SecondaryProductCategory. Again, this is not a real world example, but it makes for a simple demonstration.
  • FactSalesOrderDetails - A Fact table with foreign keys to each of the three Dimensions listed above.

As I mentioned in the previous metadata post, I tend to favor Convention over Configuration. In order to produce a framework that was powerful and flexible, yet at the same time easy to understand, I analyzed exactly what my BimlScript code would need to “know” in order to load Dimension and Fact tables. I distilled the metadata requirements down to only three table level items and four column level items. Let’s start with what one would need to know at the table level. Notice that I do abbreviate my metadata titles which makes writing BimlScript faster, and the users never see this information anyway.

ModelRight Table  User Defined Properties

ModelRight Table User Defined Properties

Table Level Metadata
  1. TableRole - This property contains the type or role of the table and therefore includes values such as Dimension, DimJunk (Junk Dimension), Fact, StaticLkp (Lookup), and Custom.
  2. LkpNoMatch - This stands for “Lookup No Match” and this defines the type of behavior necessary to handle any SSIS Lookup against this table which fails to find a match. The current choices are Unkown, Inferred, and None. I’ll go into more detail in a future article when I explain how the Fact table BimlScript works.
  3. IsScripted - This setting determines if my BimlScript will automatically execute on this particular table or not. For example, I may have already created a custom SSIS package to load a particular table and I don’t want my BimlScript over-writing it.
ModelRight Column User Defined Properties

ModelRight Column User Defined Properties

Column Level Metadata
  1. ColumnRole - This column, similar to the TableRole above, classifies the functionality of each column in the data warehouse. Common values are BK (Business Key), SK (Surrogate Key), Type 1, Type 2, etc.
  2. IsViewColumn – I typically create views which overlay the Dimension and Fact tables in my data warehouse. This column, which accepts a simple “Y” or “N” allows the designer to indicate whether this column should be included in any views. This setting has the same effect when I use BimlScript to create Role Playing Dimensions.
  3. IsLkpColumn - This indicates whether the column value is the result of a SSIS Surrogate Key Lookup and the allowed values are “Y” or “N”. Again, I’ll provide more details on this when I cover Fact table loading.
  4. LkpRolePlayingPrefix - When creating a Role Playing Dimension, I prefix the names of all of the columns, including the Surrogate Key with a word(s) that represents the Dimension. So, if I created a Role Playing Dimension (view) on a table called DimDate which represents a “ShipDate”, then each column would be prefixed by the word “Ship”. The word ship is the LkpRolePlayingPrefix. If the value for this piece of metadata is NULL, or is missing entirely for a column, then no prefix is applied. Again, this will make more sense later.

It’s kind of amazing so much can be accomplished with so little metadata, much of which is automatically created by my modeling tool. In the next several articles I’ll use it to create staging tables, Role Playing Dimensions and other views, and the SSIS packages necessary to load Junk Dimensions, Slowly Changing Dimensions, and Fact tables. Keep in mind that this is only one of several BimlScript frameworks I’ve developed, and is the simplest to use and understand. My more elaborate BimlScript frameworks require more extensive metadata.

In the next article, I’ll walk through the steps to creating a simple, functional BimlScript which loads the Junk Dimension above, DimProductCategory. The SQL Server DDL and Biml file will be available for download in that article. However, to end this one, I thought I would share a snippet of code from each step listed above. Notice that each snippet includes some metadata at both table and column level. This is a portion of the SQL Server extended property DDL produced automatically by ModelRight.

EXEC sp_addextendedproperty @name = 'TableRole', @value = 'DimJunk',
	@level0type = 'Schema', @level0name = 'dbo',
	@level1type = 'Table', @level1name = 'DimProductCategory';
GO
 
EXEC sp_addextendedproperty @name = 'IsScripted', @value = 'Y',
	@level0type = 'Schema', @level0name = 'dbo',
	@level1type = 'Table', @level1name = 'DimProductCategory';
GO
 
EXEC sp_addextendedproperty @name = 'ColumnRole', @value = 'SK',
	@level0type = 'Schema', @level0name = 'dbo',
	@level1type = 'Table', @level1name = 'DimProductCategory',
	@level2type = 'Column', @level2name = 'ProductCategoryKey';
GO
 
EXEC sp_addextendedproperty @name = 'IsViewColumn', @value = 'Y',
	@level0type = 'Schema', @level0name = 'dbo',
	@level1type = 'Table', @level1name = 'DimProductCategory',
	@level2type = 'Column', @level2name = 'ProductCategoryKey';
GO

My BimlScript automatically imports the database DDL and converts it to Biml Annotation Tags like the following:

        <Dimension Name="DimProductCategory" SchemaName="DataWarehouse.dbo" PackageSubpath="ImportTables">
            <Annotations>
                <Annotation Tag="IsScripted">Y</Annotation>
                <Annotation Tag="LkpNoMatch">Unknown</Annotation>
                <Annotation Tag="TableRole">DimJunk</Annotation>
            </Annotations>
            <Columns>
                <Column Name="ProductCategoryKey" DataType="Int16" ScdType="Key">
                    <Annotations>
                        <Annotation Tag="ColumnRole">SK</Annotation>
                        <Annotation Tag="IsViewColumn">Y</Annotation>
                    </Annotations>
                </Column>

Strategies and Languages for Generating Biml

What are the best methods for generating Biml and what language should you use? Before I dive into this, let me begin with a disclaimer. I applaud anyone working with Biml to improve their productivity and eliminate needless, repetitive work. One of the primary advantages of Biml is its flexibility, which allows people to generate it many different ways to solve problems. I’m going to briefly discuss some concepts involved in the process and common ways I see it being generated. I’m also going to make recommendations that I’ve found work well in most circumstances. However, remember that this is only my opinion, and your mileage may vary. Let’s start by discussing two basic strategies for creating Biml, which I refer to as Scripting Biml vs. BimlScript.

Scripting Biml?

What exactly do I mean by “Scripting Biml?” Biml is simply an XML Language and therefore can be produced with many different tools and languages. When I talk about “Scripting Biml,” I’m referring to using some kind of tool or language to generate static Biml from outside of the “text” itself. This could take the form of XML language generation or simple string manipulation. I myself started down this route because string manipulation was familiar and comfortable to me. However, I quickly found that this method is very limiting and becomes unwieldy when attempting any complex scripting. I’ve listed several “Scripting Biml” examples, and included links for further research as well as a brief explanation of the methods/strategies being leveraged.

PowerShell

At PASS Summit 2013, Allen White (Blog/Twitter) presented Automate Your ETL Infrastructure with SSIS and PowerShell, which is available for streaming at that link. Allen demonstrates how to use PowerShell and Server Management Objects (SMO) to generate Biml. Be forewarned, unless you’re really into PowerShell, this video may not be for the faint of heart.

.Net Language – C#

One of my clients created an open source project called BimlGen, which is available on GitHub. BimlGen imports SQL Server database tables into Biml, and converts SQL Server Extended Properties into Biml Annotation Tags. Admittedly, it needs work, but it’s free and open sourced. I’m hoping that folks in the community will work on it to improve and extend its functionality. Simply enter the Server and Database Names, change the output folder if you like, and click Generate. The output folder will open and you should find a single Biml file which contains your Biml assets. BimlGen_screen

T-SQL

Another friend of mine, Garrett Edmondson (Blog), has a video on his blog where he uses T-SQL string concatenation to generate simple SSIS Packages.

BimlScript

BimlScript is created by embedding C# or VB.Net code within the Biml itself. Once you grasp the basics of BimlScript, which admittedly is not a trivial undertaking, your productivity will skyrocket. BimlScript is analogous to HTML, which is also a markup language, and ASP.Net. BimlScript allows the programmer to embed nuggets of C# or VB.Net code inside Biml to dynamically generate Biml assets, much the same way that ASP.Net dynamically generates web pages. These nuggets often include LINQ (Language Integrated Query) which is compatible with either C# or VB.Net. If all of this is new to you, don’t worry, I’ll explain all of this in more detail in a later post.

Should I use C# or VB.Net?

Let me preface this by saying that I started coding BASIC long ago on a TRS-80 machine, and I’ve been coding in BASIC ever since. However, when I started down the BimlScript path, I drank the C# Kool-Aid for the following reasons:

  • The vast majority of the BimlScript samples available are written in C#. This fact alone was enough to get me to learn C#.
  • C# with LINQ just seems cleaner once you get over C#’s affinity for semi-colons and curly braces. However, those C# curly braces tend to stand out better than VB statements embedded inside Biml.
  • Mist doesn’t support Intellisense for VB.Net yet, but that functionality will be added soon.
  • Arguably, C# programming skills are in more demand and worth more money than VB.Net.

That being said, if I had a team of skilled VB.Net programmers, I would not invest the resources to teach them C# just to work with BimlScript.

Types of Biml Scripts

There are three basic types of BimlScripts which support both C# and VB.Net.

  1. Expandable – This is the most common type and is the only type supported by BidsHelper. These types of scripts are executed “manually” by the user.
  2. Live – If you choose to purchase Mist, Biml Scripts can be automatically run in the background, primarily for convenience. One of the nice things about Mist is I can use the same script as either Expandable or Live at my choosing.
  3. Transformers – These scripts change existing objects like SSIS Packages according to code logic and metadata. This is a Mist only option as well.

The vast majority of the scripts I share will be expandable and will use C#/LINQ. In the next post, I’ll explain exactly how I use metadata to control my BimlScript framework sample.

Biml Metadata Options and Considerations

Simply put, metadata is data about data. Probably the most common way that SQL Server Professionals use metadata is when we query SQL Server System Views. In this article we’ll discuss using metadata to drive BimlScript automation. Before I start listing common metadata sources and methods, I need to briefly contrast two basic metadata philosophies I see being used. In some BimlScript frameworks, the metadata is highly detailed, as this typically allows the most flexible, configurable functionality. However, maintaining that detailed metadata can be tedious and time consuming. I subscribe to the philosophy of Convention over configuration. I prefer to code BimlScript in such a way as to expect that default values and behaviors (conventions) will be used and supplied. This leads to BimlScript that is much cleaner, easier to read, and maintain. At the same time, I try to provide mechanisms to allow as much flexibility as possible. For example, my BimlScript code follows default patterns I’ve identified and only deviates if specific metadata values are found.

There are several BimlScript metadata options/sources that I’ve noticed being used. Below I’ve listed the most common and I’ve included hyperlinks to blog posts or code samples where possible. I’ve noticed that more people are presenting on Biml than are blogging, but I’m not sure why.

SQL Server Metadata Tables

Simple Column Level Metadata Table

Simple Column Level Metadata Table

What you’ll often see in demonstrations are BimlScripts being controlled by metadata stored in dedicated SQL Server tables. Unfortunately, I couldn’t find any examples on the interwebs, so I provided the sample above. Information in the ColType field (abbreviated to fit on PowerPoint slides) indicates the functional role of said column. A column designated as BK would be treated as a Business Key, LKP triggers SSIS Lookup Transform code generation, etc. I’ve seen people use separate tables to house the metadata at the column level for Facts and Dimensions. I’ve also seen Table level metadata kept in one table and Column level metadata kept in a separate table. These metadata tables often look like Source to Target Mapping Documents. Creating and using SQL Server tables is second nature to most of us so this method is often used. These tables can be backed up with the rest of the database and also used to generate documentation for business purposes.

SQL Server Database Schema (DDL)

This takes several forms. For example, Samuel Vanga (Blog/Twitter) shows how to generate simple SSIS Packages dynamically using query results from the sys.tables Catalog View.

Some metadata strategies rely on Table and Column Names, though I couldn’t find examples on the web. For example, BimlScript can look for table names that begin with “Dim” or “Fact,” and take appropriate action. Also, I’ve seen people specify that any Column Name which ends in “Key” represents a Surrogate Key, or a column name that ends in “_BK” represents a Business (or Alternate) Key. I tend to avoid this practice because I’m a consultant and I work on many data warehouse projects. Therefore, I can’t dictate things like naming conventions. Also, I wouldn’t want business users to see column names like “CustomerCode_BK” on their reports.

SQL Server Extended Properties

Extended Properties have been around since SQL Server 2000 and people have been using them to document their database objects for some time. As a source for BimlScript metadata, Extended Properties have several advantages such as:

  • They’re supported on most SQL Server objects.
  • Can be automatically generated and maintained by third party software.
  • Added and retrieved with relatively simple T-SQL code.
  • They’re “hidden” from regular users.
  • They’re backed up with the rest of the database.

I personally tend use SQL Server Extended Properties in conjunction with the next one, which is….

Biml Object Model – Annotation Tags

Like SQL Server Extended Properties, most (if not all) Biml objects such as tables, views, columns, etc. can be documented via annotations. There are several types of Annotations, in this article we’re going to focus on the “Tag” type. Below is sample of Biml code with Annotations that drive ETL automation.

 <Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Dimensions>
        <Dimension Name="DimCustomer" SchemaName="DataWarehouse.dbo" PackageSubpath="ImportTables">
            <Annotations>
                <Annotation Tag="TableRole">Dimension</Annotation>
                <Annotation Tag="LkpNoMatch">Unknown</Annotation>
                <Annotation Tag="IsScripted">Y</Annotation>
            </Annotations>
            <Columns>
                <Column Name="CustomerKey" ScdType="Key">
                    <Annotations>
                        <Annotation Tag="ColumnRole">SK</Annotation>
                        <Annotation Tag="IsViewColumn">Y</Annotation>
                    </Annotations>
                </Column>

Notice that I’ve specified a Dimension Table named DimCustomer which I’ve designated with a TableRole of Dimension. You’ll also notice that the first column, “CustomerKey” has a ColumnRole of “SK” which stands for Surrogate Key. These designations are only meaningful to the BimlScript you’re using. I’ll go into the nuts and bolts of how you refer to these annotation tags with BimlScript early next week.

What Metadata is Required?

As with so many other technical questions, the answer is “It Depends.” Do you lean more towards Configuration or Convention? As I mentioned above, I prefer Convention which should be obvious in the above snippet. Notice how any references to the source(s) seem to be missing? The BimlScript Framework I’ve recently demonstrated, which is available online, operates solely from the metadata in three table level annotations and three column level annotations. From that little bit of metadata, I “automatically” generate staging tables, overlaying views on the data warehouse including Role Playing Dimensions, Slowly Changing Dimension loads, Junk Dimension loads, and Fact table loads. Neat huh?

In the next article, I’ll take on the thorny topic of choosing a method and/or programming languages to generate Biml.

Agile Data Warehouse Development using Biml

Agile development is good, Waterfall is bad. We hear this all the time, along with un-documented statistics claiming that approximately 80% of Data Warehouse projects fail. So, we as Business Intelligence Professionals are told that we must be Agile. Whenever I taught Dimensional Modeling and Data Warehouse classes, I could count on at least one student asking me how to apply Agile principles to this process. My answer, up until several months ago, was that true Agile Data Warehouse development was a pipe dream. The reason why is fairly obvious if you follow the Kimball method. In a nutshell, when building Star or Snowflake Schemas, one should design, build, and load related Dimensions before Fact Tables. Since each Fact Table, which contains your metrics, typically relates to a majority of the Dimensions, the first Fact Table load requires a great deal of time from the Data Warehouse Development Team and business stakeholders. Dimension tables require a lot more time and resources to develop than Facts. After that first “wave” if you’ll pardon the pun, additional Fact tables and metrics can be delivered quite rapidly. The natural consequence of this is a form of Waterfall development.

We’ve always tried to ameliorate this problem by prioritizing business processes by their “Business Value” and feasibility. In other words, we typically chose to create our first Fact Table(s) for business processes that were worth a lot of money and were the easiest to build and load. We traded a true Waterfall, delivering the entire solution at the end, for a partial Waterfall. Less painful? Sure, but that doesn’t equate to Agile development.

Agile Data Warehouse Design

Several months ago Tim Costello (Blog/Twitter), a good friend of mine from the community, urged me to read the following book:

Buy This Book

Buy This Book

Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

I have a great deal of respect for Tim, so I read it and the book is excellent. It offers comprehensive tools and techniques for improving the entire process of designing a data warehouse. The book emphasizes quick iterations, each of which involves:

  1. ModelStorming – Their new method of conducting business process interviews.
  2. JIT (Just In Time) Dimensional Modeling
  3. ETL Development
  4. Visual BI Delivery
  5. Revisions

These ideas and concepts are great and they address several issues with the traditional Data Warehouse Lifecycle. However, up until recently it was impossible to build a data warehouse this way.

Agile Data Warehouse Design Development?

In an ideal world, building a data warehouse would be an Agile process. The first sprint would deliver one or more Fact Tables, and possibly a couple of prototype or primitive Dimensions. In each successive sprint, more Fact Tables could be deployed, new Dimension tables could be added, and existing Dimensions could have additional attributes created. Existing tables would easily “adapt” to the changes. For example, existing Fact Tables would automatically be related to new Dimension tables, if appropriate. Ideally, this whole process would be fluid and quick. You know umm… Agile. However, anyone who develops ETL processes knows this is not the case.

Let’s consider the work involved in making just one small minor change such as adding a new attribute to an already existing Dimension. The process may involve the following steps.

  1. Alter the Dimensional Model if using a dedicated tool.
  2. Alter the Source Query(ies).
  3. Any dependent objects need to be altered as well such as staging tables, stored procedures, user views, etc.
  4. T-SQL must be written or generated to make these changes.
  5. The SSIS Package(s) needs to be edited.
  6. If SSAS Cubes are being used they also need to be edited.

In traditional ETL development, none of these objects above react dynamically to the change, they all have to be handled separately. This is only scratching the surface. Best practices suggest using Source Control on all of the objects above, which is very time consuming. Also all of these changes should be fully tested in a separate environment before being deployed to production. So, this single small incremental improvement could require several days (perhaps weeks) of development work. So, why did I preface this with section with “until recently?”

Biml Facilitates Agile Data Warehousing

“BimlScript enables developers to build and load a Data Warehouse nearly as fast as they can draw the boxes and lines.”

Using a quality BimlScript framework eliminates a lot of the pain, tedium, and costs involved in Data Warehouse Development and ETL. How would the same change from above be accomplished using a good BimlScript framework?

  1. Alter the Dimensional Model if using a dedicated tool.
  2. Alter the Source Query(ies).
  3. Automatically import the changes into Biml.
  4. Automatically regenerate every single object listed above including staging tables, user views, SSIS Packages, SSAS Cubes, etc.

Well, what about Source Control? If the BimlScript framework automatically generates all of those objects above from the Dimensional Model and its metadata, what really needs to be checked into Source Control? The Dimensional Model and the BimlScripts. Since the only object that was altered during this “sprint” was the Dimensional Model itself, that’s the only object which needs to be checked in and out of Source Control.

What about testing? If the results of your BimlScripts have been tested in prior sprints, and you didn’t change any of that code, how much QA testing is really required? I would argue that only a cursory amount of testing would be required to validate your metadata.

In short, BimlScript is the perfect companion to the Agile design methods proposed in that book. Stay tuned to the blog for more details.

Biml Syntax Basics and Rules

Biml is a human readable XML language, created by a company called Varigence, which compiles into SQL Server objects such as tables, views, SSIS packages, and SSAS cubes. In this post, we’ll cover the basics of the Biml language so you can start coding as fast as possible. Since, I’ve always found that it’s easier to learn new things by example, I will use one of the scripts from my Biml – The Next Step in Agile Data Warehousing presentation.

Biml is not proprietary and anyone can create software which generates and/or consumes it. There are many ways to generate Biml, but most people currently use one of two methods. For those who don’t want to pay for a dedicated product, Biml (and BimlScript) can be written in the Business Intelligence Development Studio (BIDS) or SQL Server Data Tools (SSDT) and compiled with a free program called BidsHelper. Most ETL Developers have at least heard of BidsHelper, and I’ve been using it for years. Instructions on installing BidsHelper and creating your first simple Biml file can be found here.

The other popular method is to use Mist, which is a product of Varigence. Let me preface what I’m about to say with a disclaimer.

I do not work for Varigence and am not financially compensated by them. I don’t benefit in any way if you purchase Mist or choose to use BidsHelper instead.

That being said, I find Mist to be invaluable for serious BimlScript development. Most of my scripts are created using Mist, but they can all be edited, compiled, and executed with BidsHelper. In the future, if I provide a script which cannot be run without Mist, I will clearly point that out. The basic development process when using BidsHelper varies from Mist in several ways. One way is file naming and grouping. If using Mist to create or import tables, connections, and other objects, a separate file is usually created for each. In Mist, BimlScripts can be manually executed or they can be “live.” If they are live, the objects are sort of created virtually and updated in real time. Since Mist keeps track of these live objects, it doesn’t matter if each Connection, Database, Schema, etc. is in a separate file.

In BidsHelper, we tend to group similar Biml objects together because there is no live option when using that program. Therefore, any files containing objects must be pre-selected before execution. So, if your connections, databases, and schemas were spread across 8 files, you’d have to constantly pre-select them whenever you ran other BimlScripts. Obviously, the fewer Biml files we need to highlight, the faster we can work. To that end, I and others typically start by creating a standard file called Environment.biml. This file contains my Connections, Databases, and Schemas.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<!--See More BimlScripts at Made2Mentor.com -->
    <Connections>
        <OleDbConnection Name="connDW" ConnectionString="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=DataWarehouse;Data Source=DevServer" />
        <OleDbConnection Name="connSource" ConnectionString="Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=m2m601c61;Data Source=DevServer" />
    </Connections>
    <Databases>
        <Database Name="DataWarehouse" ConnectionName="connDW" />
        <Database Name="M2MSource" ConnectionName="connSource" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="M2MSource" />
        <Schema Name="dbo" DatabaseName="DataWarehouse" />
        <Schema Name="interim" DatabaseName="DataWarehouse" />
        <Schema Name="star" DatabaseName="DataWarehouse" />
    </Schemas>
</Biml>

Note, that while I use the same Environment file in both BidsHelper and Mist, when using Mist the Project View shows one file, while the Logical View properly segregates them.

Mist

Mist

Root Element

Let’s break down the basic parts and rules of a Biml XML file. The first line is called the Root element and references the XSD file. This is the XML Schema Definition file which contains the rules of the Biml declarative XML language. Under the root elements are plural collections of elements/objects such as Connections, Databases, Schemas, Packages, etc. Each plural collection will have one or more individual objects nested inside, which you can see in the above script. Also, each plural collection and each nested Biml object must have a closing tag.

Object Attributes/Properties

Each type of Biml object has different properties that pertain to it, only some of which are required. In my example, you’ll notice that I’ve defined two databases, each of which has two sources, along with several schemas. One of the schemas I’ve defined is named “star.” That’s a little inside joke because I recently worked on a project where all of their dimension and fact tables were placed inside a “star” schema.

All Biml objects including the plural collections are case sensitive. Varigence uses Pascal Case in which words are concatenated and each new word begins with a capital letter, like OleDbConnection. Biml attribute values (properties) must be placed inside quotes and are case sensitive as well. Therefore, a schema named “Star” is not the same as one named “star”.

Biml Comment Syntax

Biml Comments are created using the syntax I’ve show above. Keep in mind that XML Comments can span more than one line. The following is perfectly valid.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<!--See More 
	BimlScripts 
	at Made2Mentor.com  -->    
</Biml>

Object Naming and Scope

Notice the way that objects are named in the above example. Obviously schema names cannot be completely unique otherwise you could only have one database in your Biml project with the schema “dbo.” However, objects must be unique within their scope. In this case, my schema names must be unique within the Database scope.

Illegal XML Characters and the use of CDATA

Remember that Biml is simply XML and as such has certain characters that have special meaning are considered illegal. Some common characters which can cause trouble are the comparison characters which are often part of T-SQL or SSIS Expression statements. If you’d like you can replace the following characters with their corresponding pre-defined Entity Reference:
illegal chars

For example, an Execute SQL Task with a Direct Input query like the following will result in an error.

Select  Fields From  SomeTable Where SomeField < 5

This can be re-written as:

Select  Fields From  SomeTable Where SomeField &lt; 5

However, I find that this makes code difficult to read and follow, so I typically enclose this kind of code inside a CDATA section which tells XML not to parse it. CDATA can span multiple lines as well. Since most of my Biml is generated, including the CDATA section is trivial.

<![CDATA[ Select  Fields From  SomeTable Where SomeField < 5 ]]>  

White-space Issues

White-space can be inserted in your code to make your code easier to read and the compiler ignores it. However, white-space is NOT ignored in your property values. Consider part of the script above.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Databases>
        <Database Name="DataWarehouse" ConnectionName="connDW" />
    </Databases>
</Biml>

I can add extra spaces or carriage returns to the Biml objects themselves or between properties and the compiler ignores them. So, the following would work:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Databases       >
        <Database Name="DataWarehouse"        ConnectionName="connDW" />

             </Databases>
</Biml>

I realize that the code I just posted is hard to read, but you get the point. Conversely, white-space is NOT ignored inside of proprty values. So, the following will throw an error:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Databases>
        <Database Name="DataWarehouse" ConnectionName="connDW      " />
    </Databases>
</Biml>

Could not resolve reference to ‘connDW ‘ of type ‘DBConnection’. ‘ConnectionName=”connDW “‘ is invalid.

What’s Next?

Stay tuned because in upcoming articles we’ll cover the following:

  • A tutorial on BimlScript syntax and how to create them.
  • Additional Code Samples from my Biml – The Next Step in Agile Data Warehousing presentation, along with an explanation of how each code works.
  • Everything you need to know about Biml datatypes.

Any questions?

Announcing Two Biml Presentations Next Week

My enthusiasm for Biml must be catching because I’ve been asked to give two presentations next week.

I’ll be giving a brand new presentation on Tuesday, November 12th, at 4:30pm for the MSBIC group at the Microsoft Building in Irving, TX. This presentation is targeted towards Business Analysts, management, and other members of the data warehouse team. If your boss has been on the fence about adopting Biml, try to get him/her to see this presentation. Anyway, details can be found on the MSBIC site and the abstract is as follows:

The Business Case For Biml

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse Design and ETL Development, but what impact will this have on the rest of the business? In this presentation we’ll discuss how Biml adoption may affect things like the Data Warehouse Development Lifecycle, Team Dynamics and Roles, Quality Assurance Testing, and Business Process Interviews. We will also discuss how to maximize your return on investment (ROI) by utilizing BimlScript in your data warehouse projects.

Next Thursday, November 14th, at 6pm CT I’ll be giving a remote presentation to the Tulsa SQL Server User Group. I’m going to present a re-factored and more detailed version of the Biml presentation I gave at SQL Saturday #255 last week. I’m hoping that it will be available for live streaming and I’ll follow up with more details once everything is straightened out. I’m thrilled to be giving this presentation again because I will have more time and can show more of the code. Once again, the abstract follows.

BIML – The Next Step in Agile Data Warehousing

Business Intelligence Markup Language (BIML) is literally changing the way we think about Data Warehouse design and ETL. In this interactive demo-focused presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use BIML Script to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.

But wait, there’s more….

I mentioned my intention to present a full day Biml Workshop in a previous article, and a local firm has approached me and would like to sponsor it. I know a lot of my NTSSUG SQL Family are really eager for some Biml training and I promise to post any developments and updates right here on the blog.

Hope to “see” some of you at the presentations.

Thank You to My NTSSUG SQL Family

Dave Ryan Vic
I’m happy to announce that I’ve just been elected to the North Texas SQL Server User Group Board of Directors. I owe so much to the PASS Community in general, but even more to our local group. In case you aren’t aware, the NTSSUG is one of the most active and successful groups in the entire country, and I’m very honored to join the board.

I want to just take a moment to acknowledge what our group has already achieved.

  • We’ve had five successful SQL Saturday events. Our last, SQL Saturday #255 had more than 500 attendees.
  • Hosted SQL Rally in Dallas.
  • We routinely draw 60 to 80 participants to our monthly meetings.

I’d like to take a moment to thank the two people who also ran for the board this round, Miyaka Fusi (Twitter) and Chris Utter. Being a board member is a serious commitment, and I hope to continue to see you folks at meetings and events.

Also, I’d like to take a moment to thank our outgoing two board members as well. Erin Welker (Blog/Twitter) will be retiring as our Past President and she’s served since the group’s inception more than 10 years ago. Tim Mitchell (Blog/Twitter) has been our Director of Finance for the past four years and his contributions have been vital to the growth of the group.

The current board is a very cohesive team made up of talented and dedicated professionals. Great guys one and all.

  1. Sri Sridharan (Blog/Twitter) – President
  2. Ryan Adams (Blog/Twitter) – Director of Programs
  3. Ganesh Gopalakrishnan (Twitter) – Director of Logistics and Communication

I officially become a board member on January 1st, and I promise to do my very best to measure up to the standards set by my fellow board members as well as those that served before me.

Wish me luck!

Biml Makes You THE ONE

The Matrix is one of my favorite movies. The story, special affects, and the theme are so compelling that I’ve watched the movie at least a dozen times. I referenced it in the introduction for my SQL Saturday 255 presentation, Biml – The Next Step in Agile Data Warehousing, and it fits in perfectly with my ongoing series, Why I Choose Biml Scripting as well. How does Biml relate to the Matrix?

The Story

What is the Matrix really about? It’s about this really smart guy, who’s disillusioned with his surroundings and life. Everything seems flat and boring, and he can sense that there’s something fundamental being hidden from him. He goes to work everyday, doing the same things, but is spending his nights on his computer searching for the secrets of The Matrix.
neo_bored
This is a person who seeks more. He seeks to gain a true understanding of his environment, his role in it, and eventually to gain mastery of it and his fate. After enduring the process of searching for a guide, learning what the Matrix is, and lots of Kung Fu and gun battles, he has an epiphany.

Neo Sees the Source Code

Neo is The One

Neo is The One


He realizes that he is in fact “The One.” He can see the underlying code of The Matrix and can use his gift to make his world into what he wants it to be. He doesn’t have to toil in tedium for the rest of his life.

Can I Learn to Dodge Bullets?

Since Biml compiles into SQL Server Objects like views, tables, SSIS packages, and SSAS cubes, you can learn to automate their creation and functionality with BimlScript. Essentially, you gain the ability to “see the code,” and all of the resulting power that conveys. As I demonstrated on Saturday, you can create BimlScripts which will automatically create and load a data warehouse based upon metadata.
neo_bullets_2
Can you learn to dodge or stop bullets? Well, no. However, if you spend time mastering BimlScript, you can build and load a data warehouse nearly as fast as you can draw the boxes and lines. You can save hundreds of hours of tedium, deliver data warehouses in rapid iterations, and save companies untold thousands of dollars.

Will You Take the Blue or Red Pill?

red-pill-blue-pill
So, will you take the Blue pill and go on about your everyday life? Are you content to build and repeatedly edit the same old SSIS packages again and again? Or will you take the Red Pill and learn BimlScript instead? Will you learn how to automate your data warehouse loads, create new SSIS Packages, SSAS Cubes, and other SQL Server objects at will? The choice is yours.

As I closed my Biml presentation, I mentioned that I would be posting the BimlScripts I wrote for it on the blog for download. I’m also going to include articles detailing how the code works and how to write similar code yourself. Stay tuned in the coming days and weeks for those articles. Also, I am working on scheduling some full-day BimlScript Workshops. I’m currently trying to find a good venue in Dallas/Fort Worth, and may have already secured one in Austin, TX. Finding the right venues will be critical as I’d like to make the training as affordable as possible. If you would like to attend a workshop, know of a good venue, etc. feel free to contact me at David@Made2Mentor.com.

Continue to stay tuned to the blog. I’ll be posting more Biml articles, announcing workshops, and other cool stuff in the coming weeks.

Page 1 of 3112345...102030...Last »