Archives

Biml Metadata Options and Considerations

Simply put, metadata is data about data. Probably the most common way that SQL Server Professionals use metadata is when we query SQL Server System Views. In this article we’ll discuss using metadata to drive BimlScript automation. Before I start listing common metadata sources and methods, I need to briefly contrast two basic metadata philosophies I see being used. In some BimlScript frameworks, the metadata is highly detailed, as this typically allows the most flexible, configurable functionality. However, maintaining that detailed metadata can be tedious and time consuming. I subscribe to the philosophy of Convention over configuration. I prefer to code BimlScript in such a way as to expect that default values and behaviors (conventions) will be used and supplied. This leads to BimlScript that is much cleaner, easier to read, and maintain. At the same time, I try to provide mechanisms to allow as much flexibility as possible. For example, my BimlScript code follows default patterns I’ve identified and only deviates if specific metadata values are found.

There are several BimlScript metadata options/sources that I’ve noticed being used. Below I’ve listed the most common and I’ve included hyperlinks to blog posts or code samples where possible. I’ve noticed that more people are presenting on Biml than are blogging, but I’m not sure why.

SQL Server Metadata Tables

Simple Column Level Metadata Table

Simple Column Level Metadata Table

What you’ll often see in demonstrations are BimlScripts being controlled by metadata stored in dedicated SQL Server tables. Unfortunately, I couldn’t find any examples on the interwebs, so I provided the sample above. Information in the ColType field (abbreviated to fit on PowerPoint slides) indicates the functional role of said column. A column designated as BK would be treated as a Business Key, LKP triggers SSIS Lookup Transform code generation, etc. I’ve seen people use separate tables to house the metadata at the column level for Facts and Dimensions. I’ve also seen Table level metadata kept in one table and Column level metadata kept in a separate table. These metadata tables often look like Source to Target Mapping Documents. Creating and using SQL Server tables is second nature to most of us so this method is often used. These tables can be backed up with the rest of the database and also used to generate documentation for business purposes.

SQL Server Database Schema (DDL)

This takes several forms. For example, Samuel Vanga (Blog/Twitter) shows how to generate simple SSIS Packages dynamically using query results from the sys.tables Catalog View.

Some metadata strategies rely on Table and Column Names, though I couldn’t find examples on the web. For example, BimlScript can look for table names that begin with “Dim” or “Fact,” and take appropriate action. Also, I’ve seen people specify that any Column Name which ends in “Key” represents a Surrogate Key, or a column name that ends in “_BK” represents a Business (or Alternate) Key. I tend to avoid this practice because I’m a consultant and I work on many data warehouse projects. Therefore, I can’t dictate things like naming conventions. Also, I wouldn’t want business users to see column names like “CustomerCode_BK” on their reports.

SQL Server Extended Properties

Extended Properties have been around since SQL Server 2000 and people have been using them to document their database objects for some time. As a source for BimlScript metadata, Extended Properties have several advantages such as:

  • They’re supported on most SQL Server objects.
  • Can be automatically generated and maintained by third party software.
  • Added and retrieved with relatively simple T-SQL code.
  • They’re “hidden” from regular users.
  • They’re backed up with the rest of the database.

I personally tend use SQL Server Extended Properties in conjunction with the next one, which is….

Biml Object Model – Annotation Tags

Like SQL Server Extended Properties, most (if not all) Biml objects such as tables, views, columns, etc. can be documented via annotations. There are several types of Annotations, in this article we’re going to focus on the “Tag” type. Below is sample of Biml code with Annotations that drive ETL automation.

 <Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Dimensions>
        <Dimension Name="DimCustomer" SchemaName="DataWarehouse.dbo" PackageSubpath="ImportTables">
            <Annotations>
                <Annotation Tag="TableRole">Dimension</Annotation>
                <Annotation Tag="LkpNoMatch">Unknown</Annotation>
                <Annotation Tag="IsScripted">Y</Annotation>
            </Annotations>
            <Columns>
                <Column Name="CustomerKey" ScdType="Key">
                    <Annotations>
                        <Annotation Tag="ColumnRole">SK</Annotation>
                        <Annotation Tag="IsViewColumn">Y</Annotation>
                    </Annotations>
                </Column>

Notice that I’ve specified a Dimension Table named DimCustomer which I’ve designated with a TableRole of Dimension. You’ll also notice that the first column, “CustomerKey” has a ColumnRole of “SK” which stands for Surrogate Key. These designations are only meaningful to the BimlScript you’re using. I’ll go into the nuts and bolts of how you refer to these annotation tags with BimlScript early next week.

What Metadata is Required?

As with so many other technical questions, the answer is “It Depends.” Do you lean more towards Configuration or Convention? As I mentioned above, I prefer Convention which should be obvious in the above snippet. Notice how any references to the source(s) seem to be missing? The BimlScript Framework I’ve recently demonstrated, which is available online, operates solely from the metadata in three table level annotations and three column level annotations. From that little bit of metadata, I “automatically” generate staging tables, overlaying views on the data warehouse including Role Playing Dimensions, Slowly Changing Dimension loads, Junk Dimension loads, and Fact table loads. Neat huh?

In the next article, I’ll take on the thorny topic of choosing a method and/or programming languages to generate Biml.

Related posts:

3 comments to Biml Metadata Options and Considerations

  • Sam Vanga

    Hi David,

    How and where you store metadata is an important consideration for using BimlScript, but unfortunately it’s not often talked — thanks for publishing this here!

    My preference is to store metadata in SQL tables (the first method you described). That allows me to use metadata in my BimlScript and easily expose it to data analysts and stewards so they can manage the metadata including source target mapping. For this reason, I even won’t mind if it is stored in spreadsheets. Extended properties, although I love them, and Biml annotations could it make it hard to expose the metadata for non-developers.

    I’d be interested to see your thoughts.

    Sam.

  • Hi David,

    Good explanation. I’ve been using this approach for a while now and have struggled to blog it coherently as my methodology involves way too many moving parts. I did a session at Cardiff SQL Relay and have posted a worked example that I walked everyone through, though I have to follow this up with a bunch of explanation posts that I’ve yet to finalise.

    I’m using PowerShell to read my config from a TSV file, I also produce the SQL DDL from this (again with PowerShell) and load it into a META DB that’s used to provide online docs.

    I think I lean more towards config than you do. I even carry config into the dest DB via extended properties using a great SCD merge solution from Alex Whittles. The meta data I attach to the tables and columns is richer than the set you have here which gives me lots of options to tune/adjust the BIML that emits the packages. My fact package emit BIML builds scrub reload, incremental and version tracked FACT load workflows which pretty much covers all my bases. The Alex Whittles stuff lets me do SCD 0,1,& 2 for my DIMS by changing the extended props in the dest DB.

    I use a number of conventions that are overlaid onto this which is why my solution feels a bit disjointed currently, and possibly explains why I’ve found it so hard to write about it coherently. I’ve come across a few people using similar external meta data driven BIML but as you say not much actual blogging.

    Am so pleased that someone else thinks this is a good way to approach the problem. I often get sideways looks at work from my fellow BI types when I start talking about this.

    Steve

  • […] that we’ve discussed various common metadata strategies used in creating BimlScript, I’m going to explain exactly how the metadata in my sample […]

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>