Simply put, metadata is data about data. Probably the most common way that SQL Server Professionals use metadata is when we query SQL Server System Views. In this article we’ll discuss using metadata to drive BimlScript automation. Before I start listing common metadata sources and methods, I need to briefly contrast two basic metadata philosophies I see being used. In some BimlScript frameworks, the metadata is highly detailed, as this typically allows the most flexible, configurable functionality. However, maintaining that detailed metadata can be tedious and time consuming. I subscribe to the philosophy of Convention over configuration. I prefer to code BimlScript in such a way as to expect that default values and behaviors (conventions) will be used and supplied. This leads to BimlScript that is much cleaner, easier to read, and maintain. At the same time, I try to provide mechanisms to allow as much flexibility as possible. For example, my BimlScript code follows default patterns I’ve identified and only deviates if specific metadata values are found.
There are several BimlScript metadata options/sources that I’ve noticed being used. Below I’ve listed the most common and I’ve included hyperlinks to blog posts or code samples where possible. I’ve noticed that more people are presenting on Biml than are blogging, but I’m not sure why.
SQL Server Metadata Tables
What you’ll often see in demonstrations are BimlScripts being controlled by metadata stored in dedicated SQL Server tables. Unfortunately, I couldn’t find any examples on the interwebs, so I provided the sample above. Information in the ColType field (abbreviated to fit on PowerPoint slides) indicates the functional role of said column. A column designated as BK would be treated as a Business Key, LKP triggers SSIS Lookup Transform code generation, etc. I’ve seen people use separate tables to house the metadata at the column level for Facts and Dimensions. I’ve also seen Table level metadata kept in one table and Column level metadata kept in a separate table. These metadata tables often look like Source to Target Mapping Documents. Creating and using SQL Server tables is second nature to most of us so this method is often used. These tables can be backed up with the rest of the database and also used to generate documentation for business purposes.
SQL Server Database Schema (DDL)
Some metadata strategies rely on Table and Column Names, though I couldn’t find examples on the web. For example, BimlScript can look for table names that begin with “Dim” or “Fact,” and take appropriate action. Also, I’ve seen people specify that any Column Name which ends in “Key” represents a Surrogate Key, or a column name that ends in “_BK” represents a Business (or Alternate) Key. I tend to avoid this practice because I’m a consultant and I work on many data warehouse projects. Therefore, I can’t dictate things like naming conventions. Also, I wouldn’t want business users to see column names like “CustomerCode_BK” on their reports.
SQL Server Extended Properties
Extended Properties have been around since SQL Server 2000 and people have been using them to document their database objects for some time. As a source for BimlScript metadata, Extended Properties have several advantages such as:
- They’re supported on most SQL Server objects.
- Can be automatically generated and maintained by third party software.
- Added and retrieved with relatively simple T-SQL code.
- They’re “hidden” from regular users.
- They’re backed up with the rest of the database.
I personally tend use SQL Server Extended Properties in conjunction with the next one, which is….
Biml Object Model – Annotation Tags
Like SQL Server Extended Properties, most (if not all) Biml objects such as tables, views, columns, etc. can be documented via annotations. There are several types of Annotations, in this article we’re going to focus on the “Tag” type. Below is sample of Biml code with Annotations that drive ETL automation.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Dimensions> <Dimension Name="DimCustomer" SchemaName="DataWarehouse.dbo" PackageSubpath="ImportTables"> <Annotations> <Annotation Tag="TableRole">Dimension</Annotation> <Annotation Tag="LkpNoMatch">Unknown</Annotation> <Annotation Tag="IsScripted">Y</Annotation> </Annotations> <Columns> <Column Name="CustomerKey" ScdType="Key"> <Annotations> <Annotation Tag="ColumnRole">SK</Annotation> <Annotation Tag="IsViewColumn">Y</Annotation> </Annotations> </Column>
Notice that I’ve specified a Dimension Table named DimCustomer which I’ve designated with a TableRole of Dimension. You’ll also notice that the first column, “CustomerKey” has a ColumnRole of “SK” which stands for Surrogate Key. These designations are only meaningful to the BimlScript you’re using. I’ll go into the nuts and bolts of how you refer to these annotation tags with BimlScript early next week.
What Metadata is Required?
As with so many other technical questions, the answer is “It Depends.” Do you lean more towards Configuration or Convention? As I mentioned above, I prefer Convention which should be obvious in the above snippet. Notice how any references to the source(s) seem to be missing? The BimlScript Framework I’ve recently demonstrated, which is available online, operates solely from the metadata in three table level annotations and three column level annotations. From that little bit of metadata, I “automatically” generate staging tables, overlaying views on the data warehouse including Role Playing Dimensions, Slowly Changing Dimension loads, Junk Dimension loads, and Fact table loads. Neat huh?
In the next article, I’ll take on the thorny topic of choosing a method and/or programming languages to generate Biml.