Archives

Agile Data Warehouse Design with Biml Videos

I had such a great time at SQL Saturday Indianapolis. The entire event was top notch. Everyone was friendly and almost everything ran smoothly.

As promised, I’ve uploaded two recordings which demonstrate how you can use a Biml framework to rapidly design and load a data warehouse.

Agile Data Warehouse and ETL Design with Biml

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use BimlScript to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.

Download the Code

As an added bonus, I uploaded another recording on the topic. In the following video, I show you how to add new tables to the data warehouse load using the framework.

Adding Data Warehouse Dimension(s) using the Biml Framework

Hope you enjoy the videos. Feel free to ask questions below.

I'm Giving Two Biml Presentations at SQL Saturday Indianapolis!

This is just a heads up that I’ll be presenting at SQL Saturday Indianapolis on August 17th. I can’t wait since this is my first time speaking there and I’ll be giving newly revised versions of two of my favorite presentations.

Agile Data Warehouse and ETL Design with Biml

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use BimlScript to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.

All of the code used in this presentation can be downloaded by attendees, customized, and used in their environments.

Download Agile Data Warehouse and ETL Design with Biml Presentation Files

Creating Your Own ETL Automation Framework with Biml for Beginners

You may have heard of Business Intelligence Markup Language or even tried to learn it. But how can you take the time to learn it while getting the rest of your work done? In this highly interactive presentation, I’ll guide you through the process of learning and using Biml, without sacrificing your daily productivity.
We’ll cover how to:

  • Identify “easy” ETL patterns to convert to BimlScript
  • Take incremental steps in automating these processes
  • Move to metadata driven BimlScript
  • Tie it all together into a full-fledged framework

As a bonus, you’ll leave the presentation with a completed framework which dynamically generates most types of flat files that you can use in your own environment.

Download Creating Your Own ETL Automation Framework Files

Anyway, it’ll be a great time, and I can’t wait to see some of you there!

Import Biml Metadata from Excel (Advanced)

This is the final post of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. Our mission for this series was to quickly create many Flat File extracts from provided source queries using SQL Server Integration Services.

Make sure you read and understand the concepts in each of the following articles before tackling this one.

One of the primary issues I’ve mentioned in each of the preceding short cut steps is a lack of real automation. Since we didn’t have our metadata externalized, we needed to copy/paste some object, T-SQL file, Biml file, or Excel Spreadsheet in order to create new extracts.

In this framework, I’ve externalized the metadata in an Excel worksheet which looks like this:

| Feed Name | Source Query | File Extension | First Row Column Names | Header Row Delimiter | Column Row Delimiter | Column Delimiter | Text Qualifier | Generate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OrderDetails | SELECT OrderID, ProductID, ProductName, UnitPrice, Quantity, Discount, ExtendedPrice FROM dbo.vwOrderDetails | csv | Y | LF | LF | Comma | " | Y |
| Orders | SELECT * FROM dbo.Orders | txt | N | CRLF | CRLF | Tab | ~ | Y |
| ProductDetails | SELECT CategoryName, ProductID, ProductName, QuantityPerUnit, UnitPrice, UnitsInStock FROM dbo.vwProductDetails WHERE Discontinued = 0 ORDER BY CategoryName, ProductName; | meh | Y | CRLF | CRLF | VerticalBar |  | Y |
| Customers | Select * From dbo.Customers | csv | Y | CRLF | CRLF | Semicolon | " | Y |
| Suppliers | SELECT * FROM dbo.Suppliers | edf | N | CRLF | CRLF | Comma | * | Y |
| EmployeeDetails | SELECT E.EmployeeID, E.LastName, E.FirstName, E.Title, E.HireDate, E.City, E.PostalCode, E.Country, E.Extension, T.TerritoryDescription FROM dbo.Employees E JOIN dbo.EmployeeTerritories ET ON ET.EmployeeID=E.EmployeeID JOIN dbo.Territories T ON T.TerritoryID=ET.TerritoryID; | csv | Y | CRLF | CRLF | Comma | " | Y |
| CustomerOrdersDetail | EXECUTE [dbo].[CustOrdersDetail] @OrderID = 11077 | csv | Y | CRLF | CRLF | Comma | " | Y |

The template is simple enough for most people to fill out. All of the columns to the right of Source Query default to the original file specification we were given: comma-separated values (CSV) format, double-quote text qualified.
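To make the mapping concrete, here is what the framework’s connection template expands to for the OrderDetails row; the conn_ff_ and fff_ prefixes come straight from the Biml later in this post:

```xml
<FlatFileConnection Name="conn_ff_OrderDetails" FilePath="C:\FileExport\OrderDetails.csv" FileFormat="fff_OrderDetails"/>
```

The Feed Name drives all three attributes, and the File Extension column supplies the .csv suffix on the output file path.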

Let’s Look at Code

This code is based on my previous example, so I won’t go into all the gory details. If you have code related questions, review the previous article for the specifics. Let’s start with the entire script, and then we’ll examine each section.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>
<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\"; 
  string mdFileName = "Delimitted Feed Metadata.xlsx";
  string mdWorkSheetName = "Delimited Export List Advanced";

  int r; // Row Number

  var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind" ]; 
  Varigence.Languages.Biml.Table.AstTableNode feedMd;
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;
	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName];  
		r = 2; // First Row for Import
		System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable(); 
		dtFlatFileExtract.Columns.Add("FeedName");
		dtFlatFileExtract.Columns.Add("SourceQuery");
		dtFlatFileExtract.Columns.Add("FileExtension");
		dtFlatFileExtract.Columns.Add("FirstRowColumnNames");
		dtFlatFileExtract.Columns.Add("HeaderRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnDelimiter");
		dtFlatFileExtract.Columns.Add("TextQualifier");
		dtFlatFileExtract.Columns.Add("Generate");	
		
		string feedName = GetCellValue(ws, 1,r);
		//string generate = "Y";
		string generate = GetCellValue(ws, 9,r);

		while (!string.IsNullOrWhiteSpace(feedName)) {
			if (generate.ToUpper() == "Y") {
				dtFlatFileExtract.Rows.Add(
				feedName
				,GetCellValue(ws, 2,r)
				,GetCellValue(ws, 3,r)
				,GetCellValue(ws, 4,r)
				,GetCellValue(ws, 5,r)
				,GetCellValue(ws, 6,r)
				,GetCellValue(ws, 7,r)
				,GetCellValue(ws, 8,r)
				,generate				
				);
			}
		r++;
		feedName = GetCellValue(ws, 1,r);
		generate = GetCellValue(ws, 9,r);
		}			
			if (dtFlatFileExtract.Rows.Count == 0) 
			{
				throw new Exception("No Delimited Exports were found!");
            }
			#>
	<Connections>
	<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
		<FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/>		
		<# } #>
    </Connections>
	<Packages>
			<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files">
            <Tasks>                
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                        	<DirectInput>
								<#=sourceQuery #>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> 
                   </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
		<# } #>		
    </Packages>	
	<FileFormats> 	
		<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString();
			try  /*Validate Source Query*/
			{
			feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 );
			}
			catch (Exception ex1)
			{
			string errorMessage = "Feed Name: " +	dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault();
					throw new Exception(errorMessage);		
			}	
		#>	
		<FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>">
			<Columns>		
		<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape()  : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> 
		<# } #>	
			</Columns>
		</FlatFileFormat>
		<# } #>
	</FileFormats>
<#
}

catch (Exception ex)
	{
		throw new Exception(ex.ToString());
	}
}	
#>
<# 
xl.Quit();
#>
</Biml>

<#+
public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
{
	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
return value == null ? string.Empty : value.ToString().Trim()  ;
}
#>

Once again, we’ll start from the top with our namespaces and assemblies which are required for file system access and Excel functionality.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>

The next four lines specify the workbook location and worksheet name we’ll be importing from.

  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\"; 
  string mdFileName = "Delimitted Feed Metadata.xlsx";
  string mdWorkSheetName = "Delimited Export List Advanced";

In the next section, we’ll declare a row number variable that keeps track of which row we are importing from the worksheet. Since this framework takes the provided queries and extracts the metadata from them, I need to define the source connection. I’ve also declared a Biml table variable that we’ll use to store the FlatFileFormat column metadata.

  int r; // Row Number
 
  var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind" ]; 
  Varigence.Languages.Biml.Table.AstTableNode feedMd;

The next section looks for the file(s) on disk, and then loops through any it finds, in this case just one.

<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;

We then try to open the workbook and worksheet, and assign the first import row as 2.

	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName];  
		r = 2; // First Row for Import

Next I create a DataTable called dtFlatFileExtract to hold the metadata. Note that the column names correspond to those in the worksheet.

		System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable(); 
		dtFlatFileExtract.Columns.Add("FeedName");
		dtFlatFileExtract.Columns.Add("SourceQuery");
		dtFlatFileExtract.Columns.Add("FileExtension");
		dtFlatFileExtract.Columns.Add("FirstRowColumnNames");
		dtFlatFileExtract.Columns.Add("HeaderRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnDelimiter");
		dtFlatFileExtract.Columns.Add("TextQualifier");
		dtFlatFileExtract.Columns.Add("Generate");	

Next we’ll store the Feed Name and Generate values from the 2nd row to variables. The while loop which follows adds the values for the specified row (r) of the spreadsheet to the DataTable defined above. After each loop, the row number is incremented. This causes the process to move down the spreadsheet until the first blank row is found, which stops the import process.

		string feedName = GetCellValue(ws, 1,r);
		//string generate = "Y";
		string generate = GetCellValue(ws, 9,r);

		while (!string.IsNullOrWhiteSpace(feedName)) {
			if (generate.ToUpper() == "Y") {
				dtFlatFileExtract.Rows.Add(
				feedName
				,GetCellValue(ws, 2,r)
				,GetCellValue(ws, 3,r)
				,GetCellValue(ws, 4,r)
				,GetCellValue(ws, 5,r)
				,GetCellValue(ws, 6,r)
				,GetCellValue(ws, 7,r)
				,GetCellValue(ws, 8,r)
				,generate				
				);
			}
		r++;
		feedName = GetCellValue(ws, 1,r);
		generate = GetCellValue(ws, 9,r);
		}	

The next section throws an error if feed metadata is not found.

			if (dtFlatFileExtract.Rows.Count == 0) 
			{
				throw new Exception("No Delimited Exports were found!");
            }

Next we simply loop through Connections and then Packages, which require only the Feed Name and Source Query variables.

	<Connections>
	<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
		<FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/>		
		<# } #>
    </Connections>
	<Packages>
			<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files">
            <Tasks>                
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                        	<DirectInput>
								<#=sourceQuery #>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> 
                   </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
		<# } #>		
    </Packages>	

As I mentioned in previous articles, the challenging part of this framework is creating the FlatFileFormat, particularly the columns. To begin, we loop through each DataRow and import the metadata with a Try/Catch. The Catch warns you when your query doesn’t work.

	<FileFormats> 	
		<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString();
			try  /*Validate Source Query*/
			{
			feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 );
			}
			catch (Exception ex1)
			{
			string errorMessage = "Feed Name: " +	dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault();
					throw new Exception(errorMessage);		
			}	
		#>	

Next we populate the FlatFileFormat and its columns.

		<FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>">
			<Columns>		
		<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape()  : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> 
		<# } #>	
			</Columns>
		</FlatFileFormat>
		<# } #>
	</FileFormats>
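As an illustration, the OrderDetails row (first-row column names Y, LF delimiters, comma columns, double-quote qualifier) would produce a format along these lines. The column names and data types shown are assumptions for the sketch; the real ones come back from GetQuerySchema at build time:

```xml
<FlatFileFormat Name="fff_OrderDetails" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;" HeaderRowDelimiter="LF">
	<Columns>
		<!-- Every column except the last is terminated by the Column Delimiter from the spreadsheet. -->
		<Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/>
		<!-- ...intermediate columns elided; each also uses Delimiter="Comma"... -->
		<!-- The last column carries the Column Row Delimiter, which is what ends each data row. -->
		<Column Name="ExtendedPrice" DataType="Currency" Length="0" Delimiter="LF"/>
	</Columns>
</FlatFileFormat>
```

That last-column swap is exactly what the ternary expression on the Delimiter attribute accomplishes, and the XmlEscape calls are what turn the literal " qualifier into &quot;.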

Finally we have the Catch statement which corresponds to the original Try (opening the workbook). This is followed by the closing Biml tag and the GetCellValue method.

<#
}

catch (Exception ex)
	{
		throw new Exception(ex.ToString());
	}
}	
#>
<# 
xl.Quit();
#>
</Biml>

<#+
public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
{
	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
return value == null ? string.Empty : value.ToString().Trim()  ;
}
#>

Next Steps

In this series we’ve taken several optional shortcuts, each of which offered a way to learn and use Biml while still getting real work done. Now we have a fully functioning framework which dynamically creates SSIS export packages based on metadata contained in a spreadsheet. Hopefully this series has inspired you to learn Biml and create your own frameworks. Be sure to follow this blog, because I’ll share more patterns in the near future.

The files used for this post can be downloaded here.

My Wife and I are Presenting at SQL Saturday Austin on July 13th!

*** Quick Update: I’ve also been asked to present “SSIS Performance Tuning with BimlScript” this Saturday. The abstract is below.

Once again, my lovely wife Melissa and I will be giving Biml presentations.

She and I will be co-presenting the following:

The True Story of Learning Biml

How do you learn Biml when you don’t have years of SQL experience? Is Biml even worth my time and energy when I’m still working on perfecting my SSIS skills? It seems like a lot of work, so what’s in it for me? In this session you will learn:

  • Why Biml is worth your time and effort.
  • How to identify ideal SSIS packages for conversion.
  • How to convert those packages into Biml.
  • How to troubleshoot your Biml build errors when Google fails you.
  • How to automate SSIS package creation with BimlScript.

Download True Story of Biml Presentation Files

After that, I’m presenting completely new versions of the following presentations:

Agile Data Warehouse and ETL Design with Biml

Business Intelligence Markup Language is literally changing the way we think about Data Warehouse design and ETL. In this interactive presentation you’ll see how you can rapidly iterate through the Data Warehouse Development Cycle and provide ROI almost immediately. Learn how you can use BimlScript to automatically create staging tables, SSIS Packages, and other necessary objects during the modeling process.

All of the code used in this presentation can be downloaded by attendees, customized, and used in their environments.

Download Agile Data Warehouse and ETL Design with Biml Presentation Files

SSIS Performance Tuning with BimlScript

You may have heard that Biml can automate ETL loads, but what if your data warehouse is several terabytes with a tight load window? In this demo-focused presentation, we’ll discuss how to use BimlScript to accomplish the following:

  • Dynamically create multiple SSIS packages for a data warehouse ETL load, each following a different best-practice design pattern.
  • Automatically execute each package multiple times and record the results for analysis.
  • Determine which package version should be used in production.

The only way to be sure that your SSIS packages are performing optimally is to test them against multiple patterns. Make your life easier and let BimlScript do the work for you.

Download SSIS Performance Tuning with BimlScript Presentation Files

If you’re in the Austin area, head to the SQL Saturday Austin event page and register. I can’t wait to hang out with all my friends in Austin.

Import Biml Metadata Directly from Excel

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so.

It might surprise some of you that I use Excel extensively for Biml metadata. After all, I’ve ranted about the misuse of Excel in the past. Still, many of my clients prefer using it over other methods.

Why Use Excel?

Excel Spreadsheets as a metadata source have a lot going for them.

  • Everyone uses Excel and is comfortable with it.
  • Excel is incredibly customizable and versatile.
  • Excel offers data validation and filtering.

For these reasons, I create customized Excel spreadsheets that function as lite graphical user interfaces (GUIs) for metadata. Of course, Excel isn’t a perfect metadata source. For one thing, you have to own a licensed copy of Excel. Second, because spreadsheets are so easy to customize, users sometimes “improve” them further and break your code.

While there are other methods for importing Excel spreadsheets with C#, the method I’m going to demonstrate is my favorite. It utilizes the Microsoft Office Interop Assembly for Excel and requires Excel to be installed on your local machine.

Why use the Interop Excel Assembly?

First, the assembly is extremely flexible. The pattern I’m demonstrating in this post is simple and involves importing from a single sheet. However, it is easy to adapt the process for multiple sheets in the same Excel workbook or to import multiple workbooks at once. In a future article, I’ll demonstrate how I create and load a data warehouse where each worksheet represents a dimension or fact table.

Second, it performs very well. I can import the metadata from 20+ sheets, which loads an entire data warehouse, in under a minute.

Show Me the Code

I’ve always found the easiest way to learn something is to review a working example. This example will read a single sheet from a workbook that contains connection metadata on a tab named “Connections.” The metadata is as follows:

| Connection Name | Database Name | Create In Project | Retain Same Connection | Connection String |
| --- | --- | --- | --- | --- |
| connNorthwind | Northwind | true | false | Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind |
| connETLManager | ETLManager | true | false | Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=ETLManager |
| connNorthwindODS | NorthwindODS | true | false | Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=NorthwindODS |

Unlike previous articles, I’m going to share the entire code now and then discuss the individual pieces below.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>
<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\";
  string mdFileName = "Environment.xlsx";

  int r; // Row Number
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;
	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets["Connections"];  
		r = 2; // First Row for Import
			System.Data.DataTable dtEnvironment = new System.Data.DataTable(); 
			dtEnvironment.Columns.Add("ConnectionName");
			dtEnvironment.Columns.Add("DatabaseName");
			dtEnvironment.Columns.Add("CreateInProject");
			dtEnvironment.Columns.Add("RetainSameConnection");
			dtEnvironment.Columns.Add("ConnectionString");

			string ConnectionName = GetCellValue(ws,1,r);
			while (!string.IsNullOrWhiteSpace(ConnectionName)) {
			 dtEnvironment.Rows.Add(
				 GetCellValue(ws,1,r)
				,GetCellValue(ws,2,r)
				,GetCellValue(ws,3,r)
				,GetCellValue(ws,4,r)
				,GetCellValue(ws,5,r)
			 );
			r++;
			ConnectionName = GetCellValue(ws, 1,r);
			}
			if (dtEnvironment.Rows.Count == 0) 
			{
				throw new Exception("No Connection Information found on Connections Worksheet");
            }
			#>
	<Connections>
			<# foreach(DataRow drEnvironment in dtEnvironment.Rows) {	#>
		<Connection Name="<#=drEnvironment["ConnectionName"]#>" ConnectionString="<#=drEnvironment["ConnectionString"]#>" CreateInProject="<#=drEnvironment["CreateInProject"]#> " RetainSameConnection="<#=drEnvironment["RetainSameConnection"]#>" >
			<Annotations>
				<Annotation Tag="DatabaseName"><#=drEnvironment["DatabaseName"]#></Annotation>
            </Annotations>			
		</Connection>	
			<# } #>
	</Connections>
<#		
		}
		catch (Exception ex)
			{
				throw new Exception(ex.ToString());
			}
}
xl.Quit();
#>
</Biml>

<#+
  public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
  {
  	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
  return value == null ? string.Empty : value.ToString().Trim();
  }
#>

Let’s start at the top. This section imports the required namespaces and assemblies. The first two directives are necessary to locate the spreadsheet file on disk, while the last two reference and import the Interop assembly.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL" #>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>

Find Your Assembly

To reference the assembly, you’ll need its fully qualified name from the Global Assembly Cache (GAC). To find it, open the Developer Command Prompt for your version of Visual Studio. The easiest way to locate the DCP is to press the Windows key and begin typing “Developer…” until the name of the tool pops up. After it opens, paste the following statement into the DCP and press Enter to find the Excel assembly information on your local machine:

gacutil /l Microsoft.Office.Interop.Excel

Which should return something like this:

Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL

Copy and paste that output into the assembly name directive shown above.

Now let’s jump to the bottom and examine the GetCellValue method. GetCellValue pulls the value out of the cell designated by column and row, converts it to a string, and trims any leading or trailing spaces. It also returns an empty string if the cell is null.

<#+
  public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
  {
  	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
  return value == null ? string.Empty : value.ToString().Trim();
  }
#>

Next we’ll specify which metadata spreadsheet we are importing. I use Environment.GetFolderPath to get my user profile folder, “C:\Users\david”, and then build the path from there. This abstracts the location so the same code should work for multiple developers working on the same project. The variable “r” keeps track of which row in the spreadsheet we are reading.

<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\";
  string mdFileName = "Environment.xlsx";

  int r; // Row Number
#>

This next section looks for the file and returns the entire file path including the name and extension. I realize that the SearchOption.AllDirectories isn’t really needed since I’m only searching for a single file, but I use this same pattern in other situations which import all of the metadata spreadsheets in a particular folder. I’ve also included a check that the file exists at that location.

<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}

Next I loop through all of the files found in the previous step, in this case just one, and create a Workbook object.

foreach (string fil in filePaths)
{
	Workbook wb = null;

Next we try to open the workbook and assign a Worksheet object to our sheet called “Connections.” Also, we set the first row to import to 2, as we want to skip the column headers.

try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets["Connections"];  
		r = 2; // First Row for Import

Next we’ll create a new DataTable called dtEnvironment to store the imported metadata. Notice that the column names correspond to those in the spreadsheet.

			System.Data.DataTable dtEnvironment = new System.Data.DataTable(); 
			dtEnvironment.Columns.Add("ConnectionName");
			dtEnvironment.Columns.Add("DatabaseName");
			dtEnvironment.Columns.Add("CreateInProject");
			dtEnvironment.Columns.Add("RetainSameConnection");
			dtEnvironment.Columns.Add("ConnectionString");

Next we’ll store the ConnectionName value from the 1st column 2nd row to a variable of the same name. The while loop which follows adds the values for the specified row (r) of the spreadsheet to the DataTable defined above. After each loop, the row number is incremented. This causes the process to move down the spreadsheet until the first blank row is found, which stops the import process.

			string ConnectionName = GetCellValue(ws,1,r);
			while (!string.IsNullOrWhiteSpace(ConnectionName)) {
			 dtEnvironment.Rows.Add(
				 GetCellValue(ws,1,r)
				,GetCellValue(ws,2,r)
				,GetCellValue(ws,3,r)
				,GetCellValue(ws,4,r)
				,GetCellValue(ws,5,r)
			 );
			r++;
			ConnectionName = GetCellValue(ws, 1,r);
			}

The next section checks to make sure that at least one connection was imported.

			if (dtEnvironment.Rows.Count == 0) 
			{
				throw new Exception("No Connection Information found on Connections Worksheet");
            }
			#>

Now that we’ve completely loaded the DataTable, we loop through it to create our Connections. I’ve taken an additional step of adding the DatabaseName as an annotation for each connection. This simplifies BimlScript writing in future steps.

	<Connections>
			<# foreach(DataRow drEnvironment in dtEnvironment.Rows) {	#>
		<Connection Name="<#=drEnvironment["ConnectionName"]#>" ConnectionString="<#=drEnvironment["ConnectionString"]#>" CreateInProject="<#=drEnvironment["CreateInProject"]#> " RetainSameConnection="<#=drEnvironment["RetainSameConnection"]#>" >
			<Annotations>
				<Annotation Tag="DatabaseName"><#=drEnvironment["DatabaseName"]#></Annotation>
            </Annotations>			
		</Connection>	
			<# } #>
	</Connections>

Finally, we have a standard catch block and the closing Biml tag.

<#		
		}
		catch (Exception ex)
			{
				throw new Exception(ex.ToString());
			}
}
xl.Quit();
#>
</Biml>

Simple, huh? This code produces the following Biml.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<Connection Name="connNorthwind" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind" CreateInProject="true" RetainSameConnection="false">
			<Annotations>
				<Annotation Tag="DatabaseName">Northwind</Annotation>
			</Annotations>
		</Connection>
		<Connection Name="connETLManager" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=ETLManager" CreateInProject="true" RetainSameConnection="false">
			<Annotations>
				<Annotation Tag="DatabaseName">ETLManager</Annotation>
			</Annotations>
		</Connection>
		<Connection Name="connNorthwindODS" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=NorthwindODS" CreateInProject="true" RetainSameConnection="false">
			<Annotations>
				<Annotation Tag="DatabaseName">NorthwindODS</Annotation>
			</Annotations>
		</Connection>
	</Connections>
</Biml>

Summary

Notice that this process is driven by external metadata. When someone adds another connection to the spreadsheet, it will be added to the Biml project automatically on the next build. In the next article, we’ll adapt this method to read SQL queries from a spreadsheet and automagically create an export package for each.

The files used for this post can be downloaded here.

BimlScript Code Nuggets and Mad Libs

This post is part of a series, Short Cuts to Biml Mastery, which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so.

Mad Libs?

Per Wikipedia, “Mad Libs is a phrasal template word game where one player prompts others for a list of words to substitute for blanks in a story, before reading the – often comical or nonsensical – story aloud. The game is frequently played as a party game or as a pastime.” I remember playing these games as a kid. Big Bang Theory even incorporated Mad Libs into one of their episodes.

Since each word in the list is saved and then inserted into another string, it makes a good programming exercise. In fact, I remember doing this exercise in middle school, back when acid washed jeans were in style. This was in the early days, when most people hadn’t even used computers, much less owned one, though I was lucky enough to have one at home. The school was fortunate enough to have a room of about 20 Apple IIe computers. Our math teacher was pretty progressive and let us spend an hour writing our own Mad Lib program in BASIC. The program would prompt the user with questions, save the input in variables, and then insert them into a bland, innocent paragraph of text. Well, being a class clown, I amused my classmates by inserting inappropriate terms into my program during testing. It got some cheap laughs, and then I got detention. Oh well.
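
For the curious, that childhood program fits in a few lines today. A minimal Python sketch (the story template and words are invented for illustration):

```python
# Minimal Mad Libs: save the player's words, then substitute them into a story.
# The story template below is invented for illustration.
story = "The {adjective} {noun} jumped over the {plural_noun}."

def mad_lib(template, **words):
    # str.format fills each {placeholder} with the matching supplied word
    return template.format(**words)

print(mad_lib(story, adjective="purple", noun="database",
              plural_noun="spreadsheets"))
# The purple database jumped over the spreadsheets.
```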

BimlScript Mad Libs?

Well, if you’ve been following along with this series, you know that we are creating a basic BimlScript framework that exports delimited flat files. In two previous articles, I demonstrated how to create this Biml using Excel and T-SQL. To refresh your memory, this is the Biml that we need to generate:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders"/>
    </Connections>
    <Packages>
        <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
                            <DirectInput>
SELECT * FROM dbo.Orders
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST Orders"  ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
    <FileFormats>
        <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

The above Biml code is analogous to a Mad Lib “story.” First, create a Biml file from the code above. Second, determine which “questions” must be asked to automate our “story.” The first value we’ll need is the feed name, in this case “Orders”. The second is the query used for that feed. Add these variables to the top of your Biml.

<#
  string feedName = "Orders"; 
  string qry = "SELECT * FROM dbo.Orders";
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders"/>
    </Connections>
    <Packages>
        <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">

Now you are going to perform two Find and Replace (CTRL-H) operations on the Biml code above, for lines 3 through 22. Since the Flat File Format Columns section is more complicated, we’ll handle that in a bit. Notice that we are replacing the hard-coded values with BimlScript text nuggets. The <#= #> nugget simply emits the variable’s value as a string.

Find                        Replace
Orders                      <#=feedName#>
SELECT * FROM dbo.Orders    <#=qry#>

Here are the results of our Find/Replace. Again, be careful not to replace any of the column metadata.

<#
  string feedName = "Orders"; 
  string qry = "SELECT * FROM dbo.Orders";
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.csv" FileFormat="fff_<#=feedName#>"/>
    </Connections>
    <Packages>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                            <DirectInput>
<#=qry#>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
    <FileFormats>
        <FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>
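
Incidentally, the same two substitutions could be scripted instead of done by hand. A minimal Python sketch (the input is one trimmed line of the Biml, for illustration only); note that the longer query string must be replaced before the bare "Orders" token, or the "Orders" inside the query would be clobbered first:

```python
# One trimmed line of the Biml, for illustration only.
biml = ('<OleDbSource Name="OLE_SRC Orders">'
        '<DirectInput>SELECT * FROM dbo.Orders</DirectInput></OleDbSource>')

# Replace the longer string first so the query survives intact.
biml = biml.replace("SELECT * FROM dbo.Orders", "<#=qry#>")
biml = biml.replace("Orders", "<#=feedName#>")
print(biml)
# <OleDbSource Name="OLE_SRC <#=feedName#>"><DirectInput><#=qry#></DirectInput></OleDbSource>
```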

So, with a simple Replace exercise, we have parameterized around 75% of our script. Now let’s handle the Flat File Format Columns.

GetQuerySchema

In the previous article, I documented how to Import Biml Metadata with GetQuerySchema. To simplify things, let’s work on the column section separately.

<#
  string feedName = "Orders"; 
  string qry = "SELECT * FROM dbo.Orders";
  
  var srcConn = SchemaManager.CreateConnectionNode("connNorthwind", @"Provider=SQLNCLI11;data source=localhost;integrated security=SSPI;initial catalog=Northwind;");
  var feedMd = ExternalDataAccess.GetQuerySchema(srcConn, qry,null,0);
#>
<#=feedMd.GetBiml()#>

Now let’s compare the two code sections. First, the Biml generated from the previous statement.

<Table Name="" SchemaName="">
	<Columns>
		<Column Name="OrderID" Precision="0" Scale="0" />
		<Column Name="CustomerID" DataType="StringFixedLength" Length="5" Precision="0" Scale="0" />
		<Column Name="EmployeeID" Precision="0" Scale="0" />
		<Column Name="OrderDate" DataType="DateTime" Precision="0" Scale="0" />
		<Column Name="RequiredDate" DataType="DateTime" Precision="0" Scale="0" />
		<Column Name="ShippedDate" DataType="DateTime" Precision="0" Scale="0" />
		<Column Name="ShipVia" Precision="0" Scale="0" />
		<Column Name="Freight" DataType="Currency" Precision="0" Scale="0" />
		<Column Name="ShipName" DataType="String" Length="40" Precision="0" Scale="0" />
		<Column Name="ShipAddress" DataType="String" Length="60" Precision="0" Scale="0" />
		<Column Name="ShipCity" DataType="String" Length="15" Precision="0" Scale="0" />
		<Column Name="ShipRegion" DataType="String" Length="15" Precision="0" Scale="0" />
		<Column Name="ShipPostalCode" DataType="String" Length="10" Precision="0" Scale="0" />
		<Column Name="ShipCountry" DataType="String" Length="15" Precision="0" Scale="0" />
	</Columns>
</Table>        

Next, our hard-coded sample from above.

                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 

Notice how similar they are? GetQuerySchema returns the Column Name, Data Type, and Length; we’re only responsible for the Delimiter attribute. We’ll have to loop through the columns of the Flat File Format, so let’s add a simple foreach loop around a row of the hard-coded column metadata that we need to generate.

<# foreach (var column in feedMd.Columns) {#>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/>
			<# } #>	

Next, go through the Mad Libs Find/Replace process again with the following values:

Find       Replace
OrderID    <#=column.Name#>
Int32      <#=column.DataType#>
"0"        "<#=column.Length#>"

Now we only have to handle the Delimiter. Remember that the value is Comma for every column except the last, which is CRLF. There are a couple of ways to handle this in C#.

string delimiter;
if (column == feedMd.Columns.Last()) { delimiter = "CRLF"; } else { delimiter = "Comma"; }
// Or the shorthand conditional (ternary) operator
string delimiter = (column == feedMd.Columns.Last()) ? "CRLF" : "Comma";

To make our BimlScript easier to read, we’ll use the shorthand.

<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name#>" DataType="<#=column.DataType#>" Length="<#=column.Length#>" Delimiter="<#=(column == feedMd.Columns.Last()) ? "CRLF"  : "Comma"#>"/> 
			<# } #>	

And this produces the output we need.

<Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/>
<Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/>
<Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/>
<Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/>
<Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/>
<Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/>
<Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/>
<Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/>
<Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/>
<Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/>
<Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/>
<Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/>
<Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/>
<Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 

So, our final BimlScript looks like this:

<#
  string feedName = "Orders"; 
  string qry = "SELECT * FROM dbo.Orders";
  
  var srcConn = SchemaManager.CreateConnectionNode("connNorthwind", @"Provider=SQLNCLI11;data source=localhost;integrated security=SSPI;initial catalog=Northwind;");
  var feedMd = ExternalDataAccess.GetQuerySchema(srcConn, qry,null,0);
#>


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.csv" FileFormat="fff_<#=feedName#>"/>
    </Connections>
    <Packages>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                            <DirectInput>
<#=qry#>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>	
    <FileFormats>
        <FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name#>" DataType="<#=column.DataType#>" Length="<#=column.Length#>" Delimiter="<#=(column == feedMd.Columns.Last()) ? "CRLF"  : "Comma"#>"/> 
			<# } #>	
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

So, now we’re using real BimlScript, but this still isn’t automated. To generate new feeds, we’d need to copy and paste the Biml file and change the variable values at the top. To fully automate this process, we’d need an external metadata source that contains a list of Feed Names and Source Queries. In the next article, I’ll demonstrate how to import Biml metadata directly from an Excel spreadsheet, loop through each feed’s metadata, and generate a separate package for each.

The Biml files can be downloaded here.

Generating Biml Scripts Using T-SQL

This post is part of a series, Short Cuts to Biml Mastery, which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. In the previous article, I showed you how to use Excel to generate Biml. In this article, we’ll generate the entire Biml script for an export package using only T-SQL. To refresh your memory, this is the Biml from a previous article that we need to automate:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders"/>
    </Connections>
    <Packages>
        <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
                            <DirectInput>
SELECT * FROM dbo.Orders
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST Orders"  ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
    <FileFormats>
        <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

We’re starting with the T-SQL script used last time. However, I’m adding a temp table to make code manipulation easier in later steps, along with a variable for Feed Name, which we’ll need as well. I’ve highlighted the changes below.

USE Northwind
GO

DECLARE @SrcQuery NVARCHAR(4000), @FeedName NVARCHAR(25);
SELECT @SrcQuery = 'SELECT * FROM dbo.Orders', @FeedName = 'Orders';

IF object_id('tempdb..#FileFormatMetadata') IS NOT NULL DROP TABLE #FileFormatMetadata;
SELECT 
 t.[Column Name]
,t.[SQL DT]
,CASE 
WHEN t.[SQL DT] in ('char', 'varchar','text') THEN (t.max_length)
WHEN t.[SQL DT] in ('nchar', 'nvarchar','ntext') THEN (t.max_length/2)
ELSE 0 END [Length]
 INTO #FileFormatMetadata
FROM (SELECT name [Column Name]
 ,CASE WHEN charindex('(',system_type_name) = 0 THEN system_type_name
  ELSE left(system_type_name, charindex('(',system_type_name) - 1)
  END [SQL DT]
 ,max_length
FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) where [name] is not NULL  and is_hidden = 0) t

SELECT * FROM #FileFormatMetadata;

If you’ll recall from our Excel adventure, there are two more primary attributes we need to derive: Delimiter and Biml Data Type. To determine the Delimiter value, we need to know which column is last in the feed. That’s not difficult, because “column_ordinal” is included in the sys.dm_exec_describe_first_result_set output. The Biml Data Type was derived via a lookup table in the spreadsheet; in this example, we’ll use a CASE expression instead. The new query and output are below.

IF object_id('tempdb..#FileFormatMetadata') IS NOT NULL DROP TABLE #FileFormatMetadata;
SELECT 
 t.[Column Name]
,t.[SQL DT]
,CASE 
WHEN t.[SQL DT] in ('char', 'varchar','text') THEN (t.max_length)
WHEN t.[SQL DT] in ('nchar', 'nvarchar','ntext') THEN (t.max_length/2)
ELSE 0 END [Length]
,CASE 
WHEN t.[SQL DT] = 'smallint' THEN 'Int16'
WHEN t.[SQL DT] = 'tinyint' THEN 'Byte'
WHEN t.[SQL DT] = 'int' THEN 'Int32'
WHEN t.[SQL DT] = 'bigint' THEN 'Int64'
WHEN t.[SQL DT] = 'char' THEN 'AnsiStringFixedLength'
WHEN t.[SQL DT] = 'varchar' THEN 'AnsiString'
WHEN t.[SQL DT] = 'nchar' THEN 'StringFixedLength'
WHEN t.[SQL DT] = 'nvarchar' THEN 'String'
WHEN t.[SQL DT] = 'date' THEN 'Date'
WHEN t.[SQL DT] = 'datetime' THEN 'DateTime'
WHEN t.[SQL DT] = 'datetime2' THEN 'DateTime2'
WHEN t.[SQL DT] = 'datetimeoffset' THEN 'DateTimeOffset'
WHEN t.[SQL DT] = 'smalldatetime' THEN 'DateTime'
WHEN t.[SQL DT] = 'time' THEN 'Time'
WHEN t.[SQL DT] = 'decimal' THEN 'Decimal'
WHEN t.[SQL DT] = 'numeric' THEN 'Decimal'
WHEN t.[SQL DT] = 'bit' THEN 'Boolean'
WHEN t.[SQL DT] = 'float' THEN 'Double'
WHEN t.[SQL DT] = 'real' THEN 'Single'
WHEN t.[SQL DT] = 'money' THEN 'Currency'
WHEN t.[SQL DT] = 'smallmoney' THEN 'Currency'
WHEN t.[SQL DT] = 'text' THEN 'AnsiString'
WHEN t.[SQL DT] = 'ntext' THEN 'String'
ELSE NULL END [Biml Data Type]
,t.ColumnOrdinal
 INTO #FileFormatMetadata
FROM (SELECT name [Column Name]
 ,CASE WHEN charindex('(',system_type_name) = 0 THEN system_type_name
  ELSE left(system_type_name, charindex('(',system_type_name) - 1)
  END [SQL DT]
 ,max_length
 ,column_ordinal ColumnOrdinal
FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) where [name] is not NULL  and is_hidden = 0) t

SELECT * FROM #FileFormatMetadata;

Column Name     SQL DT    Length  Biml Data Type     ColumnOrdinal
OrderID         int       0       Int32              1
CustomerID      nchar     5       StringFixedLength  2
EmployeeID      int       0       Int32              3
OrderDate       datetime  0       DateTime           4
RequiredDate    datetime  0       DateTime           5
ShippedDate     datetime  0       DateTime           6
ShipVia         int       0       Int32              7
Freight         money     0       Currency           8
ShipName        nvarchar  40      String             9
ShipAddress     nvarchar  60      String             10
ShipCity        nvarchar  15      String             11
ShipRegion      nvarchar  15      String             12
ShipPostalCode  nvarchar  10      String             13
ShipCountry     nvarchar  15      String             14
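
The CASE expression above is really just a lookup table; in a general-purpose language it could be sketched as a dictionary (only a subset of the mappings is shown here):

```python
# SQL Server type name -> Biml DataType, mirroring the CASE expression above
# (only a subset of the mappings is shown for illustration).
SQL_TO_BIML = {
    "smallint": "Int16", "tinyint": "Byte", "int": "Int32", "bigint": "Int64",
    "nchar": "StringFixedLength", "nvarchar": "String",
    "datetime": "DateTime", "money": "Currency",
}

def biml_data_type(sql_type):
    # Unmapped types return None, matching the ELSE NULL branch
    return SQL_TO_BIML.get(sql_type)

print(biml_data_type("nvarchar"))   # String
print(biml_data_type("geography"))  # None
```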

Remember that Biml is simply XML, and the FOR XML clause is a clean way to generate it. So let’s start with the Connections node, which is simple because the only variable needed is @FeedName. Our query returns a single record, which becomes the FlatFileConnection element, whose parent (root) is Connections.

SELECT 'conn_ff_' + @FeedName "@Name"
	,'C:\FileExport\' + @FeedName + '.csv' "@FilePath"
	,'fff_' + @FeedName "@FileFormat"
    FOR XML PATH('FlatFileConnection'), type, root('Connections')
<Connections>
  <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders" />
</Connections>
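
The attribute mapping works the same way in any XML library: each “@Name”-style alias becomes an attribute, and the root wraps it. For comparison, a minimal Python sketch of the same Connections fragment using ElementTree:

```python
import xml.etree.ElementTree as ET

feed_name = "Orders"  # plays the role of @FeedName

# Each "@Name"-style column in FOR XML PATH becomes an attribute here.
connections = ET.Element("Connections")
ET.SubElement(connections, "FlatFileConnection", {
    "Name": f"conn_ff_{feed_name}",
    "FilePath": rf"C:\FileExport\{feed_name}.csv",
    "FileFormat": f"fff_{feed_name}",
})
xml_str = ET.tostring(connections, encoding="unicode")
print(xml_str)
```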

The Packages code is slightly more complex than Connections, as it involves several levels in the XML tree. This first query simply returns the attributes at the Package level:

SELECT 'Export ' + @FeedName "@Name"
	  ,'Linear' "@ConstraintMode"
	  ,'DontSaveSensitive' "@ProtectionLevel"
	FOR XML PATH('Package'), TYPE,ROOT('Packages')
<Packages>
  <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" />
</Packages>

Inside the package element, we have to generate the Tasks and Transformations. The root of this query is Tasks. Notice that the aliases on the element names reflect their tree levels, because Tasks contain one or more Transformations. The base query for the Tasks is:

SELECT	'DFT Load File' "@Name"
		,'OLE_SRC ' + @FeedName "Transformations/OleDbSource/@Name"
		,'connNorthwind' "Transformations/OleDbSource/@ConnectionName"
		,@SrcQuery "Transformations/OleDbSource/DirectInput"
		,'FF_DEST ' + @FeedName "Transformations/FlatFileDestination/@Name"
		,'conn_ff_' + @FeedName "Transformations/FlatFileDestination/@ConnectionName"
		,'true'  "Transformations/FlatFileDestination/@Overwrite"
		,'false'  "Transformations/FlatFileDestination/@ValidateExternalMetadata"
		FOR XML PATH('Dataflow'),TYPE, root('Tasks')
<Tasks>
  <Dataflow Name="DFT Load File">
    <Transformations>
      <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
        <DirectInput>SELECT * FROM dbo.Orders</DirectInput>
      </OleDbSource>
      <FlatFileDestination Name="FF_DEST Orders" ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false" />
    </Transformations>
  </Dataflow>
</Tasks>

I could have avoided the lengthy element aliases by creating more sub-queries, but those become harder to read as well. Since Tasks is a sub-element of each Package, its query becomes a sub-query of Packages like this:

SELECT 'Export ' + @FeedName "@Name"
	  ,'Linear' "@ConstraintMode"
	  ,'DontSaveSensitive' "@ProtectionLevel"
		,(SELECT	
		'DFT Load File' "@Name"
		,'OLE_SRC ' + @FeedName "Transformations/OleDbSource/@Name"
		,'connNorthwind' "Transformations/OleDbSource/@ConnectionName"
		,@SrcQuery "Transformations/OleDbSource/DirectInput"
		,'FF_DEST ' + @FeedName "Transformations/FlatFileDestination/@Name"
		,'conn_ff_' + @FeedName "Transformations/FlatFileDestination/@ConnectionName"
		,'true'  "Transformations/FlatFileDestination/@Overwrite"
		,'false'  "Transformations/FlatFileDestination/@ValidateExternalMetadata"
		FOR XML PATH('Dataflow'),TYPE, root('Tasks'))
	FOR XML PATH('Package'), TYPE,ROOT('Packages')
<Packages>
  <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
    <Tasks>
      <Dataflow Name="DFT Load File">
        <Transformations>
          <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
            <DirectInput>SELECT * FROM dbo.Orders</DirectInput>
          </OleDbSource>
          <FlatFileDestination Name="FF_DEST Orders" ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false" />
        </Transformations>
      </Dataflow>
    </Tasks>
  </Package>
</Packages>

Now let’s tackle the most difficult section of our code: the File Format and its columns. This time, let’s start with the lowest level, columns. Notice how I derive the Delimiter from the ColumnOrdinal column:

SELECT [Column Name] "@Name"
		,[Biml Data Type] "@DataType"
		,Length "@Length"
		,CASE WHEN ColumnOrdinal = (SELECT MAX(ColumnOrdinal) FROM #FileFormatMetadata) THEN 'CRLF'
		ELSE 'Comma' END "@Delimiter" 
		from #FileFormatMetadata
		FOR XML PATH('Column'), type, root('Columns')
<Columns>
  <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma" />
  <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma" />
  <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma" />
  <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma" />
  <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma" />
  <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma" />
  <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma" />
  <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma" />
  <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma" />
  <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma" />
  <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma" />
  <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma" />
  <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma" />
  <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF" />
</Columns>

The File Format is easy. However, I couldn’t find a way to generate XML with T-SQL and keep the TextQualifier &quot; value, so I substituted the hex value for the double quote instead.

SELECT 'fff_' + @FeedName "@Name"
	, 'Delimited' "@FlatFileType"
	,'true' "@ColumnNamesInFirstDataRow"
	,'_x0022_' "@TextQualifier"
	FOR XML PATH('FlatFileFormat'), type, root('FileFormats')
<FileFormats>
  <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="_x0022_" />
</FileFormats>
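
The `_xHHHH_` pattern is SQL Server’s convention for escaping a character by its Unicode code point in hex. As a quick sanity check that 0x0022 really is the double quote:

```python
# 0x0022 is the Unicode code point for the double-quote character,
# which is what the _x0022_ token above encodes.
print(hex(ord('"')))  # 0x22
print(chr(0x0022))    # "
```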

And of course Columns needs to be nested inside of FlatFileFormat, so we use a subquery:

SELECT 'fff_' + @FeedName "@Name"
	, 'Delimited' "@FlatFileType"
	,'true' "@ColumnNamesInFirstDataRow"
	,'_x0022_' "@TextQualifier"
		,(SELECT [Column Name] "@Name"
		,[Biml Data Type] "@DataType"
		,Length "@Length"
		,CASE WHEN ColumnOrdinal = (SELECT MAX(ColumnOrdinal) FROM #FileFormatMetadata) THEN 'CRLF'
		ELSE 'Comma' END "@Delimiter" 
		from #FileFormatMetadata
		FOR XML PATH('Column'), type, root('Columns'))
	FOR XML PATH('FlatFileFormat'), type, root('FileFormats')
<FileFormats>
  <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="_x0022_">
    <Columns>
      <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma" />
      <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma" />
      <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma" />
      <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma" />
      <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma" />
      <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma" />
      <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma" />
      <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma" />
      <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma" />
      <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma" />
      <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma" />
      <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma" />
      <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma" />
      <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF" />
    </Columns>
  </FlatFileFormat>
</FileFormats>

So now we have three separate pieces of XML code: Connections, Packages, and FileFormats. To combine them, we make each section a sub-query of our root element, Biml.

   SELECT
   (SELECT 'conn_ff_' + @FeedName "@Name"
	,'C:\FileExport\' + @FeedName + '.csv' "@FilePath"
	,'fff_' + @FeedName "@FileFormat"
    FOR XML PATH('FlatFileConnection'), type, root('Connections'))
	,(SELECT 'Export ' + @FeedName "@Name"
	, 'Linear' "@ConstraintMode"
	,'DontSaveSensitive' "@ProtectionLevel"
		,(SELECT	
		'DFT Load File' "@Name"
		,'OLE_SRC ' + @FeedName "Transformations/OleDbSource/@Name"
		,'connNorthwind' "Transformations/OleDbSource/@ConnectionName"
		,@SrcQuery "Transformations/OleDbSource/DirectInput"
		,'FF_DEST ' + @FeedName "Transformations/FlatFileDestination/@Name"
		,'conn_ff_' + @FeedName "Transformations/FlatFileDestination/@ConnectionName"
		,'true'  "Transformations/FlatFileDestination/@Overwrite"
		,'false'  "Transformations/FlatFileDestination/@ValidateExternalMetadata"
		FOR XML PATH('Dataflow'),TYPE, root('Tasks'))
	FOR XML PATH('Package'), TYPE,ROOT('Packages'))
	,(SELECT 'fff_' + @FeedName "@Name"
	, 'Delimited' "@FlatFileType"
	,'true' "@ColumnNamesInFirstDataRow"
	,'_x0022_' "@TextQualifier"
		,(SELECT [Column Name] "@Name"
		,[Biml Data Type] "@DataType"
		,Length "@Length"
		,CASE WHEN ColumnOrdinal = (SELECT MAX(ColumnOrdinal) FROM #FileFormatMetadata) THEN 'CRLF'
		ELSE 'Comma' END "@Delimiter" 
		from #FileFormatMetadata
		FOR XML PATH('Column'), type, root('Columns'))
	FOR XML PATH('FlatFileFormat'), type, root('FileFormats'))
FOR XML PATH ('Biml')

And this creates what we need, except for one little thing: notice that the opening Biml element is missing the Biml namespace.

<Biml>
  <Connections>
    <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders" />
  </Connections>

T-SQL natively supports namespaces through the WITH XMLNAMESPACES clause, but there is a problem: the Biml namespace is repeated for each sub-query, which is technically correct but really ugly.
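For reference, here is a minimal sketch of the WITH XMLNAMESPACES approach, pared down to just the Connections sub-query (it assumes @FeedName is declared as in the earlier scripts):

```sql
-- Pared-down sketch, not the full export query. The Biml namespace is
-- declared as the default; SQL Server then re-declares it on the root
-- element of every nested FOR XML sub-query.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.varigence.com/biml.xsd')
SELECT
    (SELECT 'conn_ff_' + @FeedName "@Name"
     FOR XML PATH('FlatFileConnection'), TYPE, ROOT('Connections'))
FOR XML PATH('Biml');
```

The fragment below shows the duplicated namespace declarations this produces.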

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections xmlns="http://schemas.varigence.com/biml.xsd">
    <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders" />
  </Connections>
  <Packages xmlns="http://schemas.varigence.com/biml.xsd">
    <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">

According to Microsoft, the code is functioning as designed and they have no plans to change it. I’ve explored several ways to “hack” around this, and the simplest is to take our query result, cast it as NVARCHAR, perform a replace, and then cast it back into an XML data type. So, our final query looks like this:

USE Northwind
GO

DECLARE @SrcQuery NVARCHAR(4000)
,@FeedName NVARCHAR(25)
;
SELECT @SrcQuery = 'SELECT * FROM dbo.Orders', @FeedName = 'Orders';
--SET @SrcQuery = 'SELECT * FROM dbo.Customers';


--EXECUTE sp_executesql @SrcQuery;
IF object_id('tempdb..#FileFormatMetadata') IS NOT NULL DROP TABLE #FileFormatMetadata;

SELECT 
 t.[Column Name]
,t.[SQL DT]
,CASE 
WHEN t.[SQL DT] in ('char', 'varchar','text') THEN (t.max_length)
WHEN t.[SQL DT] in ('nchar', 'nvarchar','ntext') THEN (t.max_length/2)
ELSE 0 END [Length]
,CASE 
WHEN t.[SQL DT] = 'smallint' THEN 'Int16'
WHEN t.[SQL DT] = 'tinyint' THEN 'Byte'
WHEN t.[SQL DT] = 'int' THEN 'Int32'
WHEN t.[SQL DT] = 'bigint' THEN 'Int64'
WHEN t.[SQL DT] = 'char' THEN 'AnsiStringFixedLength'
WHEN t.[SQL DT] = 'varchar' THEN 'AnsiString'
WHEN t.[SQL DT] = 'nchar' THEN 'StringFixedLength'
WHEN t.[SQL DT] = 'nvarchar' THEN 'String'
WHEN t.[SQL DT] = 'date' THEN 'Date'
WHEN t.[SQL DT] = 'datetime' THEN 'DateTime'
WHEN t.[SQL DT] = 'datetime2' THEN 'DateTime2'
WHEN t.[SQL DT] = 'datetimeoffset' THEN 'DateTimeOffset'
WHEN t.[SQL DT] = 'smalldatetime' THEN 'DateTime'
WHEN t.[SQL DT] = 'time' THEN 'Time'
WHEN t.[SQL DT] = 'decimal' THEN 'Decimal'
WHEN t.[SQL DT] = 'numeric' THEN 'Decimal'
WHEN t.[SQL DT] = 'bit' THEN 'Boolean'
WHEN t.[SQL DT] = 'float' THEN 'Double'
WHEN t.[SQL DT] = 'real' THEN 'Single'
WHEN t.[SQL DT] = 'money' THEN 'Currency'
WHEN t.[SQL DT] = 'smallmoney' THEN 'Currency'
WHEN t.[SQL DT] = 'text' THEN 'AnsiString'
WHEN t.[SQL DT] = 'ntext' THEN 'String'
ELSE NULL END [Biml Data Type]
,t.ColumnOrdinal
,@FeedName FeedName
INTO #FileFormatMetadata 
FROM (SELECT name [Column Name]
 ,CASE WHEN charindex('(',system_type_name) = 0 THEN system_type_name
  ELSE left(system_type_name, charindex('(',system_type_name) - 1)
  END [SQL DT]
 ,max_length
 ,column_ordinal ColumnOrdinal
   FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) where [name] is not NULL  and is_hidden = 0) t

SELECT CAST(REPLACE(CAST( (
   SELECT
   (SELECT 'conn_ff_' + @FeedName "@Name"
	,'C:\FileExport\' + @FeedName + '.csv' "@FilePath"
	,'fff_' + @FeedName "@FileFormat"
    FOR XML PATH('FlatFileConnection'), type, root('Connections'))
	,(SELECT 'Export ' + @FeedName "@Name"
	, 'Linear' "@ConstraintMode"
	,'DontSaveSensitive' "@ProtectionLevel"
		,(SELECT	
		'DFT Load File' "@Name"
		,'OLE_SRC ' + @FeedName "Transformations/OleDbSource/@Name"
		,'connNorthwind' "Transformations/OleDbSource/@ConnectionName"
		,@SrcQuery "Transformations/OleDbSource/DirectInput"
		,'FF_DEST ' + @FeedName "Transformations/FlatFileDestination/@Name"
		,'conn_ff_' + @FeedName "Transformations/FlatFileDestination/@ConnectionName"
		,'true'  "Transformations/FlatFileDestination/@Overwrite"
		,'false'  "Transformations/FlatFileDestination/@ValidateExternalMetadata"
		FOR XML PATH('Dataflow'),TYPE, root('Tasks'))
	FOR XML PATH('Package'), TYPE,ROOT('Packages'))
	,(SELECT 'fff_' + @FeedName "@Name"
	, 'Delimited' "@FlatFileType"
	,'true' "@ColumnNamesInFirstDataRow"
	,'_x0022_' "@TextQualifier"
		,(SELECT [Column Name] "@Name"
		,[Biml Data Type] "@DataType"
		,Length "@Length"
		,CASE WHEN ColumnOrdinal = (SELECT MAX(ColumnOrdinal) FROM #FileFormatMetadata) THEN 'CRLF'
		ELSE 'Comma' END "@Delimiter" 
		from #FileFormatMetadata
		FOR XML PATH('Column'), type, root('Columns'))
	FOR XML PATH('FlatFileFormat'), type, root('FileFormats'))
FOR XML PATH ('Biml')
) AS NVARCHAR(MAX)), '<Biml>', '<Biml xmlns="http://schemas.varigence.com/biml.xsd">') AS XML)

And this finally produces the Biml we want.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders" />
  </Connections>
  <Packages>
    <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive">
      <Tasks>
        <Dataflow Name="DFT Load File">
          <Transformations>
            <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
              <DirectInput>SELECT * FROM dbo.Orders</DirectInput>
            </OleDbSource>
            <FlatFileDestination Name="FF_DEST Orders" ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false" />
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
  <FileFormats>
    <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="_x0022_">
      <Columns>
        <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma" />
        <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma" />
        <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma" />
        <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma" />
        <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma" />
        <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma" />
        <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma" />
        <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma" />
        <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma" />
        <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma" />
        <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma" />
        <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma" />
        <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma" />
        <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF" />
      </Columns>
    </FlatFileFormat>
  </FileFormats>
</Biml>

Up until this point, we’ve been scripting Biml using short cuts. In the next article, we’ll finally start using C# to create BimlScript. Stay tuned!

The T-SQL file used for this post can be downloaded here.

Generating Biml Scripts Using Excel

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. In the previous article, I showed you how to simplify Biml code that was imported using BimlExpress or BimlStudio. All of the code used in this post can be downloaded here.

In this article, we’ll use Excel to dynamically generate the Biml code. First, a disclaimer: I am not an Excel expert; I just know enough to be dangerous. This workbook will accomplish what we want, but it is probably not the most elegant way to do it. To refresh your memory, this is the Biml from the previous article that we need to automate:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders"/>
    </Connections>
    <Packages>
        <Package Name="Export Orders" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
                            <DirectInput>
SELECT * FROM dbo.Orders
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST Orders"  ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false"/>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
    <FileFormats>
        <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

We need a way to derive the metadata from a SQL query. SQL Server 2012 introduced some handy metadata discovery features, and we’re going to use sys.dm_exec_describe_first_result_set. The primary attributes we need to derive are column name, SQL data type, and length. So, we’ll start with the following query and result set:

USE Northwind
GO

DECLARE @SrcQuery NVARCHAR(4000);
SET @SrcQuery = 'SELECT * FROM dbo.Orders';

SELECT name [Column Name]
 ,system_type_name 
 ,max_length 
 FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) 
 WHERE [name] is not NULL  and is_hidden = 0
The result set:

Column Name      system_type_name   max_length
OrderID          int                4
CustomerID       nchar(5)           10
EmployeeID       int                4
OrderDate        datetime           8
RequiredDate     datetime           8
ShippedDate      datetime           8
ShipVia          int                4
Freight          money              8
ShipName         nvarchar(40)       80
ShipAddress      nvarchar(60)       120
ShipCity         nvarchar(15)       30
ShipRegion       nvarchar(15)       30
ShipPostalCode   nvarchar(10)       20
ShipCountry      nvarchar(15)       30

The next thing we need to do is to derive the base SQL data type from system_type_name. In other words, we need to strip off any length, precision, or scale values from it. We could do this in Excel, but I prefer to do it in T-SQL.

DECLARE @SrcQuery NVARCHAR(4000);
SET @SrcQuery = 'SELECT * FROM dbo.Orders';
 
SELECT name [Column Name]
 ,CASE WHEN charindex('(',system_type_name) = 0 THEN system_type_name
  ELSE left(system_type_name, charindex('(',system_type_name) - 1)
  END [SQL DT]
 ,max_length 
  FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) where [name] is not NULL  and is_hidden = 0
The result set:

Column Name      SQL DT     max_length
OrderID          int        4
CustomerID       nchar      10
EmployeeID       int        4
OrderDate        datetime   8
RequiredDate     datetime   8
ShippedDate      datetime   8
ShipVia          int        4
Freight          money      8
ShipName         nvarchar   80
ShipAddress      nvarchar   120
ShipCity         nvarchar   30
ShipRegion       nvarchar   30
ShipPostalCode   nvarchar   20
ShipCountry      nvarchar   30

Finally, we need to convert max_length to an appropriate value for the Biml Length. Since max_length is measured in bytes, the Biml length for Unicode strings must be half of max_length; ANSI strings like char or varchar use max_length as-is, and all other data types are set to 0. To make this easier, I’ll wrap the earlier code in a subquery like this:

DECLARE @SrcQuery NVARCHAR(4000);
SET @SrcQuery = 'SELECT * FROM dbo.Orders';

SELECT 
 t.[Column Name]
,t.[SQL DT]
,CASE 
WHEN t.[SQL DT] in ('char', 'varchar','text') THEN (t.max_length)
WHEN t.[SQL DT] in ('nchar', 'nvarchar','ntext') THEN (t.max_length/2)
ELSE 0 END [Length]
FROM (SELECT name [Column Name]
 ,CASE WHEN charindex('(',system_type_name) = 0 THEN system_type_name
  ELSE left(system_type_name, charindex('(',system_type_name) - 1)
  END [SQL DT]
 ,max_length
   FROM sys.dm_exec_describe_first_result_set(@SrcQuery, NULL, 1) where [name] is not NULL  and is_hidden = 0) t
The result set:

Column Name      SQL DT     Length
OrderID          int        0
CustomerID       nchar      5
EmployeeID       int        0
OrderDate        datetime   0
RequiredDate     datetime   0
ShippedDate      datetime   0
ShipVia          int        0
Freight          money      0
ShipName         nvarchar   40
ShipAddress      nvarchar   60
ShipCity         nvarchar   15
ShipRegion       nvarchar   15
ShipPostalCode   nvarchar   10
ShipCountry      nvarchar   15

The rest of our work will be in Excel, so please download the solution files so you can follow along. Open Delimited Feed Metadata Generator.xlsx and you’ll find two sheets included. The first contains a conversion table of SQL Server, Biml, and SSIS data types. Cathrine Wilhelmsen has a comprehensive blog article about data type conversion. Below are the contents of my Excel lookup table.

SQL Server        Biml                    SSIS
tinyint           Byte                    DT_UI1
smallint          Int16                   DT_I2
int               Int32                   DT_I4
bigint            Int64                   DT_I8
char              AnsiStringFixedLength   DT_STR
varchar           AnsiString              DT_STR
nchar             StringFixedLength       DT_WSTR
nvarchar          String                  DT_WSTR
date              Date                    DT_DBDATE
datetime          DateTime                DT_DBTIMESTAMP
datetime2         DateTime2               DT_DBTIMESTAMP2
datetimeoffset    DateTimeOffset          DT_DBTIMESTAMPOFFSET
smalldatetime     DateTime                DT_DBTIMESTAMP
time              Time                    DT_DBTIME2
decimal           Decimal                 DT_NUMERIC
numeric           Decimal                 DT_NUMERIC
bit               Boolean                 DT_BOOL
float             Double                  DT_R8
real              Single                  DT_R4
money             Currency                DT_CY
smallmoney        Currency                DT_CY
text              AnsiString              DT_TEXT
ntext             String                  DT_NTEXT
timestamp         Binary                  DT_BYTES
uniqueidentifier  Guid                    DT_GUID
varbinary         Binary                  DT_BYTES
xml               Xml                     DT_NTEXT
geography         Object                  DT_IMAGE
geometry          Object                  DT_IMAGE
hierarchyid       Object                  DT_BYTES
image             Binary                  DT_IMAGE
rowversion        Binary                  DT_BYTES
sql_variant       Object                  DT_WSTR
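If you would rather keep this mapping in T-SQL than in Excel, the same lookup could live in a small table and be joined instead of hard-coding the conversion. This is a hypothetical sketch (the table name and the handful of representative rows are my own; the full row set is the lookup table above):

```sql
-- Hypothetical alternative: persist the SQL-to-Biml-to-SSIS mapping in a
-- table and JOIN to it, rather than repeating a CASE expression per script.
CREATE TABLE dbo.BimlDataTypeMap (
    SqlType  sysname      NOT NULL PRIMARY KEY,
    BimlType NVARCHAR(50) NOT NULL,
    SsisType NVARCHAR(50) NOT NULL
);

INSERT dbo.BimlDataTypeMap (SqlType, BimlType, SsisType)
VALUES ('int',      'Int32',             'DT_I4'),
       ('nchar',    'StringFixedLength', 'DT_WSTR'),
       ('nvarchar', 'String',            'DT_WSTR'),
       ('datetime', 'DateTime',          'DT_DBTIMESTAMP'),
       ('money',    'Currency',          'DT_CY');
       -- ...remaining rows follow the lookup table above.
```

A join against this table could then replace the VLOOKUP used here, or the CASE expression when the whole process moves into T-SQL.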

The second sheet is the template that we’ll use to create our feed Biml itself. Use the following steps to create your first export:

  • Make a copy of the FF Export Template and rename the sheet tab to “Orders”. Notice the Feed Name in cell B1 defaults to the tab name.
  • Enter the source query we’ve been using (SELECT * FROM dbo.Orders) into the shaded blue cell B2. Notice that the Biml in column G begins to generate.
  • Execute the latest metadata query against Northwind in SQL Server Management Studio.
  • Copy and paste the results into the blue shaded area starting on row 5.
  • Notice that the remaining Biml code for the Export Orders package is expanded for you in column G.
  • Open the included Visual Studio SSIS project and create a new Biml file. You can do so through the BimlExpress menu, or right-click the project name in the Solution Explorer and select “Add New Biml File.”
  • Copy and paste the contents of column G into the new Biml file. Multi-select the new Biml file and Environment.Biml in the Solution Explorer, right click, and select Generate SSIS Packages. Your new package should generate and function correctly.

This is the generated Biml from the spreadsheet. Note that it looks identical to the Biml from the previous article.

<!-- Orders.biml -->
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <FlatFileConnection Name="conn_ff_Orders" FilePath="C:\FileExport\Orders.csv" FileFormat="fff_Orders"/>
    </Connections>
    <Packages>
        <Package Name="Export Orders" ConstraintMode="Linear"  ProtectionLevel="DontSaveSensitive" PackageSubpath="Export Flat Files">
            <Tasks>
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC Orders" ConnectionName="connNorthwind">
                            <DirectInput>
                                 SELECT * FROM dbo.Orders
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST Orders"  ConnectionName="conn_ff_Orders" Overwrite="true" ValidateExternalMetadata="false"/> 
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
    <FileFormats>
        <FlatFileFormat Name="fff_Orders" FlatFileType="Delimited" ColumnNamesInFirstDataRow="true" TextQualifier="&quot;">
            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/> 
                <Column Name="EmployeeID" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="OrderDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="RequiredDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShippedDate" DataType="DateTime" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipVia" DataType="Int32" Length="0" Delimiter="Comma"/> 
                <Column Name="Freight" DataType="Currency" Length="0" Delimiter="Comma"/> 
                <Column Name="ShipName" DataType="String" Length="40" Delimiter="Comma"/> 
                <Column Name="ShipAddress" DataType="String" Length="60" Delimiter="Comma"/> 
                <Column Name="ShipCity" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipRegion" DataType="String" Length="15" Delimiter="Comma"/> 
                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/> 
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

This is simple, right? More importantly, it’s very quick to create an endless number of export feeds by repeating the steps above.

Now let’s explore some of the Excel code which generates the Biml. The first calculated column (D) converts the base SQL data type to Biml using a standard VLookup. Since file exports have a variable number of columns, any blank spreadsheet row returns nothing.

=IF(B5 <> "",VLOOKUP(B5,BimlDT,2,FALSE),"")

The next column determines whether a Comma or CRLF delimiter is appropriate. The code checks whether a Biml data type has been calculated on its own row and in the cell beneath it, which is a simple way to determine whether the feed column is the last one in the feed.

=IF(D5 <> "", IF(E6 <> "", "Comma","CRLF"),"")

The actual column Biml creation begins in F24. The reason is that the “final” Biml in column G has to specify the Connection and Package code before we can tackle the FlatFileFormat details. So, row 24 generates from the column metadata that starts on row 5. Remember that the column code ends up looking like this:

            <Columns>
                <Column Name="OrderID" DataType="Int32" Length="0" Delimiter="Comma"/>
                <Column Name="CustomerID" DataType="StringFixedLength" Length="5" Delimiter="Comma"/>

Notice that I’m indenting the final code because I use this spreadsheet in my presentations as well and want it to look nice. This, coupled with the conditional display logic, makes the code a bit harder to follow. Since “scripting Biml” is meant to be a temporary short cut on your way to Biml mastery, feel free to skip the aesthetic aspects and just generate Biml code. Also, the double quote character is reserved in the Excel formula language and must be escaped by doubling it or by using CHAR(34). With that in mind, this is what the column generation code looks like.

=IF(D5 <> "", CONCAT("                <Column Name=""", A5,""" DataType=""", D5,""" Length=""", C5,""" Delimiter=""", E5,"""/> "), "")

Now let’s look at column G. This column exists to prettify the generated Biml and make it easier to use. The sheet currently supports feeds of up to 100 columns, but most feeds don’t require that many. Instead of leaving a large set of blank cells between the last column line and </Columns>, this code detects the last feed column required and then adds the final four closing statements. Again, you may want to skip this level of detail when creating your own Excel shortcuts. The important code begins in cell G24. Through a series of IF statements, it looks at column F to determine when the Biml column metadata is finished and then applies the correct closing snippets in the next four rows.

=IF(F24<>"",F24,IF(F23<>"", "            </Columns>", IF(F22<>"", "        </FlatFileFormat>", IF(F21<>"", "    </FileFormats>", IF(F20<>"", "</Biml>", "")))))

The tail of the generated output then closes out like this:

                <Column Name="ShipPostalCode" DataType="String" Length="10" Delimiter="Comma"/>
                <Column Name="ShipCountry" DataType="String" Length="15" Delimiter="CRLF"/> 
            </Columns>
        </FlatFileFormat>
    </FileFormats>
</Biml>

If desired, we could continue refining this process until it’s almost entirely automated. For example, we could create a tab containing connection information, including the connection string. Add a Source Connection value in B3, then use Visual Basic for Applications to execute the metadata query above against the source and automatically populate the column metadata in columns A, B, and C. The VBA macro could be tied to a button for ease of use. You could even automate the creation of the individual Biml files on disk. How far you want to take this is up to you.

Feel free to create additional flat file exports from the Northwind database by following the steps above. In the next article, we’ll refine the metadata query and generate the entire Biml script using only T-SQL.

Stay Tuned!

All of the code used in this post can be downloaded here.

My Wife and I are Presenting at SQL Saturday Dallas on June 1st!

It might surprise some of you that my lovely wife Melissa is an ETL developer as well, and she’ll be giving her first presentation at SQL Saturday Dallas.

My Better Half

She will be giving the following presentation, while I act as her lovely assistant.

The True Story of Learning Biml

How do you learn Biml when you don’t have years of SQL experience? Is Biml even worth my time and energy when I’m still working on perfecting my SSIS skills? It seems like a lot of work; what’s in it for me? In this session you will learn:

  • Why Biml is worth your time and effort.
  • How to identify ideal SSIS packages for conversion.
  • How to convert those packages into Biml.
  • How to troubleshoot your Biml build errors when Google fails you.
  • How to automate SSIS package creation with BimlScript.

Download Presentation Files

I can’t let her have all the fun, so I’m presenting an updated version of the following:

Azure Data Factory V2 and SSIS Automation With Biml

SSIS was originally designed to work in a world where all data was structured and on-premises. As modern enterprise data projects increasingly move online, tools like Azure Data Factory (ADF) can extend SSIS capabilities to the cloud. Azure Data Factory is a flexible and powerful cloud-based ETL service, but it doesn’t replace the functionality of SSIS. In this session you will learn:

  • The advantages and disadvantages of both ETL platforms.
  • How to use both in tandem to maximize productivity.
  • How both ETL platforms, as well as any supporting structures, can be automatically generated with BimlScript.

All of the code used in this presentation can be downloaded by attendees, customized, and used in their production environments.

Anyway, it’ll be a great time, and I can’t wait to see some of you there!

Download True Story of Biml Presentation Files