Archives

Import Biml Metadata Directly from Excel

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so.

It might surprise some of you that I use Excel extensively for Biml metadata. After all, I’ve ranted about the misuse of Excel in the past. Many of my clients prefer using it over other methods.

Why Use Excel?

Excel Spreadsheets as a metadata source have a lot going for them.

  • Everyone uses Excel and is comfortable with it.
  • Excel is incredibly customizable and versatile.
  • Excel offers data validation and filtering.

For these reasons, I create customized Excel spreadsheets that function as a lite Graphic User Interfaces (GUI) for metadata. Of course, Excel isn’t a perfect metadata source. For one thing, you have to own a licensed copy of Excel. Second, because spreadsheets are so easy to customize, users sometimes “improve” them further and break your code.

While there are other methods for C# importation of Excel spreadsheets, the method I’m going to demonstrate is my favorite. It utilizes the Microsoft Office Interop Assembly for Excel, and requires Excel to be installed on your local machine.

Why use the Interop Excel Assembly?

First, the assembly is extremely flexible. The pattern I’m demonstrating in this post is simple and involves importing from a single sheet. However, it is easy to adapt the process for multiple sheets in the same Excel workbook or to import multiple workbooks at once. In a future article, I’ll demonstrate how I create and load a data warehouse where each worksheet represents a dimension or fact table.

Second, it performs very well. I can import the metadata from 20+ sheets, which loads an entire data warehouse, in under a minute.

Show Me the Code

I’ve always found the easiest way to learn something is to review a working example. This example will read a single sheet from a workbook that contains connection metadata on a tab named “Connections.” The metadata is as follows:

Connection NameDatabase NameCreate In ProjectRetain Same ConnectionConnection String
connNorthwindNorthwindtruefalseProvider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind
connETLManagerETLManagertruefalseProvider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=ETLManager
connNorthwindODSNorthwindODStruefalseProvider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=NorthwindODS

Unlike previous articles, I’m going to share the entire code now and then discuss the individual pieces below.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>
<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\";
  string mdFileName = "Environment.xlsx";

  int r; // Row Number
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;
	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets["Connections"];  
		r = 2; // First Row for Import
			System.Data.DataTable dtEnvironment = new System.Data.DataTable(); 
			dtEnvironment.Columns.Add("ConnectionName");
			dtEnvironment.Columns.Add("DatabaseName");
			dtEnvironment.Columns.Add("CreateInProject");
			dtEnvironment.Columns.Add("RetainSameConnection");
			dtEnvironment.Columns.Add("ConnectionString");

			string ConnectionName = GetCellValue(ws,1,r);
			while (!string.IsNullOrWhiteSpace(ConnectionName)) {
			 dtEnvironment.Rows.Add(
				 GetCellValue(ws,1,r)
				,GetCellValue(ws,2,r)
				,GetCellValue(ws,3,r)
				,GetCellValue(ws,4,r)
				,GetCellValue(ws,5,r)
			 );
			r++;
			ConnectionName = GetCellValue(ws, 1,r);
			}
			if (dtEnvironment.Rows.Count == 0) 
			{
				throw new Exception("No Connection Information found on Connections Worksheet");
            }
			#>
	<Connections>
			<# foreach(DataRow drEnvironment in dtEnvironment.Rows) {	#>
		<Connection Name="<#=drEnvironment["ConnectionName"]#>" ConnectionString="<#=drEnvironment["ConnectionString"]#>" CreateInProject="<#=drEnvironment["CreateInProject"]#> " RetainSameConnection="<#=drEnvironment["RetainSameConnection"]#>" >
			<Annotations>
				<Annotation Tag="DatabaseName"><#=drEnvironment["DatabaseName"]#></Annotation>
            </Annotations>			
		</Connection>	
			<# } #>
	</Connections>
<#		
		}
		catch (Exception ex)
			{
				throw new Exception(ex.ToString());
			}
}
xl.Quit();
#>
</Biml>

<#+
  public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
  {
  	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
  return value == null ? string.Empty : value.ToString().Trim();
  }
#>

Lets start at the top. This section imports the required namespaces and assemblies. The first two are necessary to locate the spreadsheet file on disk, while lines 3 and 4 import the Interop assembly.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL" #>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>

Find Your Assembly

In order to reference the assembly, you’ll have to use the fully qualified name from the Global Assembly Cache (GAC). In order to find it, open the Developer Command Prompt for your version of Visual Studio. The easiest way to find the DCP is to press the Windows key and begin typing “Develop….” until the name of the tool pops up. After it opens, paste the following statement into the DCP and press Enter to find the Excel assembly information on your local machine:

gacutil /l Microsoft.Office.Interop.Excel

Which should return something like this:

Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL

Copy and paste that into the assembly name string on line 3 above.

Now let’s jump to the bottom and examine the GetCellValue method. GetCellValue pulls the value out of the cell designated by column and row, converts it to a string, and trims any leading or trailing spaces. It also returns an empty string if the cell is null.

<#+
  public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
  {
  	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
  return value == null ? string.Empty : value.ToString().Trim();
  }
#>

Next we’ll specify what metadata spreadsheet we are importing. I use the Environment variable to get my default folder path, “C:\Users\david”, and then build from there. This helps to abstract the link so the same code should work for multiple developers working on the same project. The variable “r” keeps track of which row in the spreadsheet we are reading.

<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\";
  string mdFileName = "Environment.xlsx";

  int r; // Row Number
#>

This next section looks for the file and returns the entire file path including the name and extension. I realize that the SearchOption.AllDirectories isn’t really needed since I’m only searching for a single file, but I use this same pattern in other situations which import all of the metadata spreadsheets in a particular folder. I’ve also included a check that the file exists at that location.

<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}

Next I loop through all of the files found in the previous step, in this case just one, and create a Workbook object.

foreach (string fil in filePaths)
{
	Workbook wb = null;

Next we try to open the workbook and assign a Worksheet object to our sheet called “Connections.” Also, we set the first row to import to 2, as we want to skip the column headers.

try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets["Connections"];  
		r = 2; // First Row for Import

Next we’ll create a new DataTable called dtEnvironment to store the imported metadata. Notice that the column names correspond to those in the spreadsheet.

			System.Data.DataTable dtEnvironment = new System.Data.DataTable(); 
			dtEnvironment.Columns.Add("ConnectionName");
			dtEnvironment.Columns.Add("DatabaseName");
			dtEnvironment.Columns.Add("CreateInProject");
			dtEnvironment.Columns.Add("RetainSameConnection");
			dtEnvironment.Columns.Add("ConnectionString");

Next we’ll store the ConnectionName value from the 1st column 2nd row to a variable of the same name. The while loop which follows adds the values for the specified row (r) of the spreadsheet to the DataTable defined above. After each loop, the row number is incremented. This causes the process to move down the spreadsheet until the first blank row is found, which stops the import process.

			string ConnectionName = GetCellValue(ws,1,r);
			while (!string.IsNullOrWhiteSpace(ConnectionName)) {
			 dtEnvironment.Rows.Add(
				 GetCellValue(ws,1,r)
				,GetCellValue(ws,2,r)
				,GetCellValue(ws,3,r)
				,GetCellValue(ws,4,r)
				,GetCellValue(ws,5,r)
			 );
			r++;
			ConnectionName = GetCellValue(ws, 1,r);
			}

The next section checks to make sure that at least one connection was imported.

			if (dtEnvironment.Rows.Count == 0) 
			{
				throw new Exception("No Connection Information found on Connections Worksheet");
            }
			#>

Now that we’ve completely loaded the DataTable, we loop through it to create our Connections. I’ve taken an additional step of adding the DatabaseName as an annotation for each connection. This simplifies BimlScript writing in future steps.

	<Connections>
			<# foreach(DataRow drEnvironment in dtEnvironment.Rows) {	#>
		<Connection Name="<#=drEnvironment["ConnectionName"]#>" ConnectionString="<#=drEnvironment["ConnectionString"]#>" CreateInProject="<#=drEnvironment["CreateInProject"]#> " RetainSameConnection="<#=drEnvironment["RetainSameConnection"]#>" >
			<Annotations>
				<Annotation Tag="DatabaseName"><#=drEnvironment["DatabaseName"]#></Annotation>
            </Annotations>			
		</Connection>	
			<# } #>
	</Connections>

Finally, we have a standard Catch statement and the closing Biml tag.

<#		
		}
		catch (Exception ex)
			{
				throw new Exception(ex.ToString());
			}
}
xl.Quit();
#>
</Biml>

Simple huh? This code produces the following Biml.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<Connection Name="connNorthwind" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=Northwind" CreateInProject="true " RetainSameConnection="false" >
			<Annotations>
				<Annotation Tag="DatabaseName">Northwind</Annotation>
			</Annotations>
		</Connection>
		<Connection Name="connETLManager" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=ETLManager" CreateInProject="true " RetainSameConnection="false" >
			<Annotations>
				<Annotation Tag="DatabaseName">ETLManager</Annotation>
			</Annotations>
		</Connection>
		<Connection Name="connNorthwindODS" ConnectionString="Provider=SQLNCLI11;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=NorthwindODS" CreateInProject="true " RetainSameConnection="false" >
			<Annotations>
				<Annotation Tag="DatabaseName">NorthwindODS</Annotation>
			</Annotations>
		</Connection>
	</Connections>
</Biml>

Summary

Notice that this process is dynamic based on external metadata. When someone adds another connection to the spreadsheet, it will be added to our Biml project automatically on your next build. In the next article, we’ll adapt this method to read SQL queries from a spreadsheet and automagically create an export package for each.

The files used for this post can be downloaded here.

1 comment to Import Biml Metadata Directly from Excel

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>