Archives

Import Biml Metadata from Excel Advanced

This is the final post of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. Our mission for this series was to quickly create many Flat File extracts from provided source queries using SQL Server Integration Services.

Make sure you read and understand the concepts in each of the following articles before tackling this one.

One of the primary issues I’ve mentioned in each of the preceding short cut steps is a lack of real automation. Since we didn’t have our metadata externalized, we needed to copy/paste some object, T-SQL file, Biml file, or Excel Spreadsheet in order to create new extracts.

In this framework, I’ve externalized the metadata in an Excel worksheet which looks like this:

Feed NameSource QueryFile ExtensionFirst Row Column NamesHeader Row DelimiterColumn Row DelimiterColumn DelimiterText QualifierGenerate
OrderDetailsSELECT OrderID, ProductID, ProductName, UnitPrice, Quantity, Discount, ExtendedPrice FROM dbo.vwOrderDetailscsvYLFLFComma"Y
OrdersSELECT * FROM dbo.OrderstxtNCRLFCRLFTab~Y
ProductDetailsSELECT CategoryName, ProductID, ProductName, QuantityPerUnit, UnitPrice, UnitsInStock FROM dbo.vwProductDetails
WHERE Discontinued = 0
ORDER BY CategoryName, ProductName;
mehYCRLFCRLFVerticalBarY
CustomersSelect * From dbo.CustomerscsvYCRLFCRLFSemicolon"Y
SuppliersSELECT * FROM dbo.SuppliersedfNCRLFCRLFComma*Y
EmployeeDetailsSELECT E.EmployeeID, E.LastName, E.FirstName, E.Title, E.HireDate, E.City, E.PostalCode, E.Country, E.Extension, T.TerritoryDescription
FROM dbo.Employees E
JOIN dbo.EmployeeTerritories ET ON ET.EmployeeID=E.EmployeeID
JOIN dbo.Territories T ON T.TerritoryID=ET.TerritoryID;
csvYCRLFCRLFComma"Y
CustomerOrdersDetailEXECUTE [dbo].[CustOrdersDetail] @OrderID = 11077csvYCRLFCRLFComma"Y

The template is simple enough for most people to fill it out. All of the columns to the right of Source Query default to the original file specification we were given: comma-separated values (CSV) format, double quote text qualified.

Let’s Look at Code

This code is based on my previous example, so I won’t go into all the gory details. If you have code related questions, review the previous article for the specifics. Let’s start with the entire script, and then we’ll examine each section.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>
<#
  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\"; 
  string mdFileName = "Delimitted Feed Metadata.xlsx";
  string mdWorkSheetName = "Delimited Export List Advanced";

  int r; // Row Number

  var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind" ]; 
  Varigence.Languages.Biml.Table.AstTableNode feedMd;
#>

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;
	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName];  
		r = 2; // First Row for Import
		System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable(); 
		dtFlatFileExtract.Columns.Add("FeedName");
		dtFlatFileExtract.Columns.Add("SourceQuery");
		dtFlatFileExtract.Columns.Add("FileExtension");
		dtFlatFileExtract.Columns.Add("FirstRowColumnNames");
		dtFlatFileExtract.Columns.Add("HeaderRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnDelimiter");
		dtFlatFileExtract.Columns.Add("TextQualifier");
		dtFlatFileExtract.Columns.Add("Generate");	
		
		string feedName = GetCellValue(ws, 1,r);
		//string generate = "Y";
		string generate = GetCellValue(ws, 9,r);

		while (!string.IsNullOrWhiteSpace(feedName)) {
			if (generate.ToUpper() == "Y") {
				dtFlatFileExtract.Rows.Add(
				feedName
				,GetCellValue(ws, 2,r)
				,GetCellValue(ws, 3,r)
				,GetCellValue(ws, 4,r)
				,GetCellValue(ws, 5,r)
				,GetCellValue(ws, 6,r)
				,GetCellValue(ws, 7,r)
				,GetCellValue(ws, 8,r)
				,generate				
				);
			}
		r++;
		feedName = GetCellValue(ws, 1,r);
		generate = GetCellValue(ws, 9,r);
		}			
			if (dtFlatFileExtract.Rows.Count == 0) 
			{
				throw new Exception("No Delimited Exports were found!");
            }
			#>
	<Connections>
	<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
		<FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/>		
		<# } #>
    </Connections>
	<Packages>
			<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files">
            <Tasks>                
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                        	<DirectInput>
								<#=sourceQuery #>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> 
                   </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
		<# } #>		
    </Packages>	
	<FileFormats> 	
		<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString();
			try  /*Validate Source Query*/
			{
			feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 );
			}
			catch (Exception ex1)
			{
			string errorMessage = "Feed Name: " +	dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault();
					throw new Exception(errorMessage);		
			}	
		#>	
		<FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>">
			<Columns>		
		<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape()  : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> 
		<# } #>	
			</Columns>
		</FlatFileFormat>
		<# } #>
	</FileFormats>
<#
}

catch (Exception ex)
	{
		throw new Exception(ex.ToString());
	}
}	
#>
<# 
xl.Quit();
#>
</Biml>

<#+
public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
{
	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
return value == null ? string.Empty : value.ToString().Trim()  ;
}
#>

Once again, we’ll start from the top with our namespaces and assemblies which are required for file system access and Excel functionality.

<#@ import namespace="System.IO" #>
<#@ import namespace="System.Data"#>
<#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#>
<#@ import namespace="Microsoft.Office.Interop.Excel" #>

Lines 6 through 9 specify the workbook location and worksheet name we’ll be importing from.

  string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
  userFolder = userFolder +  @"\Dropbox\Blog\Biml\Import Metadata from Excel\"; 
  string mdFileName = "Delimitted Feed Metadata.xlsx";
  string mdWorkSheetName = "Delimited Export List Advanced";

In the next section, we’ll declare a row number variable which keeps track of which row we are importing from the worksheet. Since this framework takes the provided queries and extracts the metadata from them, I need to define the source connection. I’ve also declared a Biml table variable, that we’ll use to store the FlatFileFormat column metadata.

  int r; // Row Number
 
  var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind" ]; 
  Varigence.Languages.Biml.Table.AstTableNode feedMd;

The next section looks for the file(s) on disk, and then loops through any it finds, in this case just one.

<# 
var xl = new Application();
string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories);

if (filePaths.Count() == 0) {
		throw new Exception("No Import document found for: " + mdFileName ) ;
}
foreach (string fil in filePaths)
{
	Workbook wb = null;

We then try to open the workbook and worksheet, and assign the first import row as 2.

	try 
	{
		wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
		Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName];  
		r = 2; // First Row for Import

Next I create a DataTable called dtFlatFileExtract to hold the metadata, note that the column names correspond to the worksheet.

		System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable(); 
		dtFlatFileExtract.Columns.Add("FeedName");
		dtFlatFileExtract.Columns.Add("SourceQuery");
		dtFlatFileExtract.Columns.Add("FileExtension");
		dtFlatFileExtract.Columns.Add("FirstRowColumnNames");
		dtFlatFileExtract.Columns.Add("HeaderRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnRowDelimiter");
		dtFlatFileExtract.Columns.Add("ColumnDelimiter");
		dtFlatFileExtract.Columns.Add("TextQualifier");
		dtFlatFileExtract.Columns.Add("Generate");	

Next we’ll store the Feed Name and Generate values from the 2nd row to variables. The while loop which follows adds the values for the specified row (r) of the spreadsheet to the DataTable defined above. After each loop, the row number is incremented. This causes the process to move down the spreadsheet until the first blank row is found, which stops the import process.

		string feedName = GetCellValue(ws, 1,r);
		//string generate = "Y";
		string generate = GetCellValue(ws, 9,r);

		while (!string.IsNullOrWhiteSpace(feedName)) {
			if (generate.ToUpper() == "Y") {
				dtFlatFileExtract.Rows.Add(
				feedName
				,GetCellValue(ws, 2,r)
				,GetCellValue(ws, 3,r)
				,GetCellValue(ws, 4,r)
				,GetCellValue(ws, 5,r)
				,GetCellValue(ws, 6,r)
				,GetCellValue(ws, 7,r)
				,GetCellValue(ws, 8,r)
				,generate				
				);
			}
		r++;
		feedName = GetCellValue(ws, 1,r);
		generate = GetCellValue(ws, 9,r);
		}	

The next section throws an error if feed metadata is not found.

			if (dtFlatFileExtract.Rows.Count == 0) 
			{
				throw new Exception("No Delimited Exports were found!");
            }

Next we simply loop through Connections and then Packages, which require only the Feed Name and Source Query variables.

	<Connections>
	<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
		<FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/>		
		<# } #>
    </Connections>
	<Packages>
			<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString(); #>
        <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files">
            <Tasks>                
                <Dataflow Name="DFT Load file" >
                    <Transformations>
                        <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind">
                        	<DirectInput>
								<#=sourceQuery #>
                            </DirectInput>
                        </OleDbSource>
                        <FlatFileDestination Name="FF_DEST <#=feedName#>"  ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> 
                   </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
		<# } #>		
    </Packages>	

As I mentioned in previous articles, the challenging part of this framework is creating the FlatFileFormat, particularly the columns. To begin, we loop through each DataRow and import the metadata with a Try/Catch. The Catch warns you when your query doesn’t work.

	<FileFormats> 	
		<#foreach (DataRow dr in dtFlatFileExtract.Rows) {
			string sourceQuery = dr["SourceQuery"].ToString();
			feedName = dr["FeedName"].ToString();
			try  /*Validate Source Query*/
			{
			feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 );
			}
			catch (Exception ex1)
			{
			string errorMessage = "Feed Name: " +	dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault();
					throw new Exception(errorMessage);		
			}	
		#>	

Next we populate the FlatFileFormat and it’s columns.

		<FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>">
			<Columns>		
		<# foreach (var column in feedMd.Columns) {#>
				<Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape()  : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> 
		<# } #>	
			</Columns>
		</FlatFileFormat>
		<# } #>
	</FileFormats>

Finally we have the Catch statement which corresponds to the original Try (opening the workbook). This is followed by the closing Biml tag and the GetCellValue method.

<#
}

catch (Exception ex)
	{
		throw new Exception(ex.ToString());
	}
}	
#>
<# 
xl.Quit();
#>
</Biml>

<#+
public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row)
{
	var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ;
return value == null ? string.Empty : value.ToString().Trim()  ;
}
#>

Next Steps

In this series we’ve taken several optional shortcuts, each offered a way to learn and use Biml while still getting real work done. Now, we have a fully functioning framework which dynamically creates SSIS export packages based on metadata contained in a spreadsheet. Hopefully this series has inspired you to learn Biml, and create your own frameworks. Be sure to follow this blog because I’ll share more patterns in the near future.

The files used for this post can be downloaded here.

2 comments to Import Biml Metadata from Excel Advanced

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>