This is the final post in the series Short Cuts to Biml Mastery, which shows you how to learn BimlScript through a series of short cuts while remaining productive along the way. Our mission for this series was to quickly create many flat file extracts from provided source queries using SQL Server Integration Services.
Make sure you read and understand the concepts in each of the following articles before tackling this one.
- Import Biml Metadata with GetQuerySchema
- BimlScript Code Nuggets and Mad Libs
- Import Biml Metadata Directly from Excel
One of the primary issues I've mentioned in each of the preceding short cut steps is the lack of real automation. Since our metadata wasn't externalized, we needed to copy and paste something (a T-SQL file, a Biml file, or an Excel spreadsheet) in order to create each new extract.
In this framework, I've externalized the metadata in an Excel worksheet that looks like this:
Feed Name | Source Query | File Extension | First Row Column Names | Header Row Delimiter | Column Row Delimiter | Column Delimiter | Text Qualifier | Generate |
---|---|---|---|---|---|---|---|---|
OrderDetails | SELECT OrderID, ProductID, ProductName, UnitPrice, Quantity, Discount, ExtendedPrice FROM dbo.vwOrderDetails | csv | Y | LF | LF | Comma | " | Y |
Orders | SELECT * FROM dbo.Orders | txt | N | CRLF | CRLF | Tab | ~ | Y |
ProductDetails | SELECT CategoryName, ProductID, ProductName, QuantityPerUnit, UnitPrice, UnitsInStock FROM dbo.vwProductDetails WHERE Discontinued = 0 ORDER BY CategoryName, ProductName; | meh | Y | CRLF | CRLF | VerticalBar | Y | |
Customers | Select * From dbo.Customers | csv | Y | CRLF | CRLF | Semicolon | " | Y |
Suppliers | SELECT * FROM dbo.Suppliers | edf | N | CRLF | CRLF | Comma | * | Y |
EmployeeDetails | SELECT E.EmployeeID, E.LastName, E.FirstName, E.Title, E.HireDate, E.City, E.PostalCode, E.Country, E.Extension, T.TerritoryDescription FROM dbo.Employees E JOIN dbo.EmployeeTerritories ET ON ET.EmployeeID=E.EmployeeID JOIN dbo.Territories T ON T.TerritoryID=ET.TerritoryID; | csv | Y | CRLF | CRLF | Comma | " | Y |
CustomerOrdersDetail | EXECUTE [dbo].[CustOrdersDetail] @OrderID = 11077 | csv | Y | CRLF | CRLF | Comma | " | Y |
The template is simple enough for most people to fill out. All of the columns to the right of Source Query default to the original file specification we were given: comma-separated values (CSV) format with a double quote text qualifier.
Let’s Look at Code
This code is based on my previous example, so I won't go into all the gory details. If you have code-related questions, review the previous article for specifics. Let's start with the entire script, and then we'll examine each section.
<#@ import namespace="System.IO" #> <#@ import namespace="System.Data"#> <#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#> <#@ import namespace="Microsoft.Office.Interop.Excel" #> <# string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile); userFolder = userFolder + @"\Dropbox\Blog\Biml\Import Metadata from Excel\"; string mdFileName = "Delimitted Feed Metadata.xlsx"; string mdWorkSheetName = "Delimited Export List Advanced"; int r; // Row Number var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind" ]; Varigence.Languages.Biml.Table.AstTableNode feedMd; #> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <# var xl = new Application(); string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories); if (filePaths.Count() == 0) { throw new Exception("No Import document found for: " + mdFileName ) ; } foreach (string fil in filePaths) { Workbook wb = null; try { wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false); Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName]; r = 2; // First Row for Import System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable(); dtFlatFileExtract.Columns.Add("FeedName"); dtFlatFileExtract.Columns.Add("SourceQuery"); dtFlatFileExtract.Columns.Add("FileExtension"); dtFlatFileExtract.Columns.Add("FirstRowColumnNames"); dtFlatFileExtract.Columns.Add("HeaderRowDelimiter"); dtFlatFileExtract.Columns.Add("ColumnRowDelimiter"); dtFlatFileExtract.Columns.Add("ColumnDelimiter"); dtFlatFileExtract.Columns.Add("TextQualifier"); dtFlatFileExtract.Columns.Add("Generate"); string feedName = GetCellValue(ws, 1,r); //string generate = "Y"; string generate = GetCellValue(ws, 9,r); while (!string.IsNullOrWhiteSpace(feedName)) { if (generate.ToUpper() == "Y") { dtFlatFileExtract.Rows.Add( feedName ,GetCellValue(ws, 2,r) ,GetCellValue(ws, 3,r) ,GetCellValue(ws, 4,r) ,GetCellValue(ws, 5,r) ,GetCellValue(ws, 6,r) ,GetCellValue(ws, 7,r) ,GetCellValue(ws, 8,r) ,generate ); } r++; feedName = GetCellValue(ws, 1,r); generate = GetCellValue(ws, 9,r); } if (dtFlatFileExtract.Rows.Count == 0) { throw new Exception("No Delimited Exports were found!"); } #> <Connections> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); #> <FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/> <# } #> </Connections> <Packages> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); #> <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files"> <Tasks> <Dataflow Name="DFT Load file" > <Transformations> <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind"> <DirectInput> <#=sourceQuery #> </DirectInput> </OleDbSource> <FlatFileDestination Name="FF_DEST <#=feedName#>" ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> </Transformations> </Dataflow> </Tasks> </Package> <# } #> </Packages> <FileFormats> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); try /*Validate Source 
Query*/ { feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 ); } catch (Exception ex1) { string errorMessage = "Feed Name: " + dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault(); throw new Exception(errorMessage); } #> <FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>"> <Columns> <# foreach (var column in feedMd.Columns) {#> <Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape() : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> <# } #> </Columns> </FlatFileFormat> <# } #> </FileFormats> <# } catch (Exception ex) { throw new Exception(ex.ToString()); } } #> <# xl.Quit(); #> </Biml> <#+ public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row) { var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ; return value == null ? string.Empty : value.ToString().Trim() ; } #>
Once again, we'll start from the top with the namespaces and assemblies required for file system access and Excel functionality.
<#@ import namespace="System.IO" #> <#@ import namespace="System.Data"#> <#@ assembly name="Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c, processorArchitecture=MSIL"#> <#@ import namespace="Microsoft.Office.Interop.Excel" #>
The next four lines specify the workbook location and the worksheet name we'll be importing from.
string userFolder = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
userFolder = userFolder + @"\Dropbox\Blog\Biml\Import Metadata from Excel\";
string mdFileName = "Delimitted Feed Metadata.xlsx";
string mdWorkSheetName = "Delimited Export List Advanced";
In the next section, we declare a row number variable that keeps track of which worksheet row we are importing. Since this framework takes the provided queries and extracts metadata from them, I need to define the source connection. I've also declared a Biml table variable that we'll use to store the FlatFileFormat column metadata.
int r; // Row Number
var srcConn = (AstDbConnectionNode)RootNode.Connections["connNorthwind"];
Varigence.Languages.Biml.Table.AstTableNode feedMd;
The next section looks for the file(s) on disk and then loops through any it finds; in this case there is just one.
<# var xl = new Application(); string[] filePaths = Directory.GetFiles(userFolder, mdFileName, SearchOption.AllDirectories); if (filePaths.Count() == 0) { throw new Exception("No Import document found for: " + mdFileName ) ; } foreach (string fil in filePaths) { Workbook wb = null;
We then try to open the workbook and worksheet, and set the first import row to 2.
try
{
    wb = xl.Workbooks.Open(fil, 0, true, 5, "", "", false, XlPlatform.xlWindows, "\t", false, false, 0, false, true, false);
    Worksheet ws = (Worksheet)xl.Sheets[mdWorkSheetName];
    r = 2; // First Row for Import
Next, I create a DataTable called dtFlatFileExtract to hold the metadata. Note that the column names correspond to the worksheet columns.
System.Data.DataTable dtFlatFileExtract = new System.Data.DataTable();
dtFlatFileExtract.Columns.Add("FeedName");
dtFlatFileExtract.Columns.Add("SourceQuery");
dtFlatFileExtract.Columns.Add("FileExtension");
dtFlatFileExtract.Columns.Add("FirstRowColumnNames");
dtFlatFileExtract.Columns.Add("HeaderRowDelimiter");
dtFlatFileExtract.Columns.Add("ColumnRowDelimiter");
dtFlatFileExtract.Columns.Add("ColumnDelimiter");
dtFlatFileExtract.Columns.Add("TextQualifier");
dtFlatFileExtract.Columns.Add("Generate");
Next, we store the Feed Name and Generate values from the second row in variables. The while loop that follows adds the values from the current spreadsheet row (r) to the DataTable defined above, but only when Generate is set to Y. After each iteration the row number is incremented, which moves the process down the spreadsheet until the first blank Feed Name is found and the import stops.
string feedName = GetCellValue(ws, 1, r);
string generate = GetCellValue(ws, 9, r);

while (!string.IsNullOrWhiteSpace(feedName))
{
    if (generate.ToUpper() == "Y")
    {
        dtFlatFileExtract.Rows.Add(
            feedName
            ,GetCellValue(ws, 2, r)
            ,GetCellValue(ws, 3, r)
            ,GetCellValue(ws, 4, r)
            ,GetCellValue(ws, 5, r)
            ,GetCellValue(ws, 6, r)
            ,GetCellValue(ws, 7, r)
            ,GetCellValue(ws, 8, r)
            ,generate
        );
    }
    r++;
    feedName = GetCellValue(ws, 1, r);
    generate = GetCellValue(ws, 9, r);
}
The next section throws an error if feed metadata is not found.
if (dtFlatFileExtract.Rows.Count == 0)
{
    throw new Exception("No Delimited Exports were found!");
}
Next, we simply loop through the metadata rows to emit the Connections and then the Packages, which need only the Feed Name, File Extension, and Source Query values. A sample of the generated output follows the code below.
<Connections> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); #> <FlatFileConnection Name="conn_ff_<#=feedName#>" FilePath="C:\FileExport\<#=feedName#>.<#=dr["FileExtension"]#>" FileFormat="fff_<#=feedName#>"/> <# } #> </Connections> <Packages> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); #> <Package Name="Export <#=feedName#>" ConstraintMode="Linear" ProtectionLevel="EncryptSensitiveWithUserKey" PackageSubpath="Export Flat Files"> <Tasks> <Dataflow Name="DFT Load file" > <Transformations> <OleDbSource Name="OLE_SRC <#=feedName#>" ConnectionName="connNorthwind"> <DirectInput> <#=sourceQuery #> </DirectInput> </OleDbSource> <FlatFileDestination Name="FF_DEST <#=feedName#>" ConnectionName="conn_ff_<#=feedName#>" Overwrite="true" ValidateExternalMetadata="false"/> </Transformations> </Dataflow> </Tasks> </Package> <# } #> </Packages>
As I mentioned in previous articles, the challenging part of this framework is creating the FlatFileFormat, particularly the columns. To begin, we loop through each DataRow and import the query metadata inside a Try/Catch. The Catch tells you which feed's query cannot be evaluated and why.
<FileFormats> <#foreach (DataRow dr in dtFlatFileExtract.Rows) { string sourceQuery = dr["SourceQuery"].ToString(); feedName = dr["FeedName"].ToString(); try /*Validate Source Query*/ { feedMd = ExternalDataAccess.GetQuerySchema(srcConn, sourceQuery,null,0 ); } catch (Exception ex1) { string errorMessage = "Feed Name: " + dr["FeedName"].ToString() + " CANNOT be evaluated. Error Message: " + ex1.ToString().Split(new [] { '\r', '\n' }).FirstOrDefault(); throw new Exception(errorMessage); } #>
Next, we populate the FlatFileFormat and its columns; a sample of the generated output follows the code below.
<FlatFileFormat Name="fff_<#=feedName#>" FlatFileType="Delimited" ColumnNamesInFirstDataRow="<#=((dr["FirstRowColumnNames"].ToString() == "Y" ) ? "true" : "false") #>" TextQualifier="<#=dr["TextQualifier"].ToString().XmlEscape() #>" HeaderRowDelimiter="<#=dr["HeaderRowDelimiter"].ToString().XmlEscape() #>"> <Columns> <# foreach (var column in feedMd.Columns) {#> <Column Name="<#=column.Name #>" DataType="<#=column.DataType #>" Length="<#=column.Length #>" Delimiter="<#=(column == feedMd.Columns.Last()) ? dr["ColumnRowDelimiter"].ToString().XmlEscape() : dr["ColumnDelimiter"].ToString().XmlEscape()#>"/> <# } #> </Columns> </FlatFileFormat> <# } #> </FileFormats>
Finally, we have the Catch block that corresponds to the original Try (opening the workbook). This is followed by the call that quits Excel, the closing Biml tag, and the GetCellValue method.
<# } catch (Exception ex) { throw new Exception(ex.ToString()); } } #> <# xl.Quit(); #> </Biml> <#+ public string GetCellValue(Microsoft.Office.Interop.Excel.Worksheet ws, int column, int row) { var value = ((Microsoft.Office.Interop.Excel.Range)ws.Cells[row,column]).Value ; return value == null ? string.Empty : value.ToString().Trim() ; } #>
Next Steps
In this series we've taken several optional short cuts, each of which offered a way to learn and use Biml while still getting real work done. We now have a fully functioning framework that dynamically creates SSIS export packages based on metadata contained in a spreadsheet. Hopefully this series has inspired you to learn Biml and create your own frameworks. Be sure to follow this blog, because I'll share more patterns in the near future.
The files used for this post can be downloaded here.