Archives

Date vs Integer Datatypes as Primary Key for Date Dimensions

Every Kimball Group book I’ve read, as well as every Data Warehouse class I’ve attended, has indicated that a Date Dimension Primary Key should be a smart integer key in the format YYYYMMDD (20110518), so I’ve always built my Date tables that way. However, Barnaby (Blog/Twitter) pointed out that I should be using a Date Datatype key in SQL 2008 instead. His reasoning is sound: the new Date datatype requires 3 bytes while an Int requires 4, which saves space and memory during processing and still facilitates table partitioning by year. The most obvious downside is that an unknown member value is required for Fact Table processing. However, I always use 19000101 as my unknown member, which can be entered as a date value anyway.
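
To make the two key styles concrete, here is a quick illustration of the same calendar day expressed both ways, along with the unknown member:

SELECT 20110518 AS SmartIntKey                       -- YYYYMMDD smart key, stored as a 4-byte Int
      ,CAST('20110518' AS date) AS DateKey           -- Date datatype key, stored in 3 bytes
      ,CAST('19000101' AS date) AS UnknownMemberKey  -- the unknown member expressed as a date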

When I inquired on the internets about this prospect, I found conflicting opinions. One cited downside of using the Date datatype was that the SQL Query Optimizer is more efficient when joining on Integers than on dates or character types. However, I couldn’t find any test results involving the Date datatype to back that up, so I thought I’d test it myself.

Let me preface the rest of this article with a disclaimer. I am not a query tuning expert, nor do I play one on TV. However, I ran the basics of these tests by the great Grant Fritchey (Blog/Twitter) and am grateful to him for the guidance.

Setup

I used my Date Table Script to generate two date tables, one with a smart integer key called DateTableInteger and another called DateTableDate with a Date datatype primary key. The primary key field for each is called DateID. Each table included an index on the [Date] column, which is a DateTime type in DateTableInteger and a Date datatype in DateTableDate. Each table is identical in every other way.
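
For reference, the two test tables look roughly like this. This is a simplified sketch based on the description above; the actual Date Table Script adds many more attribute columns:

CREATE TABLE dbo.DateTableInteger
( DateID int      NOT NULL PRIMARY KEY  -- smart integer key, e.g. 20110518
 ,[Date] datetime NOT NULL);
CREATE INDEX IX_DateTableInteger_Date ON dbo.DateTableInteger ([Date]);

CREATE TABLE dbo.DateTableDate
( DateID date NOT NULL PRIMARY KEY      -- Date datatype key
 ,[Date] date NOT NULL);
CREATE INDEX IX_DateTableDate_Date ON dbo.DateTableDate ([Date]);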

I created two source tables using code written by Michael Valentine Jones from SQLTeam, which generates random numbers that can be used to produce a column of random dates. For this test, each source table has a single field and two million records, with dates ranging from 1/1/2000 to 12/31/2020. The table with the Date datatype is called DateSource and the one with smart integer keys is called IntSource. Both source tables contain exactly the same list of dates for test conformity.
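
I won’t reproduce the SQLTeam code here, but the following sketch shows one way to fill such a source table with two million random dates in that range. The table and column names match the ones above; the tally-table technique is just a stand-in for the original random number method:

CREATE TABLE dbo.DateSource (DateColumn date NOT NULL);

WITH Tally AS
(   -- generate 2,000,000 sequential numbers
    SELECT TOP (2000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AC1 CROSS JOIN sys.all_columns AC2
)
INSERT INTO dbo.DateSource (DateColumn)
SELECT DATEADD(day
              ,ABS(CHECKSUM(NEWID())) % (DATEDIFF(day, '20000101', '20201231') + 1)
              ,'20000101')
FROM Tally;

-- IntSource gets exactly the same dates, converted to YYYYMMDD smart integer keys.
SELECT CONVERT(int, CONVERT(char(8), DateColumn, 112)) AS IntColumn
INTO dbo.IntSource
FROM dbo.DateSource;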

Tested Queries and Methodology

I used three simple queries for my tests. Each represents a common scenario where the user needs to select records from a specific year.

set statistics io on
set statistics time on
 
Select DT.Date From IntSource ISO -- Integer Type Source
join DateTableInteger DT on DT.DateId = ISO.IntColumn
Where DT.Date >= '1/1/2010' and DT.Date < '1/1/2011'
 
Select DT.Date From dbo.DateSource DS -- Date Type Source
join DateTableDate DT on DT.DateId = DS.DateColumn
Where DT.Date >= '1/1/2010' and DT.Date < '1/1/2011'
 
Select DS.DateColumn From dbo.DateSource DS -- Date Type bypassing join. 
Where DS.DateColumn >= '1/1/2010' and DS.DateColumn < '1/1/2011'

The first query is an integer join, the second utilizes a Date datatype join, and the third uses the Date datatype but doesn’t require a join. I realize that the two field values in the second query, DateID and Date, are identical, but I used it for consistency and because it represents a scenario where the join is used to pull additional information such as Calendar Month. In my experience, many queries against a Data Warehouse don’t require additional information about a date anyway, and I suspected that the greatest performance benefit would be found when the extraneous join was removed. In my testing, the Integer Join represents my control, the baseline to which I compare the other results. Each query returns the same 100,086 records.

I performed two rounds of tests. The first used non-indexed source values and in the second I added a clustered index to each of the source tables, as Fact Tables often use a date field for their clustered indexes.
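
For the second round, I added the clustered indexes with something like the following (the index names are arbitrary):

CREATE CLUSTERED INDEX CIX_IntSource_IntColumn ON dbo.IntSource (IntColumn);
CREATE CLUSTERED INDEX CIX_DateSource_DateColumn ON dbo.DateSource (DateColumn);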

Prior to testing each query set, I called dbcc freeproccache and dbcc dropcleanbuffers. I then ran the query set once to eliminate compile time, which I was not interested in measuring. Each query was executed 10 times, and I averaged the CPU and elapsed times to smooth out other factors. Note that this is a production-level server, a Dell R510 with RAID 10 arrays of 15K rpm drives, that has not yet been put into production, so nothing else should have been running while I tested. The server is set to MAXDOP 1.
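
The pre-test cache clearing looked like this. The CHECKPOINT isn’t something I mentioned above, but it’s commonly issued before DROPCLEANBUFFERS so that dirty pages are flushed first; obviously, only do this on a test box:

CHECKPOINT                -- write dirty pages to disk so the next command can drop everything
DBCC FREEPROCCACHE        -- clear cached execution plans
DBCC DROPCLEANBUFFERS     -- clear clean pages from the buffer pool for a cold-cache test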

Results

All recorded times are in milliseconds (ms).


Conclusions

Obviously, using a Date datatype instead of the smart integer will save space on disk and in memory; however, the savings are not dramatic. For example, the space difference is less than 2MB with two million records.
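
If you want to check the difference on your own tables, sp_spaceused is a quick way to compare them:

EXEC sp_spaceused 'dbo.IntSource';
EXEC sp_spaceused 'dbo.DateSource';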

Also, in my opinion, the performance differences between the Integer and Date datatype joins aren’t statistically relevant. I consider performance deltas of less than 5% to be a wash because my testing methodology isn’t exhaustive. The clustered index date join outperformed the integer join by 24% but trailed it by 5% when not indexed. However, I find it interesting that in both sets of tests the number of logical reads is lower for the Date datatype join than for the integer join.

As I expected, the Date datatype key method easily outperforms the other two when a join is not necessary. The performance boosts of 52% without indexes and 48% with clustered indexes are considerable and were reflected in a second test run I performed to verify the results.

Based on these results, I am going to switch to using the Date datatype in my data warehouse projects using SQL Server 2008. Next week, I’ll post an updated script to reflect this.

Come See Me Present Wednesday at the Fort Worth SSUG.

This Wednesday I’ll be presenting at the Fort Worth SQL Server User Group. Come see me present:

Data Warehousing – How to Convince “The Bobs”

I got a meeting with the Bobs in a couple of minutes...

Building your first Data Warehouse is a long, and often difficult process. How can you get your boss to approve a Data Warehouse project? What’s the best way to explain dimensional modeling and the benefits of a Data Warehouse to a business person? What are the best/most cost effective ways of learning it? What kind of materials, hardware, software, etc do you need? What’s the best way to build a proof of concept that will impress your boss, as rapidly as possible?

 

Come to this presentation, and I’ll answer all of these questions plus the most important question of all.

“Is this good for the company?” Absolutely.

This presentation went over very well at SQL Saturday Chicago, so come laugh and learn with me.

Using a Date Table to Track Custom Fiscal Years/Periods

Calendar Tables are only useful if they reflect how your company evaluates its data. Therefore, any company using non-standard fiscal years and periods will need to customize its Calendar or Date Table. In this post, I’ll provide an example of how to do so using the M2M ERP System Database. For those who don’t use M2M, the notes should still serve as a guide for doing the same with other systems.

The most effective way to determine the fiscal setup of a company is to determine how its major systems track the data. In M2M the source is the GL Rules and Periods (GLRule) table. There are four primary fields of interest, which I’ve aliased, and they are listed in the following query and screenshot.

SELECT GLR.fcname AcctYearName
      ,GLR.fnnumber PeriodNo
      ,GLR.fdstart FirstDayOfPeriod
      ,GLR.fdend LastDayOfPeriod
  FROM dbo.glrule GLR

As you can see, there is one record for each fiscal period in GLRule. FCName is a Char(20) field and typically contains values like “FY 2005” or “Fiscal Year 2005”. FNNumber contains the period number and is an Int datatype. FDStart and FDEnd are Datetime fields and represent the first and last days of each period.

What follows are the update statements which I use to customize the fiscal periods of my calendar table. For your convenience, you can access the entire script here. Copy and paste the entire query, replace the database and table names, and run it. I’ve dissected each step of the script below so those not using M2M can understand my logic and mimic those steps with their own systems.

Six fields are updated by the first statement. Notice that I’ve joined the Calendar Table to GLRule with a Between clause. I’m using the pattern matching of Patindex to find the four-digit Fiscal Year number. However, if the user entered something like “Two Thousand Five” as a Fiscal Year, this method will not work. The start and end of period fields are taken directly from GLRule, and FiscalDayOfPeriod is calculated in this step for good measure.

Use M2MDataXX
GO
UPDATE TST
   SET TST.[FiscalYearName] =
        substring (GLR.fcName, patindex ('%[1,2][9,0,1,2][0-9][0-9]%', GLR.fcName), 4)
      ,TST.[FiscalYearPeriod] =
        (substring (GLR.fcName, patindex ('%[1,2][9,0,1,2][0-9][0-9]%', GLR.fcName), 4))
        + (GLR.fnnumber / 100.00)
      ,TST.[FiscalPeriod] = GLR.fnnumber
      ,TST.[FiscalStartOfPeriodDate] = GLR.fdstart
      ,TST.[FiscalEndOfPeriodDate] = GLR.fdend
      ,TST.[FiscalDayOfPeriod] = (DATEDIFF (dd, GLR.fdstart, TST.[DATE]) + 1)
  FROM  DateDatabase.dbo.DimDate TST
       INNER JOIN
        glrule GLR
       ON TST.DATE BETWEEN GLR.fdstart AND GLR.fdend
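
As a quick sanity check of the Patindex pattern, here is how it behaves against a sample fcname value:

-- Returns '2005': Patindex locates the position of the four-digit year and Substring extracts it.
SELECT substring ('Fiscal Year 2005', patindex ('%[1,2][9,0,1,2][0-9][0-9]%', 'Fiscal Year 2005'), 4)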

In this step, I’ve calculated the beginning and end of Fiscal Years. I group the record set in GLRule by Fiscal Year and then use min/max to determine the first and last day in each group.

UPDATE TST
   SET TST.[FiscalStartOfYearDate] = M2MYEARS.[FiscalStartOfYearDate]
      ,TST.[FiscalEndOfYearDate] = M2MYEARS.[FiscalEndOfYearDate]
  FROM  DateDatabase.dbo.DimDate TST
       INNER JOIN
        (SELECT substring (GLR.fcName, patindex ('%[1,2][9,0,1,2][0-9][0-9]%', GLR.fcName), 4)
                 FiscalYearName
               ,min (GLR.fdstart) [FiscalStartOfYearDate]
               ,max (GLR.fdend) [FiscalEndOfYearDate]
           FROM glrule GLR
         GROUP BY GLR.fcname) M2MYEARS
       ON TST.FiscalYearName = M2MYEARS.FiscalYearName

In the next step, I calculate the FiscalDayOfYear and FiscalYearDayOfYear fields. While I could have included this code in the previous step, it’s much easier to read and understand as a separate step. The FiscalDayOfYear calculation is simply the number of days between the current record and the first day of the Fiscal Year calculated previously. The FiscalYearDayOfYear uses the same calculation, but I convert the integer (day) to a decimal and add it to the Fiscal Year integer. This makes drill down and grouping easier.

UPDATE TST
   SET TST.[FiscalDayOfYear] = (DATEDIFF (dd, [FiscalStartOfYearDate], TST.[DATE]) + 1)
      ,TST.[FiscalYearDayOfYear] =
        (cast ([FiscalYearName] AS INT)
         + cast (
            (datediff (dd, [FiscalStartOfYearDate], TST.[DATE]) + 1)
            / 1000.00 AS NUMERIC (3, 3)))
   FROM  DateDatabase.dbo.DimDate TST

The next section looks complex because I have to determine whether the number of Fiscal Periods can be broken into Quarters and Semesters. If a company used 10 Fiscal Periods per year, the concept of Quarters doesn’t make sense, but Semesters do. To make the code easier to understand, I’ve numbered the sections.

Section 1 checks whether the number of periods per fiscal year can be divided into quarters and semesters and the number of periods in each. Section 2 includes the period start and end dates so that the values of quarter and semester can be related back to the date table. Values for quarter or semester will be set to 0 if they aren’t valid in a particular M2M install.

UPDATE TST
   SET [FiscalQuarter] =
        CASE M2MYEARS.Quarterific
         WHEN 'Yes' THEN (ceiling (CAST (M2MYEARS.PeriodNo AS DECIMAL (4, 2)) / M2MYEARS.QuarterSize))
         ELSE 0
        END
      ,[FiscalSemester] =
        CASE M2MYEARS.Semesterific
         WHEN 'Yes' THEN (ceiling (CAST (M2MYEARS.PeriodNo AS DECIMAL (4, 2)) / M2MYEARS.SemesterSize))
         ELSE 0
        END 
  FROM  DateDatabase.dbo.DimDate TST
       INNER JOIN
 -------------- 2. Include Start and End of Period Values. 
        (SELECT fcname FYName
               ,fdstart PeriodStart
               ,fdend PeriodEnd
               ,fnnumber PeriodNo
               ,GLRAGG.QuarterSize
               ,GLRAGG.Quarterific
               ,GLRAGG.SemesterSize
               ,GLRAGG.Semesterific
           FROM  dbo.glrule GLR
                INNER JOIN
  -------------- 1. Calculate whether Quarters and Semesters are Valid (and their values)
                 (SELECT fcname AGGFYName
                        ,MAX (fnnumber) PeriodCount
                        , (Max (fnnumber) / 4) QuarterSize
                        ,CASE WHEN Max (fnnumber) % 4 = 0 THEN 'Yes' ELSE 'No' END AS Quarterific
                        , (Max (fnnumber) / 2) SemesterSize
                        ,CASE WHEN Max (fnnumber) % 2 = 0 THEN 'Yes' ELSE 'No' END AS Semesterific
                    FROM dbo.glrule
                  GROUP BY fcname) GLRAGG
                ON GLR.FCNAME = GLRAGG.AGGFYNAME) M2MYEARS
 -------------- End of 1
       ON TST.DATE >= M2MYEARS.PeriodStart AND TST.DATE <= M2MYEARS.PeriodEnd
  -------------- End of 2
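
To see how the math in Section 1 plays out, here is a small worked example with hypothetical period counts:

-- 12 periods: 12 % 4 = 0, so Quarterific = 'Yes' and QuarterSize = 3; period 7 falls in ceiling(7 / 3.0) = Quarter 3.
-- 10 periods: 10 % 4 = 2, so Quarterific = 'No' and FiscalQuarter stays 0;
--             10 % 2 = 0, so Semesterific = 'Yes' and SemesterSize = 5; period 7 falls in ceiling(7 / 5.0) = Semester 2.
SELECT ceiling (CAST (7 AS DECIMAL (4, 2)) / 3) AS QuarterFor12Periods
      ,ceiling (CAST (7 AS DECIMAL (4, 2)) / 5) AS SemesterFor10Periods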

Workday calculation is also somewhat difficult to understand. In M2M this information is kept in the Production Calendar (SCCALN) table. The fields we care about are:

  • fcYrMon – Year and month in the format YYYY/MM (2011/02). Unfortunately, these are by Calendar Year and Month, not by Fiscal Year or Period.
  • fcShifts – A Varchar (31) field with one character per day of the month indicating the number of shifts for that day.


Needless to say, we need an easier way than parsing through this table every time we need to calculate work days. The join is a little strange because the Date Table’s CalendarYear and CalendarMonth fields are SmallInt and need to be cast as Varchar. I’ve used a left join because many M2M companies don’t use the Production Calendar, so SCCALN may be missing records for some months.

Basically, the code checks for a corresponding record in SCCALN. If one does not exist, then the typical Monday through Friday work week is used by default. If it does exist, the fcShifts field is parsed, and if there are one or more shifts for a particular day, that date is designated a work day.

UPDATE TST
   SET WorkDay =
        CASE
         WHEN wc.fcShifts IS NULL THEN
           CASE WHEN TST.CalendarDayOfWeek IN (1, 7) THEN 'No Work' ELSE 'Work Day' END
         ELSE
           CASE
            WHEN substring (WC.fcShifts, TST.CalendarDayOfMonth, 1) = '0' THEN 'No Work'
            ELSE 'Work Day'
           END
        END
      ,IsWorkDay =
        CASE
         WHEN WC.fcShifts IS NULL THEN 
           CASE WHEN TST.CalendarDayOfWeek IN (1, 7) THEN 0 ELSE 1 END
         ELSE 
           CASE WHEN substring (WC.fcShifts, TST.CalendarDayOfMonth, 1) = '0' THEN 0 ELSE 1 
           END
        END
  FROM  DateDatabase.dbo.DimDate TST
       LEFT JOIN
        dbo.ScCaln WC
       ON WC.fcYrMon = cast (TST.CalendarYear AS Varchar (4))
           + '/'
           + right ('0' + cast (TST.CalendarMonth AS Varchar (2)), 2)
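
For example, given a hypothetical fcShifts value where weekends carry no shifts, the substring check above works like this:

DECLARE @fcShifts Varchar (31)
SET @fcShifts = '1111100111110011111001111100111'  -- one character per day of the month
SELECT substring (@fcShifts, 6, 1) AS Day6Shifts    -- '0', so day 6 is 'No Work'
      ,substring (@fcShifts, 10, 1) AS Day10Shifts  -- '1', so day 10 is a 'Work Day'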

The final bit of code updates the FiscalWeekName and FiscalYearWeekName. This code is tailored to my current employer, and I consider it optional because it may not pertain to you. The logic is as follows:

  1. The first fiscal week starts on the first day of the fiscal period.
  2. Fiscal weeks end on Friday and begin on Saturday.
  3. A “partial week” where the week starts on a day other than Saturday is added to the next full week. Partial weeks ending a period are considered their own week.
  4. Week names follow the pattern of [FiscalPeriod].[WeekNumber] both of which are integers. For example, the second week of the fifth period would be designated as “5.2”.

Again, this is a specific requirement and it may not apply to most companies.

UPDATE TST
   SET FiscalWeekName =
        TST.FiscalPeriod
        + CASE ( (SELECT count (*)
                    FROM DateDatabase.dbo.DimDate TMP
                   WHERE TMP.CalendarDayOfWeek = 7
                         AND TMP.Date BETWEEN TST.FiscalStartOfPeriodDate AND TST.Date)
                / 10.0)
           WHEN 0 THEN
            .1
           ELSE
            ( (SELECT count (*)
                 FROM DateDatabase.dbo.DimDate TMP
                WHERE TMP.CalendarDayOfWeek = 7
                      AND TMP.Date BETWEEN TST.FiscalStartOfPeriodDate AND TST.Date)
             / 10.0)
          END
  FROM DateDatabase.dbo.DimDate TST
-- Updates FiscalYearWeekName from FiscalWeekName
UPDATE TST
   SET TST.FiscalYearWeekName =
          cast (TST.FiscalYearName AS CHAR (4))
        + '.'
        + (CASE len (TST.FiscalWeekName) WHEN 3 THEN '0' ELSE '' END)
        + cast (TST.FiscalWeekName AS CHAR (4))
  FROM DateDatabase.dbo.DimDate TST

In most systems, new fiscal years and periods are added once a year. The update script will need to be run after that happens. With Made2Manage, and presumably other systems, triggers could be used to detect those changes and then run the update script. In my own situation, I’ve chosen to save this update script as a scheduled job which runs weekly. This may seem wasteful, but the update script completes very quickly.

Technically, I could have integrated this script with the one I posted previously, which would have been faster and more efficient. However, I use two scripts for several reasons. First, this method is more flexible, as I can create any Fiscal Update script I want without editing the main calendar script. Second, the Fiscal Update script will need to be run periodically to pick up changes, and dropping and recreating the entire date table each time would be wasteful. Finally, most systems do not track Fiscal Years and Periods a decade or more into the future. Therefore, the first script is required to set default values for the Fiscal fields, and only the values that exist in the source system are updated. That way, the user will not receive an error on a report because a future fiscal date value was not supplied.

Well, there you have it. The last two articles have described how I create date tables and customize the Fiscal fields from a source system, in this case Made2Manage.

Any questions or suggestions to make the process better?

Calendar Tables - Why You Need One.

*** Note: The code below is for use with SQL 2005 and previous versions. I have written new code to take advantage of the Date Datatype in SQL 2008 and above which can be found here.

What is a Date (Calendar) Table?

For the purposes of this article a Date Table is a dedicated table containing a single record for each day in a defined range. They include fields with descriptive attributes for each day such as Year, Month, Week, and whether a particular day is a work day or not. Date Tables (Dimensions) are an integral part of every Data Warehouse, but they can be used with a regular (OLTP) database as well.

Why do I need a Date Table?

There are several situations in which a date table is extremely helpful. Let’s consider just a few. Suppose you need to….

  • Generate a running sales total per day and include days in which you have no sales such as weekends.
  • Calculate the number of workdays between two dates.
  • Calculate projected ship dates for products based upon lead times.
  • Aggregate data by non-standard Fiscal Years or Periods. Perhaps your company uses the Fiscal Year of the US Government which runs from October 1st until September 30th. My current employer uses the 4-4-5 Calendar methodology, which is what originally led me to create this Date Table script.
  • Track data by seasons, whatever that means to your company. For example, if you manufactured bicycles you’d certainly want to compare sales figures in the spring versus the fall.
  • Compare data using rolling months/periods. Suppose your company wants to know how March sales for this year compared to March for the past 5 years. Or you want to compare sales which occurred on Fridays to each other to look for trends.
  • Create Gantt Charts.

While many of these issues can be handled by writing T-SQL code, they’re much easier with a dedicated date table. Also, the Date Table can be customized to track anything about a particular day you want.
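
For instance, once the DimDate table described below is in place, counting the work days between two dates is a simple aggregate (this sketch assumes the IsWorkDay flag created later in this article):

SELECT count (*) AS WorkDays
  FROM DateDatabase.dbo.DimDate D
 WHERE D.IsWorkDay = 1
       AND D.[Date] >= '1/1/2011'
       AND D.[Date] < '2/1/2011'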

What about Space Considerations?

One concern that people have when creating a Date Table is the “wasted” space, since each day is its own record. The space used is negligible, especially when compared to the benefits of having a dedicated date table. For example, when I execute my script for the years 2000 through 2025, the space used is less than 250MB.

How do I create a Date Table?

Kimball Date Dimensions
In The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 and the previous 2005 version, the Kimball Group recommends that you create a Date Dimension using Excel. In the latest book, they provide a Date Dimension Spreadsheet, which is available for download and utilizes VBA to generate the table. They recommend you create an SSIS package to load that data into SQL Server.

While this will work, and some non-technical people might prefer this method, I prefer to use a SQL script because I may not always have access to SSIS.

Using a SQL Script
When I searched for a Date Table Script a few years ago, I couldn’t find anything suitable for my needs. Most scripts include only one set of fields (a regular Calendar), and I need to report via custom fiscal calendars.

I created this script from a function called F_TABLE_DATE written by Michael Valentine Jones from SQLTeam, and I have since gotten his permission to distribute it. I’ve modified the script so that it generates a Kimball-style date dimension. It’s very comprehensive, yet simple to use and customize.

  1. Create a database called DateDatabase or whatever you’d like to call it. We’ll install F_TABLE_DATE in it and presumably the Date Table as well.
  2. Open the following script and replace “DateDatabase” (Find and Replace) with whatever database name you chose in step 1.
  3. Run the Script to create the function.

Your database should now contain the function used to create Date Tables. As a test, run the following script:

SELECT * FROM dbo.F_TABLE_DATE ('20000101', '20101231')
ORDER BY 1

You should receive 4,019 rows: 11 years’ worth of days plus a record for unknown dates. Notice the extensive list of fields built into the function. All of them are documented in the script.

Use a Select Into statement to create the Date Table like so. Replace dbo.DimDate with whatever you’d like to call your Date Table.

IF EXISTS
     (SELECT 1
        FROM INFORMATION_SCHEMA.TABLES
       WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME = 'DimDate')
  BEGIN
        DROP TABLE dbo.DimDate;
  END
 
Select
  DateId
 ,Date
 ,NextDayDate
 ,CalendarYear
 ,CalendarYearQuarter
 ,CalendarYearMonth
 ,CalendarYearDayOfYear
 ,CalendarQuarter
 ,CalendarMonth
 ,CalendarDayOfYear
 ,CalendarDayOfMonth
 ,CalendarDayOfWeek
 ,CalendarYearName
 ,CalendarYearQuarterName
 ,CalendarYearMonthName
 ,CalendarYearMonthNameLong
 ,CalendarQuarterName
 ,CalendarMonthName
 ,CalendarMonthNameLong
 ,WeekdayName
 ,WeekdayNameLong
 ,CalendarStartOfYearDate
 ,CalendarEndOfYearDate
 ,CalendarStartOfQuarterDate
 ,CalendarEndOfQuarterDate
 ,CalendarStartOfMonthDate
 ,CalendarEndOfMonthDate
 ,QuarterSeqNo
 ,MonthSeqNo
 ,FiscalYearName
 ,FiscalYearPeriod
 ,FiscalYearDayOfYear
 ,FiscalYearWeekName
 ,FiscalSemester
 ,FiscalQuarter
 ,FiscalPeriod
 ,FiscalDayOfYear
 ,FiscalDayOfPeriod
 ,FiscalWeekName
 ,FiscalStartOfYearDate
 ,FiscalEndOfYearDate
 ,FiscalStartOfPeriodDate
 ,FiscalEndOfPeriodDate
 ,ISODate
 ,ISOYearWeekNo
 ,ISOWeekNo
 ,ISODayOfWeek
 ,ISOYearWeekName
 ,ISOYearWeekDayOfWeekName
 ,DateFormatYYYYMMDD
 ,DateFormatYYYYMD
 ,DateFormatMMDDYEAR
 ,DateFormatMDYEAR
 ,DateFormatMMMDYYYY
 ,DateFormatMMMMMMMMMDYYYY
 ,DateFormatMMDDYY
 ,DateFormatMDYY
 ,WorkDay
 ,IsWorkDay
into
  dbo.DimDate -- Choose whatever name you like for your table.
From
  dbo.F_TABLE_DATE('20000101','20251231') -- Edit the parameters for a custom date range.
  Order By 1

Don’t panic; you don’t need to use all of the fields, which is why this script is so highly customizable. Any fields you don’t want can be commented out before running the Select Into statement, which allows you to create your own custom Date Table.

The final step is to add a Primary Key to the DateID and an index for the Date and NextDayDate fields. As I can’t predict the usage patterns for your date table, you’ll have to implement further indexing if necessary.

ALTER TABLE [dbo].[DimDate]
ADD  PRIMARY KEY CLUSTERED ([DateId])
WITH FILLFACTOR = 100
 ON [PRIMARY]
GO
 
CREATE INDEX id_Dates
ON DateDatabase.dbo.DimDate 
	(Date, 
	 NextDayDate)

How do I use the Date Table?

Using the date table is simple: you join to it from the query against your source system. Use the following template.

SELECT
  MY.date
 ,DTE.CalendarYear
 ,DTE.CalendarMonthName
FROM
    MyTable MY
  INNER JOIN
    DateDatabase.dbo.DimDate DTE
  ON MY.date >= DTE.date AND
     MY.date < DTE.NextDayDate

Both DTE.Date and DTE.NextDayDate represent midnight of two consecutive days. Therefore, the time portion of a Datetime field can be safely ignored.
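
Here is a quick illustration of why the time portion doesn’t matter (the values are made up):

DECLARE @OrderDate Datetime
SET @OrderDate = '5/18/2011 14:30'    -- a source value that includes a time component
SELECT CASE
        WHEN @OrderDate >= '5/18/2011'    -- DTE.Date
             AND @OrderDate < '5/19/2011' -- DTE.NextDayDate
        THEN 'Matches the 5/18/2011 row'
        ELSE 'No match'
       END AS JoinResult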

What about the fiscal fields?

After running these scripts, and examining the Date Table, you’ll notice that the fiscal fields mostly mirror the calendar fields. This is by design. In the next article I’ll explain how to customize the fiscal fields and a script to extract the fiscal information out of Made2Manage. Even if you don’t use M2M, you can still use the fiscal script as an example of how to do it.

Any questions?

SQL Saturday Dallas Reflections

SQL Saturday Dallas, which was our third, went off without a hitch. While there were a few items we’ve identified as needing work, overall the event was swimmingly successful.

Kevin Kline - Troubleshooting and Performance Tuning Microsoft SQL Server with Dynamic Management Views (DMVs)

The festivities started on Friday with our three Pre-Con events, presented by Kevin Kline (Blog/Twitter), Jessica Moss (Blog/Twitter), and Randy Dyess from Solid Quality Mentors.

I attended parts of each and was impressed by all of them. For the price, you couldn’t beat the educational value of the North Texas SQL Server User Group Pre-Cons.

After the Pre-Cons the core volunteers went to the Region 10 Educational Center (great facility by the way) to set up for the event.

The blind leading the blind.

A special thanks to Heather Sullivan (Twitter), Head of SQL Server Tools for Idera Software, for bringing so many people to volunteer. I think the total number was 8 and those folks worked their butts off.

Natural Born Leader

Have I mentioned lately how much I love SQL Saturday Events? I’d love to be able to present more often. Attention SQL Vendors and Sponsors, that was a hint. 🙂

After that we went to the speaker’s dinner, which was held at Dave and Buster’s, and a great time was had by all. Where else can you spend hours talking to Steve Jones (Blog/Twitter), Jessica Moss (Blog/Twitter), Andy Warren (Blog/Twitter), Wes Brown (Blog/Twitter), etc.?

Scott Whigham and I playing pool. If you squint hard, I almost look like Tom Cruise in The Color of Money.

SQL Saturday always starts early when you are a volunteer. One of my responsibilities was registration, and everything went well. The success is mostly due to having so many committed volunteers, and they made me look good once again.

Throughout the day, things continued to go well. The speaker shirts were well received, and the food and the Paciugo Gelato (fancy ice cream) were a big hit.

I want to give special thanks to a few people and groups. I apologize for leaving anyone out.

  • I really appreciate the sponsors who make these events possible.
  • Tim Mitchell (Blog/Twitter) who was not only a core volunteer and a presenter, but took the pictures as well.
  • All of the volunteers, especially our core team.
  • All of the presenters, many of whom traveled to speak at the event at their own cost.

Because I was so busy, I only attended a couple of presentations. Sean McCown (Blog) presented “How to Curse in an Interview” and it was one of the most intense talks I’ve ever seen in a technical setting. I also watched Sri Sridhara (Blog/Twitter), NTSSUG President, present “Turbo charge your DBA career” and he had the crowd mesmerized.

I presented “T-SQL Programming to Survive a Zombie Database” and it went very well. I love teaching, and I think it shows. In fact, Jen McCown (Blog/Twitter), whose opinion I greatly respect, attended and tweeted the following:

Awesomesauce!


If you missed the session, I will be presenting an extended version of it at the NTSSUG Meeting next Thursday 10/21 and the downloads for the session have been posted here.

Of course I went to the party after the event and spent quality time with community members. Everyone should attend the after parties. There’s a wealth of help available there for free, not to mention those folks are fun to hang out with as well. Where else can you ask for free help like this?

  • Adam Saxton (Blog/Twitter), Senior Escalation Engineer for SQL Server Support, about SSRS problems.
  • Steve Jones or Andy Warren about branding.
  • Tim Mitchell or Sean McCown about SSIS.
  • Wes Brown about SQL Server storage.
  • Jessica Moss about Microsoft BI.
  • Tim Costello (Blog/Twitter) about data visualization and Tableau.
  • Trevor Barkhouse (Blog) about performance tuning.
  • Ryan Adams (Blog/Twitter) about mirroring.
  • Kevin Kline about… well everything.

These events are free sources of SQL education, but that’s not the most important aspect. They’re about people, great people who share your profession and interests. How did I meet all of these great people? I joined PASS, participated in my local SQL server user group, and went the extra mile to be social.

You can do it too. Throw caution to the wind and talk to people; we don’t bite. Well, I’m not so sure about Sean.

SQL Saturday Chicago Highlights

It’s been a crazy few weeks. As I mentioned previously, I presented at two SQL Saturday Events in back to back weeks. I’m on the core committee for the North Texas SQL Server User Group SQL Saturday, so I was overwhelmed with things to do and couldn’t stop to blog.

So, now that I’ve taken a few days to catch up and relax I’ll share some of the highlights of both experiences.

SQL Saturday Chicago

I had a great time at this event put on by both the Chicago SQL Server User Group and the Chicago Suburban SQL Server User Group. I flew in early Friday to attend my second Freecon event. Freecon is a free event put on by Brent Ozar (Blog/Twitter) in which he invites a small group of bloggers/presenters from the SQL Community to discuss issues important to us and the community. My Freecon experiences have been excellent, and I’ll go into those in more depth in a future post. I simply can’t thank Master O enough for including me. Oh, and the atmosphere at The Catalyst Ranch was amazing too.

So, why mention it? Well, I learned an important lesson. I am never, ever flying out at 5:30am again. I was so exhausted after getting up at 3am that I never caught up. SQL Saturday events are packed with activity, especially if you’re volunteering or speaking. After the Freecon, Erin Stellato (Blog/Twitter) was kind enough to cart Karen Lopez aka DataChick (Site/Blog/Twitter), her hubby Rob Drysdale, and me to and from our hotel, which was a bit of a distance from SQL Saturday.

SQL Sentry hosted the speaker’s dinner. The food was excellent and there were so many amazing database people to meet including Aaron Bertrand (Twitter) who works for SQL Sentry. SQL Saturday Chicago provided personalized shirts for the speakers with our names on the front. It was a classy touch which I intend to suggest to my local group. The problem with personalized shirts is the risk of typos.

Thomas LaRock (Blog/Twitter) was a good sport about being called RockSar all weekend.

I got up bright and blurry-eyed Saturday morning, scarfed some breakfast, and headed to the event. It was obvious that this was not the Chicago groups’ first rodeo, as they knew what they were doing. Wendy Pastrick (Blog/Twitter) and her group had everything running smoothly. Noel McKinney (Blog/Twitter), whom I had met on Friday, got me through the check-in process and I never saw a line.

Unfortunately I didn’t attend many presentations because I was having laptop problems and had to re-build part of my demo. This was my second important lesson of the trip. I need better contingency plans so I’ve purchased VMWare Workstation and every one of my demos will be done that way. I’ll keep a backup copy of the VM on a USB hard drive so I can use another laptop at a moment’s notice.

I did see Grant Fritchey (Blog/Twitter) present “The Optimizer, Statistics, and Indexes.” Honestly, I think everyone saw him. The room was packed so full that people were standing along the walls and SQL Rockstar sat right next to him. Grant is amazing, if you ever get the chance to see him present, do not pass it up. In the speaker’s room after the presentation, we were laughing because one of the attendees commented that the “Sidekick in the corner was distracting” because Tom and Grant bantered back and forth a bit during his presentation. Hilarious.

Grant took this short video of me in the speaker’s room. It should be worth a laugh.

An old friend came to see me at the event. She used to work for Made2Manage support and she’s one of the many people I’ve encouraged to get out into the much larger world of SQL Server. Anyway as she and I were talking in the hallway, Grant Fritchey walked by and I introduced him as “The Scary DBA.” My friend scoffed and said that when she worked in support my phone calls would sometimes bring her to tears. Grant thought it was hilarious, but I was a wee bit embarrassed.
My presentation went well, and I had more than 40 attendees. Not bad considering I was scheduled at the same time as Brent Ozar, Datachick, Jes Borland, and others.

As promised the downloadable scripts and presentation can be downloaded here.

I learned something else from this trip. I normally take a lot of time creating geeky/funny Abstracts and Titles like “Data Warehousing – How to convince ‘The Bobs’.” However, when I asked how many people had seen “Office Space,” which is the underlying theme of the presentation, less than 10% had seen it. Ugh. I’m going to have to re-think my Title and Abstract Strategy and go with something closer to “Creating a Data Warehouse Proof of Concept” which has more widespread appeal.

I went to the after party and had a great time talking with Jose Chinchilla (Blog/Twitter) as the dude is hilarious. I hope to get him back down to Dallas to present again soon. I also succumbed to peer pressure and had a drink with Jose, Datachick + 1, and Noel McKinney and his wife. Apparently I get really funny even after having one drink. Anyway, I ended up at SQL Karaoke, but wasn’t up to embarrassing myself like I did at PASS last year, so I just watched with Datachick + 1.

I will tell you that SQL People let their hair down like no other at SQL Karaoke. Jes Borland (Blog/Twitter), who has more energy than any three people I know, and Jason Strate (Blog/Twitter) belted out a duet and everyone had a good time.

I have no idea why more people don’t attend the after parties. It’s fun, and a great way to network with some of the best in the SQL profession.

What I want you to take from this is simple. Attending SQL Saturday Events is a lot of fun and great for your career. Take full advantage of them, including the after party.

Come Join Us.

Meme Monday - SSIS Frustrations

One of the cool kids, Thomas LaRock (Blog/Twitter), has started a new community blogging game called Meme Monday. My friend John Sterrett (Blog/Twitter) tagged me this morning, and my attempt is below.

SSIS work is not good for your self-esteem on Monday morning.

These things spread like wildfire, so I’m not sure who’s been tagged and who hasn’t, but here goes.

I hereby tag Jorge Segarra (Blog/Twitter) and Jen McCown (Blog/Twitter).

Have fun folks.

Come See Me Present at SQL Saturday Chicago and Dallas

I’ll be presenting at two SQL Saturdays back to back. You are taking advantage of SQL Saturday events, right?

SQL Saturday Chicago is this weekend (3/26), and I’ll be presenting:

Data Warehousing – How to Convince “The Bobs”

I got a meeting with the Bobs in a couple of minutes...


Building your first Data Warehouse is a long, and often difficult process. How can you get your boss to approve a Data Warehouse project? What’s the best way to explain dimensional modeling and the benefits of a Data Warehouse to a business person? What are the best/most cost effective ways of learning it? What kind of materials, hardware, software, etc do you need? What’s the best way to build a proof of concept that will impress your boss, as rapidly as possible?

Come to this presentation, and I’ll answer all of these questions plus the most important question of all.

“Is this good for the company?” Absolutely.

The following Saturday (4/2) my home group, the North Texas SQL Server User Group, will be hosting our 3rd SQL Saturday. I’ll be presenting the following:

T-SQL Techniques to Survive Zombie Databases


Do you have a Zombie Database? Does your database suffer from roaming hordes of problems like the lack of primary and foreign keys, poor normalization, in-place updates and deletes, and an insatiable drive to eat your braaaaaaaaaaaaaains (CPU cycles)?

Fear not, fellow survivor, for in this session I’ll show you how to fortify your code against zombie infestation: how to deal with the unknown (nulls), and how to Restore Order by Joining together in Groups. Flak Jackets are optional, but encouraged.

So, participate in SQL Saturdays. Present, volunteer, or just attend. They’re excellent, free sources of SQL Education.

Isn't it Ironic? Don't You Think?

Anyone who reads this blog should know that I’m a fan of Todd McDermid and his free Kimball Slowly Changing Dimension Transform. For those of you who’ve not been following his blog, he was contacted by the Kimball Group and informed that he needed to stop using Kimball’s name for his free transform, the Kimball Method Slowly Changing Dimension Transform, even though it has existed with that name for years.

Todd was classy about it, had a contest to rename it, and came up with Dimension Merge SCD. Since then, another pillar of the SQL Community, Brian Knight (Blog/Twitter), has picked up the ball and is integrating the transform into Pragmatic Works Task Factory.

Anyway, last week I received the new The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 by the Kimball Group and ripped through the book at a blistering pace. I hit page 222 by the next day and this is what I found:

DOWNLOADS
The two most popular third-party tools for replacing the Microsoft SCD transform are:

Kimball Method SCD Transform, available for free download from www.codeplex.com/Kimballscd/ and developed by Todd McDermid. This transform offers more functionality than the Microsoft SCD transform, and performs much better. Edits are not destructive to downstream elements. This transform is not associated with the Kimball Group, though the developer followed Kimball published best practices in his design.

Um… what?

I have great respect for everyone involved including Ralph Kimball, that’s why I buy, read, and recommend his books. However, the fact that the request was made at nearly the same time this new book was published was too much to pass up.

Drawing a Blank. Dealing With Blank Values in SQL Server using NullIf and Coalesce

“Melodical Murray is a human musical instrument; he makes music by blowing his…blank” – Match Game

In the previous article I discussed the issues when working with Null-able fields in Joins and Where clauses. This article will deal with the problems of blank values.

What About Blanks?

Many systems forego the use of Null-able fields and instead use blank values. In fact, blank values are a bigger problem in M2M than Nulls. A common problem is with M2M Product Class values. For those who don’t use M2M, Product Class is similar to a Product Category that happens to be tied to general ledger accounts. Unfortunately, not every record in the sales and other tables has a Product Class value; when the value is missing, the field is blank. If you weren’t aware of this, run the following query against your M2M Database.

SELECT DISTINCT ARI.fprodcl ProductClass
FROM   dbo.aritem ARI

The M2M Educational Database returns the following:

In order to demonstrate this for those who don’t use M2M, I’ll run the following against the Kung Fu Database.

UPDATE dbo.OrderDetails
SET    ItemDescription = ''
WHERE  orderid = 4
 
SELECT DISTINCT OD.ItemDescription
FROM dbo.OrderDetails OD


So, how can we deal with the blank value (and the Null value as well)? Well, we could use a Case statement.

 SELECT DISTINCT
       CASE
         WHEN OD.ItemDescription IS NULL THEN '..Null None'
         WHEN OD.ItemDescription = '' THEN '..Blank None'
         ELSE OD.ItemDescription
       END
         AS ItemDescription
  FROM dbo.OrderDetails OD


Notice that I assigned different values to Null and blank so you could see that both cases are handled. Why did I use two periods before the text? I’ll explain that in a minute. The Case When works, but it’s wordy. There’s a much easier way to deal with both Blanks and Nulls.

NullIf and Coalesce

Let’s start with the NULLIF function. NULLIF accepts two arguments; if the arguments are equal, Null is returned, otherwise the first value is returned. Therefore, we can detect blanks with the following:

 
SELECT NULLIF ('A', '')
 -- Returns 'A'
SELECT NULLIF ('', '') 
-- Returns Null
Declare @Test Char(5)
SELECT NULLIF (@Test, '') 
-- Returns Null
-- I had to use a variable for the last test because the first argument must 
-- have a known data type or an error will result.

The second argument in this case is always '' (blank). Why? If the field value is blank, the end result will be Null. Also, if the first argument is Null, the result will be Null as well. Make sense?

A Coalesce statement can then be used to replace the resulting Null value like so:

SELECT COALESCE(NULLIF ('A', '') , '..None') 
-- Returns 'A'
SELECT COALESCE(NULLIF ('', '') , '..None') 
--Returns '..None' 
Declare @Test Char(5)
SELECT  COALESCE(NULLIF (@Test, '') , '..None') 
-- Returns '..None'

Let’s return to our original statement and substitute this method.

SELECT DISTINCT
       coalesce (nullif (OD.ItemDescription, ''), '..None')
         AS ItemDescription
  FROM dbo.OrderDetails OD

I have this process programmed as a Toad for SQL Server snippet. When I have time, I’ll record a video showing how to use them. In this case, all I have to do is highlight the database field name, click the snippet, and Toad does the rest. So, why do I substitute text with leading periods? I do that primarily when generating available values for SSRS parameters; the leading periods force the substitute entries to sort to the top of the list.

When I go back and edit the first script (for M2M Product Classes) with this method, I get the following:

SELECT DISTINCT
       ARI.fprodcl ProductClass
      ,coalesce (nullif (ARI.fprodcl, ''), '..None') FixedProductClass
  FROM dbo.aritem ARI
ORDER BY coalesce (nullif (ARI.fprodcl, ''), '..None')


Which would your users rather see in an SSRS parameter drop-down?

It’s important to note that blank ('') is an actual value, which is in fact equal to another blank and can be used in joins as well. However, in the case of Made2Manage, you shouldn’t expect to join your tables to the Product Class Table (INProd) with an Inner Join because you’ll “lose” records. That’s why I always link to INProd with a Left Outer Join. Would you like to see the invoice records that you’d lose with an Inner Join? Run the following against your database to find out.

SELECT ARM.fbcompany AS BillToCompany
      ,ARI.fcinvoice AS InvoiceNo
      ,ARI.fitem AS ItemNo
      ,ARI.fprodcl AS ProductClass
      ,rtrim (ARI.fpartno) AS PartNo
      ,ARI.frev AS PartRev
      ,ARI.ftotprice AS NetExtInvoiced
  FROM   dbo.armast ARM
       INNER JOIN
         dbo.aritem ARI
       ON ARM.FCINVOICE = ARI.FCINVOICE
 WHERE ARM.fcstatus <> 'V'
       AND coalesce (nullif (ARI.fprodcl, ''), '..None') = '..None' -- Returns Blanks and Nulls

In the M2M Educational Database, 19 records are returned.
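
As a final illustration that a blank really is a value which compares equal to another blank, while Null does not:

SELECT CASE WHEN '' = ''     THEN 'Blanks are equal' ELSE 'Not equal' END AS BlankComparison
      ,CASE WHEN NULL = NULL THEN 'Nulls are equal'  ELSE 'Not equal (Null never equals Null)' END AS NullComparison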

So, in these two articles, I’ve demonstrated the problems of dealing with Nulls and Blanks in your database and how to deal with them. In the future, I’ll follow up with articles on creating and using Date tables and other more advanced T-SQL techniques.

Stay tuned!