SQL Saturday Dallas Reflections

SQL Saturday Dallas, our third, went off almost without a hitch. While we've identified a few items that need work, overall the event went swimmingly.

Kevin Kline - Troubleshooting and Performance Tuning Microsoft SQL Server with Dynamic Management Views (DMVs)

The festivities started on Friday with our three Pre-Con events, presented by Kevin Kline (Blog/Twitter), Jessica Moss (Blog/Twitter), and Randy Dyess of Solid Quality Mentors.

I attended parts of each and was impressed by all of them. For the price, you couldn't beat the educational value of the North Texas SQL Server User Group Pre-Cons.

After the Pre-Cons the core volunteers went to the Region 10 Educational Center (great facility by the way) to set up for the event.

The blind leading the blind.

A special thanks to Heather Sullivan (Twitter), Head of SQL Server Tools for Idera Software, for bringing so many people to volunteer. I think the total number was 8 and those folks worked their butts off.

Natural Born Leader

Have I mentioned lately how much I love SQL Saturday Events? I’d love to be able to present more often. Attention SQL Vendors and Sponsors, that was a hint. 🙂

After that we went to the speaker’s dinner, which was held at Dave and Buster’s and a great time was had by all. Where else can you spend hours talking to Steve Jones (Blog/Twitter), Jessica Moss (Blog/Twitter), Andy Warren (Blog/Twitter), Wes Brown (Blog/Twitter), etc.

Scott Whigham and I playing pool. If you squint hard, I almost look like Tom Cruise in The Color of Money.

SQL Saturday always starts early when you are a volunteer. One of my responsibilities was registration, and everything went well. The success is mostly due to having so many committed volunteers, and they made me look good once again.

Throughout the day, things continued to go well. Speaker shirts were well received, and the food and the Paciugo Gelato (fancy ice cream) were a big hit.

I want to give special thanks to a few people and groups. I apologize in advance if I've left anyone out.

  • I really appreciate the sponsors who make these events possible.
  • Tim Mitchell (Blog/Twitter), who was not only a core volunteer and a presenter, but also took the pictures.
  • All of the volunteers, especially our core team.
  • All of the presenters, many of whom traveled to speak at the event at their own cost.

Because I was so busy, I only attended a couple of presentations. Sean McCown (Blog) presented “How to Curse in an Interview” and it was one of the most intense talks I’ve ever seen in a technical setting. I also watched Sri Sridhara (Blog/Twitter), NTSSUG President, present “Turbo charge your DBA career” and he had the crowd mesmerized.

I presented “T-SQL Programming to Survive a Zombie Database” and it went very well. I love teaching, and I think it shows. In fact, Jen McCown (Blog/Twitter), whose opinion I greatly respect, attended and tweeted the following:


If you missed the session, I will be presenting an extended version of it at the NTSSUG Meeting next Thursday 10/21 and the downloads for the session have been posted here.

Of course I went to the party after the event and spent quality time with community members. Everyone should attend the after parties. There’s a wealth of help available there for free, not to mention those folks are fun to hang out with as well. Where else can you ask for free help like this?

  • Adam Saxton (Blog/Twitter), Senior Escalation Engineer for SQL Server Support, about SSRS problems.
  • Steve Jones or Andy Warren about branding.
  • Tim Mitchell or Sean McCown about SSIS.
  • Wes Brown about SQL Server storage.
  • Jessica Moss about Microsoft BI.
  • Tim Costello (Blog/Twitter) about data visualization and Tableau.
  • Trevor Barkhouse (Blog) about performance tuning.
  • Ryan Adams (Blog/Twitter) about mirroring.
  • Kevin Kline about… well everything.

These events are free sources of SQL education, but that’s not the most important aspect. They’re about people, great people who share your profession and interests. How did I meet all of these great people? I joined PASS, participated in my local SQL Server user group, and went the extra mile to be social.

You can do it too. Throw caution to the wind and talk to people, we don’t bite. Well, I’m not so sure about Sean.

SQL Saturday Chicago Highlights

It’s been a crazy few weeks. As I mentioned previously, I presented at two SQL Saturday Events in back to back weeks. I’m on the core committee for the North Texas SQL Server User Group SQL Saturday, so I was overwhelmed with things to do and couldn’t stop to blog.

So, now that I’ve taken a few days to catch up and relax I’ll share some of the highlights of both experiences.

SQL Saturday Chicago

I had a great time at this event put on by both the Chicago SQL Server User Group and the Chicago Suburban SQL Server User Group. I flew in early Friday to attend my second Freecon event. Freecon is a free event put on by Brent Ozar (Blog/Twitter) in which he invites a small group of bloggers/presenters from the SQL Community to discuss issues important to us and the community. My Freecon experiences have been excellent, and I’ll go into those in more depth in a future post. I simply can’t thank Master O enough for including me. Oh, and the atmosphere at the Catalyst Ranch was amazing too.

So, why mention it? Well, I learned an important lesson. I am never, ever flying out at 5:30am again. I was so exhausted after getting up at 3am that I never caught up. SQL Saturday events are packed with activity, especially if you’re volunteering or speaking. After the Freecon, Erin Stellato (Blog/Twitter) was kind enough to cart Karen Lopez aka DataChick (Site/Blog/Twitter), her hubby Rob Drysdale, and me to and from our hotel, which was a bit of a distance from SQL Saturday.

SQL Sentry hosted the speaker’s dinner. The food was excellent and there were so many amazing database people to meet including Aaron Bertrand (Twitter) who works for SQL Sentry. SQL Saturday Chicago provided personalized shirts for the speakers with our names on the front. It was a classy touch which I intend to suggest to my local group. The problem with personalized shirts is the risk of typos.

Thomas LaRock (Blog/Twitter) was a good sport about being called RockSar all weekend.

I got up bright and bleary eyed Saturday morning, scarfed some breakfast, and headed to the event. It was obvious that this was not the Chicago groups’ first rodeo; they knew what they were doing. Wendy Pastrick (Blog/Twitter) and her group had everything running smoothly. Noel McKinney (Blog/Twitter), whom I had met on Friday, got me through the check-in process, and I never saw a line.

Unfortunately I didn’t attend many presentations because I was having laptop problems and had to rebuild part of my demo. This was my second important lesson of the trip: I need better contingency plans. I’ve purchased VMware Workstation, and every one of my demos will be done that way. I’ll keep a backup copy of the VM on a USB hard drive so I can use another laptop at a moment’s notice.

I did see Grant Fritchey (Blog/Twitter) present “The Optimizer, Statistics, and Indexes.” Honestly, I think everyone saw him. The room was packed so full that people were standing along the walls, and SQL Rockstar sat right next to him. Grant is amazing; if you ever get the chance to see him present, do not pass it up. In the speaker’s room after the presentation, we were laughing because one of the attendees commented that the “sidekick in the corner was distracting,” since Tom and Grant bantered back and forth a bit during the presentation. Hilarious.

Grant took this short video of me in the speaker’s room. It should be worth a laugh.

An old friend came to see me at the event. She used to work for Made2Manage support and she’s one of the many people I’ve encouraged to get out into the much larger world of SQL Server. Anyway as she and I were talking in the hallway, Grant Fritchey walked by and I introduced him as “The Scary DBA.” My friend scoffed and said that when she worked in support my phone calls would sometimes bring her to tears. Grant thought it was hilarious, but I was a wee bit embarrassed.
My presentation went well, and I had more than 40 attendees. Not bad considering I was scheduled at the same time as Brent Ozar, Datachick, Jes Borland, and others.

As promised, the scripts and presentation can be downloaded here.

I learned something else from this trip. I normally take a lot of time creating geeky/funny Abstracts and Titles like “Data Warehousing – How to convince ‘The Bobs’.” However, when I asked how many people had seen “Office Space,” which is the underlying theme of the presentation, less than 10% had seen it. Ugh. I’m going to have to re-think my Title and Abstract Strategy and go with something closer to “Creating a Data Warehouse Proof of Concept” which has more widespread appeal.

I went to the after party and had a great time talking with Jose Chinchilla (Blog/Twitter) as the dude is hilarious. I hope to get him back down to Dallas to present again soon. I also succumbed to peer pressure and had a drink with Jose, Datachick + 1, and Noel McKinney and his wife. Apparently I get really funny even after having one drink. Anyway, I ended up at SQL Karaoke, but wasn’t up to embarrassing myself like I did at PASS last year, so I just watched with Datachick + 1.

I will tell you that SQL People let their hair down like no other at SQL Karaoke. Jes Borland (Blog/Twitter), who has more energy than any three people I know, and Jason Strate (Blog/Twitter) belted out a duet and everyone had a good time.

I have no idea why more people don’t attend the after parties. It’s fun, and a great way to network with some of the best in the SQL profession.

What I want you to take from this is simple. Attending SQL Saturday Events is a lot of fun and great for your career. Take full advantage of them, including the after party.

Come Join Us.

Meme Monday - SSIS Frustrations

One of the cool kids, Thomas LaRock (Blog/Twitter), has started a new community blogging game called Meme Monday. My friend John Sterrett (Blog/Twitter) tagged me this morning, and my attempt is below.

SSIS work is not good for your self-esteem on Monday morning.

These things spread like wildfire, so I’m not sure who’s been tagged and who hasn’t, but here goes.

I hereby tag Jorge Segarra (Blog/Twitter) and Jen McCown (Blog/Twitter).

Have fun folks.

Come See Me Present at SQL Saturday Chicago and Dallas

I’ll be presenting at two SQL Saturdays back to back. You are taking advantage of SQL Saturday events, right?

SQL Saturday Chicago is this weekend (3/26), and I’ll be presenting:

Data Warehousing – How to Convince “The Bobs”

I got a meeting with the Bobs in a couple of minutes...

Building your first Data Warehouse is a long and often difficult process. How can you get your boss to approve a Data Warehouse project? What’s the best way to explain dimensional modeling and the benefits of a Data Warehouse to a business person? What are the best and most cost-effective ways of learning it? What kinds of materials, hardware, software, etc. do you need? What’s the best way to build a proof of concept that will impress your boss, as rapidly as possible?

Come to this presentation, and I’ll answer all of these questions plus the most important question of all.

“Is this good for the company?” Absolutely.

The following Saturday (4/2) my home group, the North Texas SQL Server User Group, will be hosting our 3rd SQL Saturday. I’ll be presenting the following:

T-SQL Techniques to Survive Zombie Databases

Do you have a Zombie Database? Does your database suffer from roaming hordes of problems like the lack of primary and foreign keys, poor normalization, in-place updates and deletes, and an insatiable drive to eat your braaaaaaaaaaaaaains (CPU cycles)?

Fear not, fellow survivor, for in this session I’ll show you how to fortify your code against zombie infestation: how to deal with the unknown (nulls), and how to Restore Order by Joining together in Groups. Flak Jackets are Optional, but encouraged.

So, participate in SQL Saturdays. Present, volunteer, or just attend. They’re excellent, free sources of SQL Education.

Isn't it Ironic? Don't You Think?

Anyone who reads this blog should know that I’m a fan of Todd McDermid and his free Kimball Slowly Changing Dimension Transform. For those of you who’ve not been following his blog, he was contacted by the Kimball Group and informed that he needed to stop using Kimball’s name for his free transform, the Kimball Method Slowly Changing Dimension Transform, even though it had existed with that name for years.

Todd was classy about it, had a contest to rename it, and came up with Dimension Merge SCD. Since then, another pillar of the SQL Community, Brian Knight (Blog/Twitter), has picked up the ball and is integrating the transform into Pragmatic Works Task Factory.

Anyway, last week I received the new The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 by the Kimball Group and ripped through it at a blistering pace. I hit page 222 by the next day, and this is what I found:

The two most popular third-party tools for replacing the Microsoft SCD transform are:

Kimball Method SCD Transform, available for free download from and developed by Todd McDermid. This transform offers more functionality than the Microsoft SCD transform, and performs much better. Edits are not destructive to downstream elements. This transform is not associated with the Kimball Group, though the developer followed Kimball published best practices in his design.

Um… what?

I have great respect for everyone involved including Ralph Kimball, that’s why I buy, read, and recommend his books. However, the fact that the request was made at nearly the same time this new book was published was too much to pass up.

Drawing a Blank. Dealing With Blank Values in SQL Server using NullIf and Coalesce

“Melodical Murray is a human musical instrument; he makes music by blowing his…blank” – Match Game

In the previous article I discussed the issues when working with Null-able fields in Joins and Where clauses. This article will deal with the problems of blank values.

What About Blanks?

Many systems forego the use of Null-able fields and instead use blank values. In fact, blank values are a bigger problem in M2M than Nulls. A common problem is with M2M Product Class values. For those that don’t use M2M, Product Class is similar to a Product Category that happens to be tied to general ledger accounts. Unfortunately, not every record in sales and other tables has a Product Class value; in those records the value is blank. If you weren’t aware of this, run the following query against your M2M Database.

SELECT DISTINCT ARI.fprodcl ProductClass
FROM   dbo.aritem ARI

The M2M Educational Database returns the following:

In order to demonstrate this for those who don’t use M2M, I’ll run the following against the Kung Fu Database.

UPDATE OD
SET    ItemDescription = ''
FROM   dbo.OrderDetails OD
WHERE  OD.OrderID = 4

So, how can we deal with the blank value (and the Null value as well)? Well, we could use a Case statement.

SELECT CASE
         WHEN OD.ItemDescription IS NULL THEN '..Null None'
         WHEN OD.ItemDescription = '' THEN '..Blank None'
         ELSE OD.ItemDescription
       END AS ItemDescription
  FROM dbo.OrderDetails OD

Notice that I assigned separate values to Null and blank so you could see that each case worked. Why did I use two periods before the text? I’ll explain that in a minute. The Case When works, but it’s wordy. There’s a much easier way to deal with both Blanks and Nulls.

NullIf and Coalesce

Let’s start with the NULLIF function. NULLIF accepts two arguments and if the arguments are the same, then Null is returned, otherwise the first value is returned. Therefore, we can detect blanks with the following:

SELECT NULLIF ('A', '')   -- Returns 'A'
SELECT NULLIF ('', '')    -- Returns Null
DECLARE @Test Char(5)
SELECT NULLIF (@Test, '') -- Returns Null
-- I had to use a variable for the last test because the first argument must 
-- have a known data type or an error will result.

The second argument in this case is always '' (blank). Why? If the field value is blank, the end result will be Null. Also, if the first argument is Null, that will be returned as Null as well. Make sense?

A Coalesce statement can then be used to replace the resulting Null value like so:

SELECT COALESCE(NULLIF ('A', '') , '..None') 
-- Returns 'A'
SELECT COALESCE(NULLIF ('', '') , '..None') 
-- Returns '..None' 
Declare @Test Char(5)
SELECT  COALESCE(NULLIF (@Test, '') , '..None') 
-- Returns '..None'

Let’s return to our original statement and substitute this method.

SELECT coalesce (nullif (OD.ItemDescription, ''), '..None')
         AS ItemDescription
  FROM dbo.OrderDetails OD

I have this process programmed as a Toad for SQL Server snippet. When I have time, I’ll record a video showing how to use them. In this case, all I have to do is highlight the database field name, click the snippet, and Toad does the rest. So, why do I substitute text with leading periods? I do that primarily for generating available values in SSRS parameters.

When I go back and edit the first script (for M2M Product Classes) with this method, I get the following:

SELECT DISTINCT
       ARI.fprodcl ProductClass
      ,coalesce (nullif (ARI.fprodcl, ''), '..None') FixedProductClass
  FROM dbo.aritem ARI
 ORDER BY coalesce (nullif (ARI.fprodcl, ''), '..None')

Which would your users rather see in an SSRS parameter drop-down?

It’s important to note that blank ('') is an actual value, which is in fact equal to another blank, and can be used in joins as well. However, in the case of Made2Manage, you shouldn’t join your tables to the Product Class Table (INProd) with an Inner Join because you’ll “lose” records. That’s why I always link to INProd with a Left Outer Join. Would you like to see the invoice records that you’d lose in an Inner Join? Run the following against your database to find out.

SELECT ARM.fbcompany AS BillToCompany
      ,ARI.fcinvoice AS InvoiceNo
      ,ARI.fitem AS ItemNo
      ,ARI.fprodcl AS ProductClass
      ,rtrim (ARI.fpartno) AS PartNo
      ,ARI.frev AS PartRev
      ,ARI.ftotprice AS NetExtInvoiced
  FROM dbo.armast ARM
       JOIN dbo.aritem ARI
         ON ARI.fcinvoice = ARM.fcinvoice
 WHERE ARM.fcstatus <> 'V' AND
       coalesce (nullif (ARI.fprodcl, ''), '..None') = '..None' -- Returns Blanks and Nulls

In the M2M Educational Database, 19 records are returned.
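By contrast, a Left Outer Join to INProd keeps those records. Here's a minimal sketch of what I mean; note that INProd's join column name is assumed here for illustration, so check your own schema:

```sql
-- Hypothetical sketch: the INProd join column name is assumed.
-- The LEFT OUTER JOIN keeps invoice items whose fprodcl is blank or Null.
SELECT ARI.fcinvoice
      ,ARI.fprodcl
  FROM dbo.aritem ARI
       LEFT OUTER JOIN dbo.INProd INP
         ON INP.fprodcl = ARI.fprodcl
```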

So, in these two articles, I’ve demonstrated the problems of dealing with Nulls and Blanks in your database and how to deal with them. In the future, I’ll follow up with articles on creating and using Date tables and other more advanced T-SQL techniques.

Stay tuned!

I Don’t Know! Deal With It! (Nulls)

This topic has been done to death in many places, but I still get frequent questions about Null field issues in T-SQL. My purpose in this article is not to debate the definition of Null. Smarter men than I have already done so.

SQL Server Books Online indicates “A value of NULL indicates that the value is unknown.” However, some disagree, most recently Joe Webb – Null is not Unknown. Some argue that there should be several different types of Null.

Why Ask Why?

As I said before, I don’t really care about the various interpretations of the Null value. I just know that they’re a pain to deal with. I have never received questions like this:

  • What exactly is a Null?
  • What’s the definition of Null?

The questions I get follow along these lines.

  • Why am I losing records in this join?
  • Why is my total (or average) off in this query?
  • Why does my SSRS parameter list look funny?
  • Why won’t my parameters default like they should?

So dear reader, as part of the T-SQL Basics Series, I am going to share what you need to know about dealing with Nulls rather than defining them. If you’d like to follow along, please download the Kung Fu Database Script.

Where Clause

To start with, while people disagree about the definition of Null, everyone agrees what Null is not. It is not any of the following:

  • 0
  • blank
  • '' (Empty String)
  • 1/1/1900

When most of us look at a Where clause, we only consider two options: True and False. However, SQL Server uses three-valued logic: True, False, and Unknown. Since Null represents an Unknown value, any comparison to it results in Unknown, not True or False. Consider the following pseudo-code:

Select * from Employees where Salary > 500

If the Salary field doesn’t allow Nulls, then it’s very easy to return the list of Employees who earn more than $500 per some time period. However, if Salary is a Null-able field, then any Employee without a Salary value will not be returned. How could they be? You simply do not know whether they make more than $500 or not.
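To see this in action without touching a real table, here's a minimal, self-contained sketch (the names and salaries are made up):

```sql
-- A table variable with one Null Salary demonstrates three-valued logic.
DECLARE @Employees TABLE (Name varchar(20), Salary money NULL);
INSERT INTO @Employees VALUES ('Ann', 600), ('Bob', 400), ('Dave', NULL);

SELECT Name FROM @Employees WHERE Salary > 500;  -- Ann only
SELECT Name FROM @Employees WHERE Salary <= 500; -- Bob only
-- Dave appears in neither result set because NULL > 500 evaluates to Unknown.
```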

I find the easiest way for me to learn these concepts is to look at examples. Let’s start with the following base query from the Kung Fu Database.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD

Null values are highlighted.

Returns: 6 Records. This is the total data set.

Now consider the following:

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
WHERE  OD.OrderQty >= 1

Returns: 4 Records. Those with Null Values for OrderQty are ignored.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
WHERE  OD.OrderQty < 1

Returns: 0 Records. Comparing any value to Null results in Unknown, and those records are not returned.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
WHERE  OD.OrderQty = NULL

Well, logically this should return 2 records, right? After all, you are asking for those records with Null values for OrderQty. However, the correct answer is 0 records. The reason is simple. Comparing any value to Null (even another Null) returns Unknown. This may seem confusing, but logically it makes sense. Consider my Salary example above. If Bob and Dave both have Null Salary values, do they earn the same amount? Obviously, there’s no way to know. Therefore, the query evaluates to Unknown for those records and they aren’t returned. So, how can we return the Null records?

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
WHERE  OD.OrderQty IS NULL

Returns: 2 Records. Similarly, we could have used “IS NOT NULL” and 4 records would have been returned.

So, what if we want to return any records with a quantity less than 1 and include Null Values? Well, you could write that query like this:

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
WHERE  OD.OrderQty < 1
       OR OD.OrderQty IS NULL

Returns: 2 Records.

Replacing Null Values with ISNULL or COALESCE

So that works, but it’s wordy. How can we more easily deal with nulls? Well, we can replace them with an actual value by using COALESCE or ISNULL. Each function returns the first Non-Null value found in the list of arguments. For example, SELECT ISNULL(NULL, 25) returns 25. However, SELECT ISNULL(NULL, NULL) returns Null.

Some people prefer IsNull because the name of the function is more understandable to them, and it has been shown to have a very slight performance advantage. However, I use Coalesce because IsNull is limited to 2 arguments, whereas Coalesce doesn’t have any realistic limit. SELECT COALESCE(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 7) still returns 7. Remember that it returns the first non-null value. Therefore, SELECT COALESCE(NULL, 3, NULL, NULL, NULL, 7) only returns the first value which is 3.
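There's one more difference worth knowing: IsNull returns the data type of its first argument, while Coalesce follows SQL Server's data type precedence rules. A quick sketch:

```sql
-- The variable is deliberately left Null; the char(2) type is what matters.
DECLARE @Short char(2);
SELECT ISNULL(@Short, 'Longer');   -- Returns 'Lo': result takes @Short's char(2) type
SELECT COALESCE(@Short, 'Longer'); -- Returns 'Longer': result type is by precedence
```

That truncation can bite you when the replacement value is longer than the field you're replacing, which is another reason I default to Coalesce.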

So, you may use Coalesce to simplify these comparisons as well as for aesthetic reasons. Consider the previous Kung Fu Database query re-written with Coalesce. I’ve formatted it vertically to emphasize the difference.

SELECT OD.OrderID
      ,OD.[LineNo]
      ,OD.ProductID
      ,OD.ItemDescription
      ,OD.OrderQty
      ,OD.Discount
      ,OD.ListPrice
  FROM dbo.OrderDetails OD
 WHERE COALESCE(OD.OrderQty, 0) < 1

Who determines whether or not you should replace the field like this? For the most part, the business user. So, if the business user provides the proper substitution values, the query may be written like this:

SELECT OD.OrderID
      ,OD.[LineNo]
      ,COALESCE(OD.ProductID, 0) ProductID
      ,COALESCE(OD.ItemDescription, 'None') ItemDescription
      ,COALESCE(OD.OrderQty, 0) OrderQty
      ,COALESCE(OD.Discount, 0) Discount
      ,COALESCE(OD.ListPrice, 0) ListPrice
  FROM dbo.OrderDetails OD

Joining on NULL-able Columns

So, I’ve demonstrated how Null values can cause problems in the Where clause. For the same reasons, joining on Null-able columns is problematic as well. Consider my example Order Details table which allows a Null value for ProductID. Consider the following script:

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription
  FROM dbo.OrderDetails OD
       LEFT OUTER JOIN dbo.Products P
         ON P.ProductID = OD.ProductID

If that were written as your standard inner join, the recordset would lose the indicated record.

Grouping, Sorting, and Aggregates

In regards to Group By and Order By, Null values are considered equal, meaning that all of the Null values are treated as one entity and are grouped and ordered as such.

Consider the following queries and result sets:

SELECT   OD.OrderQty
        ,COUNT(*) RecordCount
FROM     dbo.OrderDetails OD
GROUP BY OD.OrderQty

SELECT   OD.OrderQty
FROM     dbo.OrderDetails OD
ORDER BY OD.OrderQty

Aggregate Functions
Aggregate functions such as COUNT, SUM, and AVG ignore Null values. However, one can use COUNT(*) which will count all records. Consider the following query:

SELECT SUM(OD.OrderQty) QtySum
      ,AVG(CAST(OD.OrderQty AS decimal(4,2))) NullAvg
      ,AVG(CAST(COALESCE(OD.OrderQty, 0) AS decimal(4,2))) ZeroAvg
      ,COUNT(OD.OrderQty) QtyCount
      ,COUNT(*) TotalCount
  FROM dbo.OrderDetails OD

I had to cast the quantities as decimals because otherwise the values would have been truncated to whole numbers. If you do the math, you’ll see that the aggregate functions only considered the four records with non-null values. This is particularly noticeable when you compare the NullAvg and the ZeroAvg. The ZeroAvg actually divides the total by 6 instead of 4.

In this article, I’ve attempted through the use of many examples to describe the issues when working with Null values and how to deal with them. In the next article, I’ll discuss how to deal with blank values in Joins, Where clauses, and when creating parameter lists for SQL Server Reporting Services.

M-Data Analytics Questions

I’m sorry that my posts haven’t been as regular as I’d like, but the good news is that we’re making great progress with the M-Data Analytics open source project. In order to provide the best benefit to my fellow Made2Manage users, please answer the following questions.

What Version(s) of Made2Manage are you currently using?

What version(s) of SQL Server are you currently using?

How many live M2M Companies do you have?

If you have more than 1 live company, which setup would you prefer?

Are you running more than 1 version of M2M in production?

When are you likely to purchase a new SQL Server?

M-Data Analytics has the ability to notify separate Data Stewards regarding data problems for each business topic. For example, a designated sales person would be notified if sales records were not up to par giving them the ability to address those issues. Perhaps a new sales address contains an unrecognized value for State Name, etc. Would you actually utilize this feature?

Also, are any of you using naming conventions for your M2M databases other than the standard M2MDataXX where XX represents a two digit number? If so, how are you naming your M2M databases?

I really appreciate your input.

SQL University - Resistance is Futile

Note: This is the last in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. Click here to read the rest of the series.

I can haz assimilation?

We are Borg – Bringing it all together.

Now that we understand the basics of Dimension and Fact Tables, let’s learn how they work together. Once again, don’t forget that the Kimball Toolkit Data Modeling Spreadsheet is very helpful when creating these models.

The Star Schema

Let’s go back to the dimensional model Fact Table mentioned in the last article.

Remember that Fact Tables consist primarily of foreign keys and measures.

These foreign keys link to the surrogate primary keys of the Dimension Tables. When you link them all together, they might look like this.

So, what does that look like? The tables are generally laid out in the formation of a Star. That’s all a Star Schema really is.

Snowflake Schema

Let’s say that Product Category is related to the Product Dimension in a one to many relationship (or mostly so), and that it has many verbose attributes. The architect may decide to split it off from the Product Dimension. The resulting schema would look like this.

This is now a Snowflake Schema. By the way, check out the header of the third section of the linked wiki. Is the wording intentionally hilarious? Another common use for this kind of schema would be for a Customer Dimension and associated addresses. The location becomes a Dimension Table linked to DimCustomer. A general best practice is to use Star Schemas unless there is a very important reason to move to Snowflake as they are simpler to report from and load. If you use a Snowflake Schema, you may want to cover the tables with a view to simplify querying. In our case, we’d have a view for DimProduct and that would include the DimProductCategory and appear as one entity.
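For example, such a covering view might look something like this. All of the table and column names below are hypothetical, but the pattern is the point:

```sql
-- Hypothetical sketch: flatten the snowflake behind a view so report
-- writers see the product dimension and its category as one entity.
CREATE VIEW dbo.vDimProduct
AS
SELECT P.ProductKey
      ,P.ProductName
      ,PC.ProductCategoryName
  FROM dbo.DimProduct P
       LEFT OUTER JOIN dbo.DimProductCategory PC
         ON PC.ProductCategoryKey = P.ProductCategoryKey;
```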

The Simple Life of Borgs

Here’s the payoff. Let’s assume that your boss wants to know the sales order amount by year and product. You can ignore all of the other dimensions.

The query is simple and fast because it involves 3 fields and 2 joins. Those joins are on primary and foreign keys, which means most query products recognize them automatically. Life is now simple.
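Here's a sketch of that query. The dimension and fact table names are hypothetical, but the shape of the query is what matters:

```sql
-- Sales order amount by year and product, joining the fact table
-- to just the two dimensions we care about (names assumed).
SELECT D.CalendarYear
      ,P.ProductName
      ,SUM(F.SalesOrderAmount) AS SalesAmount
  FROM dbo.FactSalesOrder F
       JOIN dbo.DimDate D
         ON D.DateKey = F.OrderDateKey
       JOIN dbo.DimProduct P
         ON P.ProductKey = F.ProductKey
 GROUP BY D.CalendarYear, P.ProductName;
```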

Prepare to be Assimilated – Convincing Your Boss

So, I can hear you saying, “Alright Dave, you’ve convinced me, how can I get my boss to let me work on something like this?”

If you’ve been reading this series of articles, you know that creating a data warehouse is not quick or easy. It involves a great deal of time to learn to design and load a data warehouse. So, how can you convince your boss to let you pursue this?

Well, to start with, business people are just like everyone else. They’re short-sighted, demand nearly instant return on investment (ROI), and are averse to risk.

What are you preparing?! You’re always preparing! Just go! – Dark Helmet. Spaceballs

The first thing to remember is that this project can’t be about you. In most cases, your boss doesn’t care about what you want to do, but rather what you can do to add value to the business. Also, in today’s economy your boss may be reluctant to send you to dimensional modeling classes which cost thousands of dollars. So, how can you learn these principles on the cheap? I’ll cover that in a moment.

Pitching the idea to your boss can be difficult because dimensional modeling concepts are intangible. It’s hard for a business person to see the value in it. You have to relate the project to revenue in some way.

So, I suggest that you create a proof of concept project, perhaps in your free time. After reading this series of lessons, pick a business topic and create a test project. Some important things to keep in mind in choosing this topic are:

  1. Choose a business topic that is of value to the business. When your boss sees value, even as just a test case, he/she will be more likely to green light it.
  2. To maximize your chances of success, choose a business topic with a single clean data source if possible. Don’t bite off some project that involves flat files that are FTP’ed to you and often contain dirty or missing data. Pick a relatively clean OLTP system, preferably in SQL Server.
  3. The proof of concept need not be perfect or complete. Create a single Fact Table, and a few Dimensions to go with it.
  4. Use your existing skill set to create and load the data warehouse. While SQL Server Integration Services (SSIS) is the tool of choice for the extract, transform, and load (ETL) process, if you don’t know how to use it, load the warehouse with T-SQL instead. You can learn SSIS later.
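As a sketch of that last point, a minimal T-SQL load for a proof of concept might look like this. All of the source and target names here are hypothetical:

```sql
-- Illustrative T-SQL load, no SSIS required. Oltp, DimCustomer,
-- DimProduct, and FactSalesOrder are assumed example names.

-- Add any customers not already in the dimension (simplest Type 1 load).
INSERT INTO DimCustomer (CustomerBK, CustomerName, City, State)
SELECT s.CustomerID, s.Name, s.City, s.State
FROM   Oltp.dbo.Customer AS s
WHERE  NOT EXISTS (SELECT 1 FROM DimCustomer AS d
                   WHERE d.CustomerBK = s.CustomerID);

-- Load the fact table, translating business keys to surrogate keys.
INSERT INTO FactSalesOrder (OrderDateKey, CustomerKey, ProductKey,
                            OrderQuantity, ExtendedPrice)
SELECT CONVERT(int, CONVERT(char(8), o.OrderDate, 112)),  -- yyyymmdd smart key
       d.CustomerKey,
       p.ProductKey,
       ol.Quantity,
       ol.ExtendedPrice
FROM   Oltp.dbo.OrderLine   AS ol
       JOIN Oltp.dbo.OrderHeader AS o ON o.OrderID = ol.OrderID
       JOIN DimCustomer AS d ON d.CustomerBK = o.CustomerID
       JOIN DimProduct  AS p ON p.ProductBK  = ol.ProductID;
```

It isn’t elegant, but it proves the concept, and the same logic ports to SSIS later.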

When pitching the idea to your boss, show the model and demonstrate how he/she can perform their own analysis on it. Stress how much more convenient this is for the users, not how much time it will save you. Also, explain that this process can answer long-standing business questions the boss may have. In my situation, Made2Manage cannot track sales order history. Through the ETL process of the data warehouse project, I can now track value changes in sales orders in near real time. This functionality saves my company money every day because users no longer have to track those changes manually.

Learning on the Cheap

Books are one of the most cost-effective ways to learn this. Below are a few books I recommend and why.

There are also free webinars from time to time on the subject from the Kimball Group, as well as Pragmatic Works. Pragmatic offers excellent classes in the various Microsoft technologies involved, and I’ve taken nearly all of them. To my knowledge they don’t offer a class in dimensional modeling itself. Also, you can follow blogs like this one. In future articles I intend to document all phases of my open source project from modeling, to ETL, and then reporting on the data warehouse. It should serve as a good case study for others to learn from.

One last thing that I want to stress about learning these techniques: it is absolutely imperative that you learn by doing. Create a Dimensional Model and attempt to load it. If you don’t have a data source, look for one. Volunteer with a charity; many of them would love to leverage your free labor to better their “business.”

Get involved with your local SQL Server User Group and make connections. Your progress is only limited by your level of commitment.

So, will you be assimilated?

SQL University - Facts, Show Me The Money

Note: This is the fourth in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. Click here to read the rest of the series.

So, we’ve covered dimension tables and how they give your data meaning. That raises the question: how do we actually track the measures (values) of those business process events? In other words, where are we going to track the money, along with some other measurements as well?

What are Fact Tables?

Fact Tables contain the measurements of your business process events, usually in the form of money, quantities, and sometimes dates. The typical fact table contains nothing but numbers. Suppose we are designing a data warehouse for Acme Inc., maker of a variety of fine products, usually involving rockets, for sale to discerning coyotes everywhere. They have multiple stores throughout Arizona. Below is a sample fact table.


Whereas Dimension Tables tend to be denormalized or flattened, Fact Tables are highly normalized. For the most part, they consist of measurements and Keys that link to the surrogate keys of the Dimension Tables. A sample data set for the table above would look like this:

Fact Tables tend to be as “narrow” as possible because they are so much larger than the dimensions. The only keys which aren’t surrogates are the smart keys that link to the date table. Also, the measurements included in the Fact Table should be additive. Notice that I am not storing Unit Price. The user can calculate it from order quantity and extended price if need be. If you store a non-additive fact like unit price, users will try to analyze by it and get erroneous results.
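A minimal sketch of a fact table along those lines follows. The names are assumptions for illustration, not Acme’s actual schema:

```sql
-- Narrow fact table: nothing but keys and additive measures.
CREATE TABLE FactSalesOrder (
    OrderDateKey  int   NOT NULL,  -- smart key into DimDate (yyyymmdd)
    CustomerKey   int   NOT NULL,  -- surrogate key into DimCustomer
    ProductKey    int   NOT NULL,  -- surrogate key into DimProduct
    StoreKey      int   NOT NULL,  -- surrogate key into DimStore
    OrderQuantity int   NOT NULL,  -- additive measure
    ExtendedPrice money NOT NULL   -- additive measure; unit price is
                                   -- derived, never stored
);
```

Note that every column is a number: either a key into a dimension or an additive measurement.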

Fact Table Grain

The most important consideration when designing Fact Tables is to set the grain. As Kimball states,

What, exactly, is the grain? The grain of a fact table is the business definition of the measurement event that creates a fact record. The grain is exclusively determined by the physical realities of the source of the data.

The grain should be at the most atomic level you can measure because you never know how your users will want to slice and dice the data.

In my example, the grain is at the level of Sales Order Line Items. If we were a point of sale establishment like a grocery store, how would we define the grain? Paraphrasing Kimball: when the cashier scans your Cheetos and bacon and you hear that “BEEP”, the “BEEP” is the grain of your fact table.

Do not violate the grain of your Fact Table, though you may be enticed to do so. For example, what if Acme wanted to add shipping costs to our Order Fact Table? Well, shipping costs are typically recorded per Sales Order, not per line item or individual product. We currently have this problem with M-Data Analytics.

You should not simply add a shipping cost value to your fact table. If a particular sales order cost $10 to ship and had 3 line items, how would you record that? You should not simply place $10 in each of the three fact table rows because it’s not additive. You shouldn’t include $10 for one of the lines and $0 for the others either. The only way to solve this is to estimate the cost per line item (perhaps by using weights in your product table) and use those values. That process could be prohibitively difficult however.
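To make that allocation idea concrete, here is one way it might be sketched in T-SQL, spreading each order’s shipping cost across its lines in proportion to shipped weight. The table and column names (including `WeightLbs` and `ShippingCost`) are hypothetical:

```sql
-- Allocate order-level shipping cost to line items by product weight.
-- A $10 order with three lines gets $10 split proportionally, so the
-- allocated amounts still sum to the true cost (additive).
SELECT ol.OrderID,
       ol.OrderLineID,
       o.ShippingCost * (p.WeightLbs * ol.Quantity)
           / SUM(p.WeightLbs * ol.Quantity)
                 OVER (PARTITION BY ol.OrderID) AS AllocatedShipping
FROM   Oltp.dbo.OrderLine   AS ol
       JOIN Oltp.dbo.OrderHeader AS o ON o.OrderID = ol.OrderID
       JOIN DimProduct AS p ON p.ProductBK = ol.ProductID;
```

The windowed `SUM ... OVER (PARTITION BY ol.OrderID)` computes each order’s total weight so every line gets its proportional share.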

Fact Table Types

There are three types of Fact Tables.

Transactional
A Transactional Fact Table is the fundamental type, and the example above is Transactional. It should be at the most atomic grain possible: one row per transaction or business event, one row per “BEEP” as Kimball puts it. Since these tables are at the most atomic level, they tend to have the greatest number of Dimension Tables surrounding them.

Periodic Snapshots
A Periodic Snapshot captures measurements at a regular point in time. One common application is for measurements that cannot be tracked historically, like current inventory or order backlog. Another is to summarize over periods of time, perhaps at month end, to track balances and the like.
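A periodic snapshot load can be as simple as a scheduled job that copies current levels into the fact table. Again, the names below are illustrative assumptions:

```sql
-- End-of-period inventory snapshot: one row per product per run,
-- stamped with a yyyymmdd smart date key. Run nightly or monthly
-- by SQL Server Agent (or any scheduler).
INSERT INTO FactInventorySnapshot (SnapshotDateKey, ProductKey, QuantityOnHand)
SELECT CONVERT(int, CONVERT(char(8), GETDATE(), 112)),
       p.ProductKey,
       i.QuantityOnHand
FROM   Oltp.dbo.Inventory AS i
       JOIN DimProduct AS p ON p.ProductBK = i.ProductID;
```

Because on-hand quantity can’t be reconstructed after the fact, the snapshot is the history.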

Accumulating Snapshot
An Accumulating Snapshot Fact Table combines a series of milestone snapshots into a single row, where each row represents one pass through a business process with a definite beginning and end. For example, let’s say that Acme Inc.’s order process typically includes these steps.

  1. Product is Quoted to Customer
  2. Quote is accepted or denied
  3. Sales Order is generated if Quote is accepted
  4. Sales Order is Shipped
  5. Customer is Invoiced
  6. Payment is Processed

If management wants to measure throughput of orders, how fast they move through the system, an Accumulating Snapshot can be used. The grain would represent a particular sales order (or sales order line item if they are ever handled separately). Included columns would be the important dates, foreign keys to the relevant dimensions, and the quantity and dollar amount measurements.
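A table for that pipeline might be sketched as follows, with one date key per milestone; the row is inserted at quote time and updated as the order advances. Names are assumed for illustration:

```sql
-- Accumulating snapshot: one row per sales order, one date column per
-- milestone. NULL date keys mean the milestone hasn't happened yet.
CREATE TABLE FactOrderPipeline (
    SalesOrderKey  int   NOT NULL,  -- one row per order
    QuoteDateKey   int   NULL,      -- 1. product quoted
    AcceptDateKey  int   NULL,      -- 2. quote accepted/denied
    OrderDateKey   int   NULL,      -- 3. sales order generated
    ShipDateKey    int   NULL,      -- 4. order shipped
    InvoiceDateKey int   NULL,      -- 5. customer invoiced
    PaymentDateKey int   NULL,      -- 6. payment processed
    OrderQuantity  int   NULL,
    OrderAmount    money NULL
);
```

Differences between the date columns give management the throughput measurements directly, such as days from quote to ship.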

So, now that we’ve discussed the basics of Dimension and Fact Tables, I’ll bring it all together in tomorrow’s lesson. We’ll discuss Star and Snowflake Schemas, and how this all works. I’ll also cover the best ways to learn this process as well as how to convince your company to allow you to undertake building a data warehouse.

Stay tuned!
