SQL University - Resistance is Futile

Note: This is the last in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. Click here to read the rest of the series.

I can haz assimilation?

We are Borg – Bringing it all together.

Now that we understand the basics of Dimension and Fact Tables, let’s learn how they work together. Once again, don’t forget that the Kimball Toolkit Data Modeling Spreadsheet is very helpful when creating these models.

The Star Schema

Let’s go back to the dimensional model Fact Table mentioned in the last article.

Remember that Fact Tables consist primarily of foreign keys and measures.

These foreign keys link to the surrogate primary keys of the Dimension Tables. When you link them all together, they might look like this.

So, what does that look like? The tables are generally laid out in the formation of a Star. That’s all a Star Schema really is.

Snowflake Schema

Let’s say that Product Category is related to the Product Dimension in a one to many relationship (or mostly so), and that it has many verbose attributes. The architect may decide to split it off from the Product Dimension. The resulting schema would look like this.

This is now a Snowflake Schema. By the way, check out the header of the third section of the linked wiki. Is the wording intentionally hilarious? Another common use for this kind of schema would be for a Customer Dimension and associated addresses. The location becomes a Dimension Table linked to DimCustomer. A general best practice is to use Star Schemas unless there is a very important reason to move to Snowflake as they are simpler to report from and load. If you use a Snowflake Schema, you may want to cover the tables with a view to simplify querying. In our case, we’d have a view for DimProduct and that would include the DimProductCategory and appear as one entity.

The Simple Life of Borgs

Here’s the payoff. Let’s assume that your boss wants to know the sales order amount by year and product. You can ignore all of the other dimensions.

The query is simple and fast because it involves 3 fields and 2 joins. Those joins are primary and foreign keys which means most query products recognize them automatically. Life is now simple.

Prepare to be Assimilated – Convincing Your Boss

So, I can hear you saying, “Alright Dave, you’ve convinced me, how can I get my boss to let me work on something like this?”

If you’ve been reading this series of articles, you know that creating a data warehouse is not quick or easy. It involves a great deal of time to learn to design and load a data warehouse. So, how can you convince your boss to let you pursue this?

Well, to start with business people are just like everyone else. They’re short sighted, demanding nearly instant return on investment (ROI), and are averse to risk.

What are you preparing?! You’re always preparing! Just go! – Dark Helmet. Spaceballs

The first thing to remember is that this project can’t be about you. In most cases, your boss doesn’t care about what you want to do, but rather what you can do to add value to the business. Also, in today’s economy your boss may be reluctant to send you to dimensional modeling classes which cost thousands of dollars. So, how can you learn these principals on the cheap? I’ll cover that in a moment.

Pitching the idea to your boss can be difficult because dimensional modeling concepts are intangible. It’s hard for a business person to see the value in it. You have to relate the project to revenue in some way.

So, I suggest that you create a proof of concept project, perhaps in your free time. After reading this series of lessons, pick a business topic and create a test project. Some important things to keep in mind in choosing this topic are:

  1. Choose a business topic that is of value to the business. When your boss sees value, even as just a test case, he/she will be more likely to green light it.
  2. To maximize your chances of success, choose a business topic with a single clean data source if possible. Don’t bite off some project that involves flat files that are FTP’ed to you and often contain dirty or missing data. Pick a relatively clean OLTP system, preferably in SQL Server.
  3. The proof of concept need not be perfect or complete. Create a single Fact Table, and a few Dimensions to go with it.
  4. Use your existing skill set to create and load the data warehouse. While SQL Server Integration Services is the tool of choice for the extract, transform, and load process (ETL); if you don’t know how to use it, load it with T-SQL instead. You can learn SSIS later.

When pitching the idea to your boss, show the model and demonstrate how he/she can perform their own analysis on it. Stress how much more convenient this is for the users, not how much time it will save you. Also, explain that this process can answer long standing business questions the boss may have. In my situation, Made2Manage cannot track sales order history. Through the ETL process of the data warehouse project, I can now track value changes in sales orders in near real time. This functionality saves my company money every day in that users no longer have to track those changes manually.

Learning on the Cheap

Books are one of the most cost effect ways to learn this. Below are a few books I recommend and why.

There are also free webinars from time to time on the subject from the Kimball Group, as well as Pragmatic Works. Pragmatic offers excellent classes in the various Microsoft technologies involved, and I’ve taken nearly all of them. To my knowledge they don’t offer a class in dimensional modeling itself. Also, you can follow blogs like this one. In future articles I intend to document all phases of my open source project from modeling, to ETL, and then reporting on the data warehouse. It should serve as a good case study for others to learn from.

One last thing that I want to stress about learning these techniques. It is absolutely imperative that you learn by doing it. Create a Dimensional Model and attempt to load it. If you don’t have a data source, look for one. Volunteer with a charity, many of them would love to leverage your slave labor to better their “business.”

Get involved with your local SQL Server User Group and make connections. Your progress is only limited by your level of commitment.

So, will you be assimilated?

3 comments to SQL University – Resistance is Futile

  • […] This post was mentioned on Twitter by Stray__Cat and SQL University, David Stein. David Stein said: New Blog: SQL University – Resistance is Futile Data Warehousing Week #sqluniversity #sqlpass […]

  • Leonard

    Thanks for the series of posts. I particularly liked the proof of concept suggestions – it’s very motivating!

  • Matt

    Thank you for the good read! These articles have given me a lot to think on as I am trying to grasp these concepts. My major challenge is how to take what these tutorials/blogs explain and apply to my real situation. I have been on the hunt for a good tutorial starting with source creation using common data found (like a ticket system, wink wink 🙂 ). I don’t like the tutorials using Microsoft prebuilt adventure works databases because the data means nothing to me (I can’t relate to it). I am fairly new to the database world having been a helpdesk tech and jr. web developer for many years prior. I was recently thrown into the DBA role and solely focused on administration and of course now they want BI, Argh!!

    I like how you broke up dim and fact and explained these very informative. I am going to review the DW toolkit as it might help me understand the dim and fact decision process better. Since I can’t really start to play until I understand what my dim and fact setup is. I am being asked to convert existing custom metric reports from our IT ticket system into a Cube for better more reliable analysis.


    Why? What are we going to do tomorrow night? –Pinky and the Brain

Leave a Reply




You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>