We are Borg – Bringing it all together.
Now that we understand the basics of Dimension and Fact Tables, let’s learn how they work together. Once again, don’t forget that the Kimball Toolkit Data Modeling Spreadsheet is very helpful when creating these models.
The Star Schema
Let’s go back to the dimensional model Fact Table mentioned in the last article.
Remember that Fact Tables consist primarily of foreign keys and measures.
These foreign keys link to the surrogate primary keys of the Dimension Tables. When you link them all together, they might look like this.
So, what does that look like? The tables are generally laid out in the formation of a Star. That’s all a Star Schema really is.
Let’s say that Product Category is related to the Product Dimension in a one to many relationship (or mostly so), and that it has many verbose attributes. The architect may decide to split it off from the Product Dimension. The resulting schema would look like this.
This is now a Snowflake Schema. By the way, check out the header of the third section of the linked wiki. Is the wording intentionally hilarious? Another common use for this kind of schema would be for a Customer Dimension and associated addresses. The location becomes a Dimension Table linked to DimCustomer. A general best practice is to use Star Schemas unless there is a very important reason to move to Snowflake as they are simpler to report from and load. If you use a Snowflake Schema, you may want to cover the tables with a view to simplify querying. In our case, we’d have a view for DimProduct and that would include the DimProductCategory and appear as one entity.
The Simple Life of Borgs
Here’s the payoff. Let’s assume that your boss wants to know the sales order amount by year and product. You can ignore all of the other dimensions.
The query is simple and fast because it involves 3 fields and 2 joins. Those joins are primary and foreign keys which means most query products recognize them automatically. Life is now simple.
Prepare to be Assimilated – Convincing Your Boss
So, I can hear you saying, “Alright Dave, you’ve convinced me, how can I get my boss to let me work on something like this?”
If you’ve been reading this series of articles, you know that creating a data warehouse is not quick or easy. It involves a great deal of time to learn to design and load a data warehouse. So, how can you convince your boss to let you pursue this?
Well, to start with business people are just like everyone else. They’re short sighted, demanding nearly instant return on investment (ROI), and are averse to risk.
What are you preparing?! You’re always preparing! Just go! – Dark Helmet. Spaceballs
The first thing to remember is that this project can’t be about you. In most cases, your boss doesn’t care about what you want to do, but rather what you can do to add value to the business. Also, in today’s economy your boss may be reluctant to send you to dimensional modeling classes which cost thousands of dollars. So, how can you learn these principals on the cheap? I’ll cover that in a moment.
Pitching the idea to your boss can be difficult because dimensional modeling concepts are intangible. It’s hard for a business person to see the value in it. You have to relate the project to revenue in some way.
So, I suggest that you create a proof of concept project, perhaps in your free time. After reading this series of lessons, pick a business topic and create a test project. Some important things to keep in mind in choosing this topic are:
- Choose a business topic that is of value to the business. When your boss sees value, even as just a test case, he/she will be more likely to green light it.
- To maximize your chances of success, choose a business topic with a single clean data source if possible. Don’t bite off some project that involves flat files that are FTP’ed to you and often contain dirty or missing data. Pick a relatively clean OLTP system, preferably in SQL Server.
- The proof of concept need not be perfect or complete. Create a single Fact Table, and a few Dimensions to go with it.
- Use your existing skill set to create and load the data warehouse. While SQL Server Integration Services is the tool of choice for the extract, transform, and load process (ETL); if you don’t know how to use it, load it with T-SQL instead. You can learn SSIS later.
When pitching the idea to your boss, show the model and demonstrate how he/she can perform their own analysis on it. Stress how much more convenient this is for the users, not how much time it will save you. Also, explain that this process can answer long standing business questions the boss may have. In my situation, Made2Manage cannot track sales order history. Through the ETL process of the data warehouse project, I can now track value changes in sales orders in near real time. This functionality saves my company money every day in that users no longer have to track those changes manually.
Learning on the Cheap
Books are one of the most cost effect ways to learn this. Below are a few books I recommend and why.
- The Data Warehouse Lifecycle ToolkitWritten by the Kimball group, this book describes the dimensional modeling process in depth as well as all phases of data warehouse creation. I’ve read it numerous times and still refer to it on a regular basis.
- The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2Also by the Kimball Group. This book covers the data warehouse process using the Microsoft Business Intelligence tools. It’s not released yet (available in March), but I’ve already pre-ordered my copy and I urge you to do so. I’ve read their previous book using SQL Server 2005and it’s excellent, so I have no problems recommending the new version.
- Microsoft SQL Server 2008 Integration Services: Problem, Design, Solution (Wrox Programmer to Programmer)When you’re ready to start learning SSIS, this book is an excellent resource with chapters specifically on loading the data warehouse.
There are also free webinars from time to time on the subject from the Kimball Group, as well as Pragmatic Works. Pragmatic offers excellent classes in the various Microsoft technologies involved, and I’ve taken nearly all of them. To my knowledge they don’t offer a class in dimensional modeling itself. Also, you can follow blogs like this one. In future articles I intend to document all phases of my open source project from modeling, to ETL, and then reporting on the data warehouse. It should serve as a good case study for others to learn from.
One last thing that I want to stress about learning these techniques. It is absolutely imperative that you learn by doing it. Create a Dimensional Model and attempt to load it. If you don’t have a data source, look for one. Volunteer with a charity, many of them would love to leverage your slave labor to better their “business.”
Get involved with your local SQL Server User Group and make connections. Your progress is only limited by your level of commitment.
So, will you be assimilated?