So, we’ve covered dimension tables and how they give your data meaning. That raises the question: how do we actually track the measures (values) of those business process events? In other words, where are we going to track the money, and some other measurements as well?
What are Fact Tables?
Fact Tables contain the measurements of your business process events, usually in the form of money, quantities, and sometimes dates. The typical fact table contains nothing but numbers. Suppose we are designing a data warehouse for Acme Inc., maker of a variety of fine products, usually involving rockets, for sale to discerning coyotes everywhere. They have multiple stores throughout Arizona. Below is a sample fact table.
Whereas Dimension Tables tend to be denormalized or flattened, Fact Tables are highly normalized. For the most part, they consist of measurements and Keys that link to the surrogate keys of the Dimension Tables. A sample data set for the table above would look like this:
Fact Tables tend to be as “narrow” as possible because they are so much larger than the dimensions. The only keys which aren’t surrogates are the date table smart keys. Also, the measurements included in the Fact Table should be additive. Notice that I am not storing Unit Price. The user can calculate it by dividing extended price by order quantity if need be. If you store a non-additive fact like unit price, users will try to sum or average it and get erroneous results.
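To make the additivity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are illustrative assumptions, not the original sample table. It stores only additive measures and derives an average unit price at query time as a ratio of sums.

```python
import sqlite3

# Hypothetical schema: names and rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_sales_order_line (
        order_date_key  INTEGER NOT NULL,  -- date smart key, e.g. 20240115
        product_key     INTEGER NOT NULL,  -- surrogate key into a product dimension
        store_key       INTEGER NOT NULL,  -- surrogate key into a store dimension
        order_quantity  INTEGER NOT NULL,  -- additive measure
        extended_price  REAL    NOT NULL   -- additive measure (quantity * unit price)
    )
""")
conn.executemany(
    "INSERT INTO fact_sales_order_line VALUES (?, ?, ?, ?, ?)",
    [
        (20240115, 1, 10, 2, 50.0),   # unit price 25.00
        (20240115, 1, 11, 6, 120.0),  # unit price 20.00
        (20240116, 2, 10, 1, 300.0),
    ],
)

# Additive measures can be summed across any dimension.
# Unit price is derived as a ratio of sums instead of being stored per row.
rows = conn.execute("""
    SELECT product_key,
           SUM(order_quantity)                       AS total_quantity,
           SUM(extended_price)                       AS total_extended_price,
           SUM(extended_price) / SUM(order_quantity) AS avg_unit_price
    FROM fact_sales_order_line
    GROUP BY product_key
    ORDER BY product_key
""").fetchall()

for row in rows:
    print(row)
```

For product 1, the ratio of sums gives 21.25, whereas naively averaging the two row-level unit prices (25.00 and 20.00) would give 22.50. That gap is the kind of erroneous result a stored unit price invites.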
Fact Table Grain
The most important consideration when designing Fact Tables is to set the grain. As Kimball states,
What, exactly, is the grain? The grain of a fact table is the business definition of the measurement event that creates a fact record. The grain is exclusively determined by the physical realities of the source of the data.
The grain should be at the most atomic level you can measure at because you never know how your users will want to slice and dice the data.
In my example, the grain is at the level of Sales Order Line Items. If we were a point of sale establishment like a grocery store, how would we define the grain? Paraphrasing Kimball, when the cashier scans your Cheetos and Bacon and you hear that “BEEP”, the “BEEP” is the grain of your fact table.
Do not violate the grain of your Fact Table, though you may be enticed to do so. For example, what if Acme wanted to add shipping costs to our Order Fact Table? Well, shipping costs are typically recorded per Sales Order, not per line item or individual product. We currently have this problem with M-Data Analytics.
You should not simply add a shipping cost value to your fact table. If a particular sales order cost $10 to ship and had 3 line items, how would you record that? You should not simply place $10 in each of the three fact table rows because the value is not additive: summing the rows would report $30 of shipping. You shouldn’t include $10 for one of the lines and $0 for the others either, because any analysis by product would then be wrong. The only way to keep shipping cost in this table without violating the grain is to estimate the cost per line item (perhaps by using weights in your product table) and store those values. That process could be prohibitively difficult, however.
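As a rough sketch of that estimate, the function below splits an order-level shipping cost across line items in proportion to weight. The function name, the use of integer cents, and the sample weights are all assumptions made for illustration; a real allocation rule would depend on how your carrier actually charges.

```python
def allocate_shipping(total_cents, line_weights):
    """Split an order-level shipping cost across line items by weight.

    Works in integer cents so the allocated values always add back up to
    the order total. Cents lost to rounding down are handed to the lines
    with the largest fractional remainders.
    """
    total_weight = sum(line_weights)
    if total_weight <= 0:
        raise ValueError("line weights must sum to a positive number")

    raw = [total_cents * w / total_weight for w in line_weights]
    shares = [int(r) for r in raw]           # round each share down
    leftover = total_cents - sum(shares)     # cents still unassigned

    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - shares[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical order: $10.00 shipping, three line items weighing 5, 3, and 1 units.
allocated = allocate_shipping(1000, [5.0, 3.0, 1.0])
print(allocated)
print(sum(allocated))
```

Because the allocated values sum back to the order total, the resulting column is additive at the line item grain, which is the property the fact table needs.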
Fact Table Types
There are three types of Fact Tables.
A Transactional Fact Table is the fundamental type, and the example above is Transactional. It should be at the most atomic grain possible, one row per transaction or business event. One row per “BEEP” as Kimball puts it. Since these tables are at the most atomic level, they tend to have the greatest number of Dimension Tables surrounding them.
A Periodic Snapshot Fact Table is a point-in-time representation of measurements. One common application is with measurements that cannot be historically tracked, like current inventory or order backlog. Another is to summarize over periods of time, perhaps at the end of each month, to track balances and such.
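Here is a small sketch of the second use, summarizing over periods. It assumes a hypothetical list of inventory movements and produces one snapshot row per product per period end. All names and figures are invented for illustration. When the underlying history is not available, the snapshot would instead be captured directly from the source system on each period end.

```python
from collections import defaultdict

# Hypothetical inventory movements: (date_key, product_key, quantity_change)
movements = [
    (20240105, 1, 100),
    (20240120, 1, -30),
    (20240210, 1, -20),
    (20240112, 2, 40),
    (20240225, 2, 10),
]
period_ends = [20240131, 20240229]  # month-end date smart keys


def build_snapshots(movements, period_ends):
    """Return one row per (period end, product) with quantity on hand at that date."""
    products = sorted({product_key for _, product_key, _ in movements})
    snapshots = []
    for end in sorted(period_ends):
        on_hand = defaultdict(int)
        for date_key, product_key, change in movements:
            # YYYYMMDD smart keys compare correctly as plain integers.
            if date_key <= end:
                on_hand[product_key] += change
        for product_key in products:
            snapshots.append((end, product_key, on_hand[product_key]))
    return snapshots


snapshots = build_snapshots(movements, period_ends)
for row in snapshots:
    print(row)
```

Note that the grain here is one row per product per period end, regardless of how many movements happened in between.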
An Accumulating Snapshot Fact Table is typically a series of snapshots in a single row, where each row represents a business process with a definite beginning and end. For example, let’s say that Acme Inc.’s order process typically includes these steps:
- Product is Quoted to Customer
- Quote is accepted or denied
- Sales Order is generated if Quote is accepted
- Sales Order is Shipped
- Customer is Invoiced
- Payment is Processed
If management wants to measure throughput of orders, how fast they move through the system, an Accumulating Snapshot can be used. The grain would represent a particular sales order (or sales order line item if they are ever handled separately). Included columns would be the important dates, foreign keys to the relevant dimensions, and the quantity and dollar amount measurements.
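A minimal sketch of such a row, assuming one row per sales order with a date column for each milestone in the list above. The class name, field names, and sample dates are hypothetical. The key behavior is that the same row is updated as the order progresses, and the elapsed time between milestones is derived from the date columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrderPipelineRow:
    """One accumulating snapshot row per sales order (field names are assumptions)."""
    sales_order_id: int
    order_amount: float
    quote_date: Optional[date] = None
    quote_accepted_date: Optional[date] = None
    order_date: Optional[date] = None
    ship_date: Optional[date] = None
    invoice_date: Optional[date] = None
    payment_date: Optional[date] = None

    def days_between(self, start_field: str, end_field: str) -> Optional[int]:
        """Days elapsed between two milestones, or None if either has not happened yet."""
        start = getattr(self, start_field)
        end = getattr(self, end_field)
        if start is None or end is None:
            return None
        return (end - start).days


# The row is created once the order exists, with the milestones reached so far.
row = OrderPipelineRow(
    sales_order_id=1001,
    order_amount=250.0,
    quote_date=date(2024, 1, 2),
    quote_accepted_date=date(2024, 1, 4),
    order_date=date(2024, 1, 4),
)

# Later, the same row is updated in place when the order ships.
row.ship_date = date(2024, 1, 9)

quote_to_ship = row.days_between("quote_date", "ship_date")
order_to_payment = row.days_between("order_date", "payment_date")
print(quote_to_ship, order_to_payment)
```

Unlike the other two types, rows in this kind of table are revisited and updated, which is why milestones that have not happened yet are simply left empty.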
So, now that we’ve discussed the basics of Dimension and Fact Tables, I’ll bring it all together in tomorrow’s lesson. We’ll discuss Star and Snowflake Schemas, and how this all works. I’ll also cover the best ways to learn this process as well as how to convince your company to allow you to undertake building a data warehouse.