
SQL University - Dimensions, It's All Meaningless Without Them

Note: This is the third in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. Click here to read the rest of the series.

You unlock this door with the key of imagination. Beyond it is another dimension – a dimension of sound, a dimension of sight, a dimension of mind. You’re moving into a land of both shadow and substance, of things and ideas. You’ve just crossed over into the Twilight Zone.

Before we begin with Dimension and Fact tables, I would like to recommend that you use the Kimball Toolkit Data Modeling Spreadsheet when designing your dimensional model. In that download, you’ll receive a blank copy, as well as a copy of one they created for Adventure Works. The Spreadsheet actually generates the T-SQL script to create your data warehouse database using your specifications.

What is a Dimension?

A dimension is an entity involved in your business process which gives your data meaning. Remember the old days when you were confounded by those story problems your sadistic Math teacher foisted on you? Don’t worry, I’m not going to ask you about a train moving west at 45 mph… Our story problem (data request) is simpler. “I want to know what the order amount was by fiscal year, customer, and product category.”

The important words are obviously what and by, which is why I bolded them. The what generally tips you off to your Facts (measurements), and the by indicates your dimensions. The dimensions are the items in your Group By statement. The statement above could be written in T-SQL as:

Select sum(invsales)
where datefield.... 
group by Year, Customer, Product Category
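To make the shape of the query a little more concrete, here is a fuller sketch written against a hypothetical star schema; the table and column names (FactSales, DimDate, DimCustomer, DimProduct) are assumptions for illustration, not objects from any particular system.

SELECT
  DD.FiscalYear
 ,DC.CustomerName
 ,DP.ProductCategory
 ,SUM(FS.OrderAmount)  OrderAmount       -- the "what" (the fact being measured)
FROM
  dbo.FactSales FS
  INNER JOIN dbo.DimDate     DD ON FS.OrderDateKey = DD.DateKey
  INNER JOIN dbo.DimCustomer DC ON FS.CustomerKey  = DC.CustomerKey
  INNER JOIN dbo.DimProduct  DP ON FS.ProductKey   = DP.ProductKey
GROUP BY                                 -- the "by" (the dimensions)
  DD.FiscalYear
 ,DC.CustomerName
 ,DP.ProductCategory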

Dimension tables are typically wide, verbose, and filled with descriptive detail fields called attributes. They should include as many details as is practical because you never know what a user will want to analyze by. Try to anticipate all required attributes up front, but add more as you go if need be. Consider a Customer dimension for a business that sells to individual consumers; it would typically include attributes such as location, age, sex, etc.
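As a rough sketch, a denormalized Customer dimension might look something like the table below. The column names and sizes are hypothetical; the point is simply that the descriptive attributes live together in one wide table.

CREATE TABLE dbo.DimCustomer
(
  CustomerSK     int IDENTITY(1,1) NOT NULL PRIMARY KEY -- surrogate key (discussed below)
 ,CustomerAK     varchar(20)  NOT NULL  -- business (alternate) key from the source system
 ,CustomerName   varchar(100) NOT NULL
 ,City           varchar(50)  NOT NULL
 ,StateProvince  varchar(50)  NOT NULL
 ,Country        varchar(50)  NOT NULL
 ,Sex            varchar(20)  NOT NULL
 ,AgeRange       varchar(20)  NOT NULL
 ,CreditRating   varchar(20)  NOT NULL
);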

Normalization

You must unlearn what you have learned....


Dimension tables are denormalized, which is difficult for some DBAs to accept. In an OLTP system the Customer entity’s attribute values may be divided among many tables such as address, credit information/rating, and demographics. During dimensional modeling we flatten those tables into as few tables as possible. Why do we do this? It’s easier for the user to understand, and fewer joins (plus indexing) mean much faster query response.

It is also important to be verbose in your descriptions. For example, in my current project there is an attribute called Product Class which is a two-character code. Users have to memorize these codes and what they represent because the source system keeps the descriptions in a separate table. The descriptions should be used instead of the codes. Think of it from a new person’s point of view. Do you want to memorize that “01” means “Ceiling Fan,” or would you rather just see “Ceiling Fan?”

Long-time users of the system may actually resist this change, so in my current project I provide both values in my dimension table, which offers the ultimate in flexibility.

Conformed Dimensions

Conformity Rules!


A dimension is conformed when it means the same thing for every fact table. Early in the project the business must come to agreement on the definition of each dimension and what each attribute means. This can cause friction as well. For example, the Manufacturing Department of a company may have a different idea of Product Categories than the Sales Department.

Do NOT cave in. If you allow dimensions to have different meanings and attributes for different fact tables, you will lose the ability to drill across. This causes your different modules or data marts to become stovepiped: data from one cannot be compared with another. This defeats the entire point of the data warehouse project.

Surrogate Keys

The Primary Key of a dimension table should be a Surrogate Key. Surrogate keys are unique, meaningless integers generated by the database server. Typically the user never sees these keys; they are simply used by the data warehouse to link tables together. People often protest and want to reuse their OLTP system keys instead, even when those are integers, but you shouldn’t because surrogates insulate your data warehouse from outside forces. For example:

  1. The OLTP source may recycle its keys. If they do so, the uniqueness of your dimension primary key is violated.
  2. Multiple sources for the same dimensional entity likely have different keys.
  3. If you acquire another company, which has its own set of keys, how could you incorporate them into your data warehouse without surrogate keys?
  4. Suppose one of your attributes is a ship date that isn’t always known the first time a record is loaded. Surrogate keys allow a special value such as “Not Available” to be substituted rather than a null (see the sketch after this list).
  5. Surrogate Keys are compact which saves space, especially with indexes. To that end, you should be using the smallest integer datatype appropriate for these keys. Also, integer keys offer faster joins than string types.
  6. Surrogate keys facilitate easier tracking of historical changes. More on this in a bit.
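To illustrate point 4, here is a minimal sketch of seeding a dimension with a special member; it assumes the hypothetical DimCustomer table sketched earlier in this article.

-- Seed a special member so fact rows can reference 'Not Available' rather than NULL.
SET IDENTITY_INSERT dbo.DimCustomer ON;

INSERT INTO dbo.DimCustomer
  (CustomerSK, CustomerAK, CustomerName, City, StateProvince, Country, Sex, AgeRange, CreditRating)
VALUES
  (-1, 'N/A', 'Not Available', 'Not Available', 'Not Available', 'Not Available',
   'Not Available', 'Not Available', 'Not Available');

SET IDENTITY_INSERT dbo.DimCustomer OFF;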

Unusual Dimension Types

Date Dimensions
Date Dimensions are unique in that they typically use meaningful keys instead of surrogates. I’ll cover Date Dimensions/Tables in depth in an article in the near future.

Degenerate Dimensions
Degenerate dimensions are actually a part of the fact table and are typically significant keys from your OLTP system like Sales Order Number or Customer Purchase Order Number.

Junk Dimensions
A Junk Dimension is a catch-all dimension for miscellaneous data or record indicators. For example, attributes like Ship Early, Ship Method, Order Status, Pay Type or other flags can be put into a Junk Dimension. This keeps your design from being cluttered with too many dimensions.
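A rough sketch of what such a catch-all table might look like; the column names and values are hypothetical.

CREATE TABLE dbo.DimOrderJunk
(
  OrderJunkSK    int IDENTITY(1,1) NOT NULL PRIMARY KEY
 ,ShipEarlyFlag  varchar(3)  NOT NULL  -- 'Yes' / 'No'
 ,ShipMethod     varchar(30) NOT NULL  -- 'Ground', 'Air', etc.
 ,OrderStatus    varchar(20) NOT NULL  -- 'Open', 'Closed', 'Cancelled'
 ,PayType        varchar(20) NOT NULL  -- 'Credit Card', 'Net 30', etc.
);
-- One row per combination of values that actually occurs; the fact table then
-- carries a single OrderJunkSK instead of several separate low-value dimension keys.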

Role Playing Dimensions
Role Playing Dimensions are specialized copies (typically views) of a Conformed Dimension. The most common use of these is with Date Dimensions. If you need to track Order Date, Acknowledgement Date, Ship Date, and Invoice Date you could do so using views which show different versions of the Date Dimension.
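A minimal sketch of that idea, assuming a hypothetical DimDate table; the view and column names are illustrative only.

CREATE VIEW dbo.DimOrderDate AS
SELECT DateKey AS OrderDateKey, FullDate AS OrderDate, FiscalYear, FiscalPeriod
FROM dbo.DimDate;
GO

CREATE VIEW dbo.DimShipDate AS
SELECT DateKey AS ShipDateKey, FullDate AS ShipDate, FiscalYear, FiscalPeriod
FROM dbo.DimDate;
GO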

Tracking History With Slowly Changing Dimensions

Slowly Changing


There are three types of Slowly Changing Dimensions (SCD). Ralph Kimball, being the genius that he is, creatively named them type 1, 2, and 3.

  • Type 1 overwrites the attribute and ignores history.
  • Type 2 creates a new dimension record when an attribute changes and marks the previous row as inactive.
  • Type 3 tracks the current attribute and the last previous attribute in separate columns.

It’s important to note that the attributes themselves are of each type, not the whole table. Combinations of SCD Types can be implemented as well. The easiest way to understand this is with an example. Let’s say that we have a very simple customer dimension like this:

Incidentally, the CustomerAK represents the Alternate Key or Business Key; in this case it’s the customer number from the OLTP source system. Let’s say that the Customer attribute Sex is a type 1 because if it changes, it is usually due to an order entry error. So, in the next extract, the value has been corrected for Jane Doe. The table would look like this:

Notice that Jane Doe did not get a new row; that record was updated in place. Now, let’s assume that City is a type 2 attribute and that I moved to Jacksonville. Why would I move to Jacksonville? Well, a certain well-known BI consulting company is headquartered near there… Hint hint. Anyway, after processing the extract, the table would look like this:

So, I’ve been given a new surrogate key and my old record has been expired. Any new fact records after that time use the currently active record. Notice that we are ignoring time of day, and as long as the effective and end dates don’t overlap, I can use a simple T-SQL Between statement to easily query the data.
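As a sketch, assuming the dimension carries effective and end date columns (the names RowEffectiveDate and RowEndDate, and the sample values, are hypothetical), finding the version of a customer row that was active on a given date looks like this:

SELECT
  DC.CustomerSK
 ,DC.CustomerAK
 ,DC.CustomerName
 ,DC.City
FROM
  dbo.DimCustomer DC
WHERE
  DC.CustomerAK = '12345'
  AND '2011-02-15' BETWEEN DC.RowEffectiveDate AND DC.RowEndDate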

In a future article, I’ll cover in detail how to accomplish the slowly changing dimension processing using SQL Server Integration Services.

SCD Decisions

The next natural question is “who decides which columns are of which SCD type?” Well, if you read yesterday’s article you can already guess the answer. The business folks decide which attributes are type 2. However, you must protect them from themselves. Typically if you ask them which attributes they’d like to keep history on, they’ll invariably answer “Everything.” This may not be feasible, because if you add a new record for every single dimension change, your data warehouse may grow at a ridiculous rate. Inform them how much all of that history will cost them in hardware. It’s important to note that certain regulatory rules may force all changes to be type 2 regardless of cost.

Tomorrow, we’ll discuss the Facts. “Just the facts, ma’am.” Yeah, I know the quote was lame, but I couldn’t help myself.

If We Build It, Will They Come?

Note: This is the second in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. Click here to read the rest of the series.

Creating a data warehouse is a daunting process. It’s not something you just whip out in a weekend. So, let’s get a few things straight to start with.

Who Designs the Data Warehouse (Dimensional Model)?

If you build it, will they come? It seems obvious that you design the data warehouse; that’s why you’re taking your time to read this blog, or maybe you’re just fascinated with my movie quotes. But seriously, who designs this thing? The business users.

The business users have a large part in designing a data warehouse. This isn’t an IT process, it’s a business process. The subjects of the data warehouse are decided by the business, as well as what information needs to be kept on each. In reality the data warehouse exists to assist users in making better decisions. The data warehouse architect is there to assist and guide in this process. One sure way to fail is to build an IT-department-centered data warehouse, because the users won’t see value in it.

Notice that I didn’t suggest that you should ask the business users what they want. If you do so, you are handing over the design role to the business user. They don’t know what they want because they don’t know what you can provide. They will tell you something, but because they don’t understand what you can do, it will be wrong. It’s up to you to bridge the gap and learn the business so you can design a successful dimensional model.

So, the business users don’t understand tables, keys, referential integrity, etc. Part of the data architect’s job is to educate the users about what can be done so they can provide better input into the process. So, how can you teach them the basics of fact and dimension tables? The same way I’m going to teach you.

Let’s start out with pivot tables, which most users understand. Ask the user for a typical reporting requirement that they might have. Suppose they say, “I want to know incoming order amounts by fiscal year, period, and product category.” Most users understand how to use Excel pivot tables, so that’s the method we’ll use.

Open Excel, query the data, and create a pivot table. The user may already know that they need to drag the amount to the Values box, and already know how to pull the values for year, month, and product category to the columns and rows as shown. The values, or the column you want to measure, are your facts. The things you measure facts by (by fiscal year, period, and product category) are your dimensions. Dimensions and facts go in separate tables, but the users may never see them. Now that they understand that, what’s next?

How do we model the model?

Bus Matrix Example

Do you want to know what the Matrix is, Neo? – Morpheus, The Matrix

Start with the Kimball Bus Matrix, which is kind of like a high level road map of your project. Along the left hand side are the business processes which bring value to your organization. The columns are the descriptive objects (dimensions) for your organization. If a dimension is involved with a specific business process, place an X in the intersecting cell. Notice that I have added a column called “Business Priority” which I use to indicate the order in which I will handle each process.

Typically the Bus Matrix is much larger because most enterprises have more business processes. The example presented is a simplified version for demo purposes.

How do we determine which order to tackle the business processes?

Well, the first business process takes the longest. You’re going to find that dealing with the dimensions typically takes a lot longer than fact tables. When you work on your first business process, you will need to create all of the dimensions associated with it. In my example above, if we start with Incoming Orders, every dimension but Vendors has to be created. However, each additional business process becomes easier because we re-use the already existing dimensions.
Business Process Order
So, which business process do you start with? Start with the low hanging fruit. You want to pick a process which is relatively easy and has high business value. Perhaps that value represents money savings, increasing sales opportunities, or something that is a mandated compliance issue. Once again, all of this is considered with a focus on the business. We want a quick return on investment so we continue to have the support of management.

So, you’ve identified the business process and the facts and dimensions associated with it. Note that we haven’t started fleshing out the dimension fields with the descriptive data. That’s what we’ll handle next when I show you the basics of modeling dimensions.

SQL University - Dimensional Modeling and Why It Doesn't Suck

This is the first in a series of five “lessons” which I created for SQL University on the topics of Dimensional Modeling and Data Warehousing. The rest of the lessons are as follows:

SQL University

Welcome to those of you from SQL University attending this series of “classes” on Dimensional Modeling Fundamentals. If you are new to my blog, you may want to take a moment to read about me and why I blog.

This series of articles will serve as an introduction to Dimensional Modeling and Data Warehouses. While I am a SQL Server Business Intelligence DBA, the fundamentals of dimensional modeling are universal and platform agnostic. I’m a big proponent of Kimball Methodology, and most of what I will share with you is based on it.

What is Dimensional Modeling and what is a Data Warehouse?

Dimensional Modeling is a term for a set of techniques and concepts used to design data warehouses. Different people have their own definitions of the term data warehouse, but the one we’ll use is simple. A Data Warehouse is a relational database that is optimized for reporting. Typically they contain information from many different data sources and are loaded from those sources on a periodic basis.

Why do I need a Data Warehouse?

What would you do with a brain if you had one? – Dorothy, The Wizard of Oz

There are many reasons why you need a data warehouse. Let’s cover a few.

Your Database Design Sucks

Well, it sucks for reporting anyway. How can I make that blanket statement? Well, when most of us think of a database, we think of an On-Line Transaction Processing (OLTP) database whether we know it or not. This type of database is optimized for data entry: inserts, updates, and deletes. Part of this optimization is due to normalization. I won’t go into the gory details, but normalized databases are generally composed of narrow tables of closely related data, which link to other related tables to minimize redundancy. This results in a logical layout for the data, reduces the possibility of multiple values for the same attribute, and reduces database size.

For example, let’s assume you were designing the sales order tables of a database for manufacturing companies (like Made2Manage). You could not have all of your sales information in a single table. Logically, at the very minimum, you would require the following tables. I’ve listed some common fields at each level.

  1. Customer Table (SLCDPM), which would contain customer name, discount, credit rating, billing address.
  2. Sales Order Header (SOMAST)- Purchase Order Number, Order Date, Shipping Method.
  3. Sales Order Detail (SOITEM)- Quantity Ordered, Unit Price, Part Number.

Of course, I’m simplifying things. Even in a small database like Made2Manage the sales order module encompasses more like 10 tables, as you can see from the simplified model below.

One of the tables is in red because it’s a Visual FoxPro table. I have seen production systems which have as many as 15 – 20 tables involved in the sales process. While this may make sense in a transactional system, reporting from it can be nightmarish.

A dimensional model for a data warehouse is designed differently. We transform the standard OLTP structure into a simplified structure with two basic types of tables: fact tables, which generally contain measurements such as dollar amounts, and dimension tables, which contain descriptive attributes about those measurements. I’ll go into this in more detail later this week, but in the dimensional model tables are combined and “flattened” out, which makes reporting easier for the users and faster for the optimizer.

After being dimensionalized, the tables look similar to the second illustration below. Structural complexity isn’t the only problem, either; typical OLTP databases suffer from other design issues as well:

  • Lacking Primary and Foreign Key Relationships – That’s the reason the first illustration lacks the arrows. Made2Manage lacks primary and foreign keys. Therefore, most query programs will not pick up the relationships for the users automatically.
  • Incomplete Normalization – Your database may repeat the customer name in both the customer master and sales order header tables. This leads to situations where the customer has changed names, but the sales orders reflect the previous name.
  • “Bad” Field Names – While developers may understand that Fsono stands for sales order number, the users should not have to see that. Some databases have field names that are in foreign languages, or use field names like F111, F143, etc. Notice that in the second illustration, the field names are typically in real English.
  • Indexing issues – As mentioned before, OLTP databases are optimized for data manipulation and indexes tend to slow those processes. However, when retrieving millions of records from 15 joined tables and constraining by multiple values, indexing is critical for performance.
  • Hard Updates and Deletes / Auditing – Some OLTP databases update records in place which makes it difficult to track changes. They may also “hard delete” as well, meaning that they delete the record from the table rather than marking it with some flag indicating that it has been deleted. While the data might be accurate as of now, it becomes difficult to track history which causes auditors to have fits. Often auditing functionality can be built into the extract, transform, and load (ETL) process for loading a data warehouse.
  • Improper Data Types – Databases may save data in the wrong data type. For example, they use the Varchar or Char data types for dates, numbers, etc. While the data is saved, this can make data manipulation and reporting difficult.

Even if you are fortunate enough to have a well designed OLTP system, it is most likely not a good reporting platform.

Your Data Sucks

The consistency and completeness of data required in a data warehouse is much greater than in typical OLTP databases. These issues are typically corrected during the extract, transform, and load (ETL) process. For example, in a typical OLTP source database, you may run into any of the following:

  • Blank or Null Values. If an address is lacking a country designation, the package will still likely be delivered. However, when reporting, blank values make analysis difficult.
  • Inconsistent or misspelled values. Made2Manage doesn’t have any restrictions on its State fields. This allows users to enter things like “IT” for Italy. Also, my database contains multiple versions of the same state name. The great state of California is represented as CA, Cali, California, Califrnia, etc. This is incredibly frustrating for reporting.
  • Duplicate Entities. Many database systems allow users to accidentally enter duplicate entities like customers, contacts, etc. Reports that group by these entities are often inaccurate because of this.

User Extracts Suck

Rest assured that if you aren’t extracting your data for analysis, your users are. They’re exporting reports from the source systems, using Excel to pull the data, etc. They’re most likely doing so with inefficient queries, and at inopportune times. Their interpretations of that data may be inaccurate and they often litter your servers with multiple copies that are at varying stages of obsolescence.

Multiple Data Sources Suck

Most larger companies have numerous legacy systems, which don’t “talk” to each other. Different systems for Customer Relationship Management (CRM), Shipping, Manufacturing, Accounting, Payroll, etc. can all be merged into one well designed data warehouse for analysis.

Another issue arises when you need to change your source OLTP system. Suppose you’re switching your ERP system product. Well, you’ve already spent forever creating dashboards and reports on that system and now you’ll have to do it all over again. Source systems may change, but a properly designed data warehouse doesn’t necessarily have to. If the business itself doesn’t fundamentally change, then the data warehouse structure remains effective. You can simply change your ETL process to load your current structure or add additional functionality to your current data warehouse.

Date Grouping and Manipulation Sucks

Without the use of a date table/dimension, date grouping is a royal pain, especially if you use non-standard fiscal periods. My current employer uses 4-4-5 fiscal periods, so when grouping by fiscal period I simply have to use a date table. Data warehouses almost always have a date dimension.
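As a hedged sketch (the FactSales and DimDate tables and their columns are assumptions about a typical design, not a specific system), grouping by a 4-4-5 fiscal period becomes a simple join and group by:

SELECT
  DD.FiscalYear
 ,DD.FiscalPeriod
 ,SUM(FS.SalesAmount)  SalesAmount
FROM
  dbo.FactSales FS
  INNER JOIN dbo.DimDate DD ON FS.OrderDateKey = DD.DateKey
GROUP BY
  DD.FiscalYear
 ,DD.FiscalPeriod
ORDER BY
  DD.FiscalYear
 ,DD.FiscalPeriod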

The Report Process Sucks

First, let me tell you what the majority of your business users think about reports. When they want a report, they send a request to IT. It sits in the queue for weeks or perhaps longer. Eventually the DBA asks them a ton of questions which the business user thinks should be a matter of common sense. After several drafts, the approval is finally made and the user forgets why they requested the report in the first place. Reports evoke visions of green bar paper and dust. Wouldn’t it be nice for them to open Excel and simply drag fields into a pivot table or fire PowerPivot up and create an interactive application? They want analytics, not reports.

And what about you, the DBA or Report Writer? How many times can you write the same report with slightly different groupings and parameters? It’s a never ending treadmill and that’s one of the reasons I pursued this path in my free time.

Data Warehousing Doesn’t Suck


A few years ago I attended a SQL Server Analysis Services class with the great Mike Hotek, and he started out with dimensional modeling. He spent less than 20 minutes discussing it, briefly describing dimension and fact tables. However, when he did, the clouds parted, a beam of light shone through, and I heard angels singing “aaaah”.

It’s a UNIX system! I know this! – Lex, Jurassic Park

Dimensional modeling just clicked with me and I instantly “got it.” I proceeded to purchase nearly every Kimball Group book on the subject and have been studying them ever since. Data warehousing is fascinating to me as it gets me out of the daily grind, and it’s good for the business as well. It’s like getting paid to work on puzzles all day long. In short, it’s full of win.

Tune in every day this week, and I’ll show you why.

Upcoming Presentation and Blog Events

I mentioned in the last article that I am taking a temporary break from the T-SQL Basics Series of articles. I’ll return to it in a few weeks, but in the meantime I’ve been working on other stuff for your amusement.

Next week Wednesday, I’ll be presenting at the Fort Worth SQL Server User Group. Come see me present:

Zombie Database Survival Guide: Implementing PowerPivot with Zombie Databases

Do you have a Zombie Database? Does your database suffer from roaming hordes of problems like the lack of primary and foreign keys, poor normalization, in-place updates and deletes, and an insatiable drive to eat your braaaaaaaaaaaaaains (CPU cycles)?

Fear not, fellow survivor, for I’ll show you how to control the zombie apocalypse and prepare your data for use with PowerPivot, a free add-on for Excel 2010. I’ll show you how to use PowerPivot to slice and dice your data and keep your users safe from the zombie databases which threaten their very way of life.

A great time will be had by all.

SQL University

My good friend Jorge Segarra (Blog/Twitter) aka SQLChicken has asked me to become a professor at SQL University. Next week I’ll present a series of articles on Dimensional Modeling Fundamentals.

Tune in and you’ll learn the following:

  • Why your database sucks, and what you can do about it.
  • The reasons you actually have to listen to business folks.
  • What’s with all the dimensional modeling techno-babble? What does it really mean?
  • How to break a Data Warehouse project into digestible chunks so you don’t get overwhelmed from the start.
  • What you really need to know about Dimension and Fact Tables.

There will be chills, thrills, and most likely spills; charts, pretty pictures, dancing girls…. Well, maybe not dancing girls, but I’ll share my views on Dimensional Modeling.

So, attend my presentation in Fort Worth if you can, and tune in next week to see if I can’t get myself disbarred from SQL University.

T-SQL Basics: Different Types of Joins

In this seventh article of the T-SQL Basics Series, we will be covering both the basics as well as some specific issues with SQL Joins. If you’ll recall our last lesson, we wrote a sales query for Master Po. He then requested that we add customer information to the report.

Note, after receiving e-mails from non-M2M users complaining that they couldn’t follow along, I’ve decided to switch to using a database that everyone can use. So, I’ve created a quick database for our Kung Fu Temple Gift Shop. If you want to follow along, please run the following code on a test server: Kung Fu Database Script. It will create a small and imperfect database for us to use in this series.

Prerequisite Reading

Before we begin this lesson, I ask that you read an ancient manuscript by Master Jeff Atwood (Blog/Twitter). Yes, I know he belongs to the developer’s temple, but he’s already written the excellent primer, A Visual Explanation of SQL Joins. Please read the text and come back when you’re finished so I can fill in the details. Go ahead Grasshopper, I’ll wait.

Additional Wisdom about Joins

I love the primer because of his use of Venn Diagrams. The first time I worked with SQL Joins, memories of middle school math lessons about Venn Diagrams popped into my mind. While nothing he wrote is incorrect, I wish to add to it and flesh out your knowledge of SQL Joins.

Let me give you a brief run down of the tables in the Kung Fu Temple Gift Shop Database, and how they are related. The tables we will be using are listed below. I’ve listed the Primary Keys, but have not yet set up Foreign Keys relationships. We’ll discuss those in a later lesson.

Table Name – Primary Key
Customers – CustomerID
Products – ProductID
OrderMaster – OrderID
OrderDetails – OrderID, LineNo

Inner Join

The Inner Join is most commonly used and is in fact the default join type. This type of join only returns records which match from both tables. There are two basic ways to write it. Let’s link the OrderMaster and OrderDetails tables and return a few fields from each.

Select
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
From
  dbo.OrderMaster OM
 ,dbo.OrderDetails OD
where
  OM.OrderID = OD.OrderID

Does that code look strange to you? Well, if you’ve been following my writings, it should. Some of the M2M Reports, which I’ll cover in a later lesson, use this old syntax and there is technically nothing wrong with writing it. The optimizer processes it the same way as the syntax I will cover in a minute and it runs just as fast. However, Master Po has forbidden it as an affront to the laws of nature, so you will never see me use it again. Why? I’m glad you asked.

  1. It’s hard to read. This example looks simple because it only includes two tables. However, when more tables are involved it’s just too easy to forget one of the Where clause conditions. Therefore, using this joining method is…..
  2. Dangerous. If you forget to include a table join in the Where clause, you will create a Cartesian product with that join. A Cartesian product returns every combination of rows from the tables where the join is either missing or invalid. For example, in our Kung Fu Gift Shop database there are 4 records in OrderMaster and 5 in OrderDetails; if we neglect to add the Where clause, the recordset jumps to 20 (4 * 5) records, as shown in the sketch below. In a production database with millions of rows in each table, the result set becomes ridiculously large, and every additional join you miss multiplies the size of the result set yet again.
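Here is the same query with the join condition left out, purely as a sketch of what not to do:

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.OrderMaster OM
 ,dbo.OrderDetails OD
-- The Where clause join was forgotten, so every combination of the
-- 4 OrderMaster rows and 5 OrderDetails rows is returned: 20 records.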

For these reasons we will explicitly write the Join this way:

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.OrderMaster OM INNER JOIN dbo.OrderDetails OD ON OM.OrderID = OD.OrderID

In T-SQL the use of the word Inner is optional as well. I personally add it, but many don’t.

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.OrderMaster OM JOIN dbo.OrderDetails OD ON OM.OrderID = OD.OrderID

Please note that you can mix both styles in the same query, but I don’t recommend it for the same reasons.

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.Products PRO, dbo.OrderMaster OM -- Do Not do this. 
  JOIN
    dbo.OrderDetails OD
  ON OM.OrderID = OD.OrderID
WHERE
  PRO.ProductID = OD.ProductID -- Join between Products and Order Details.

Outer Joins – Left, Right, and Full

Left Outer Join

The Left Outer Join returns all of the records (not filtered by a Where clause) from the table to the left of the Join and then any corresponding records from the table to the right of the Join. The records from the first (left) table are returned even if there are no corresponding records in the other table. As with other Joins, the word Outer is optional; Left Outer Join works the same as Left Join.

For example, let’s say that Master Po wants a list of all of the Kung Fu Temple Gift Shop’s customers and any of the orders they have placed. You could write this query two ways. First, let’s use a Left Outer Join.
Option 1 Left Outer Join

SELECT
  CUST.CustomerID
 ,CUST.Title
 ,CUST.CustFirstName
 ,CUST.CustLastName
 ,OM.OrderID
 ,OM.OrderDate
 ,OM.DueDate
 ,OM.PONo
FROM
  dbo.Customers CUST
  Left JOIN dbo.OrderMaster OM
    ON CUST.CustomerID = OM.CustomerID

Take a look at the first couple rows of the result set for this query. 5 Total Records were returned.

My name is listed twice because I placed 2 orders. Also, notice the Null values for Master Kan? The Order Master fields are null because there is a customer record for him, but he hasn’t placed any sales orders.

Let’s look at the same query with an Inner Join. 4 Total Records were returned, one less than the previous query because we only have 1 customer who hasn’t placed any orders.
Option 2 – Inner Join

SELECT
  CUST.CustomerID
 ,CUST.Title
 ,CUST.CustFirstName
 ,CUST.CustLastName
 ,OM.OrderID
 ,OM.OrderDate
 ,OM.DueDate
 ,OM.PONo
FROM
    dbo.Customers CUST
  INNER JOIN
    dbo.OrderMaster OM
  ON CUST.CustomerID = OM.CustomerID

When run, the record for Master Kan is removed. So, which is right? Well, it depends on what Master Po meant. Does he want to include customers which have no orders? Option 1 includes all customers whereas option 2 omits customers without orders. I used this particular example to emphasize that you need to learn to ask the right questions in order to get the right answers. In this case, we’ll stick with the Left Join.

What if Master Po wants a list of only those customers who never placed an order? We can achieve this using a Left Join and adding “WHERE OM.OrderID IS NULL”, as shown below. In this case, only one record shows. Do you remember seeing that type of join before? I used it to find orphan records in your M2M Database. M2M Support still hasn’t forgiven me for posting that, as so many of you called them asking them to fix your database. I’ll cover more about Left Join usage and problems with Nulls in a future lesson.
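A quick sketch of that query, reusing the tables from the examples above:

SELECT
  CUST.CustomerID
 ,CUST.Title
 ,CUST.CustFirstName
 ,CUST.CustLastName
FROM
  dbo.Customers CUST
  LEFT JOIN dbo.OrderMaster OM
    ON CUST.CustomerID = OM.CustomerID
WHERE
  OM.OrderID IS NULL -- keeps only customers with no matching orders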

Right Outer Join

Take everything I just said for the Left Outer Join and reverse it. Period. In fact, I have NEVER written a Right Outer Join; there isn’t a reason why one would need to. There’s a persistent rumor that Left Joins are faster than Right Joins, but that isn’t true either. I only use Left Joins for consistency’s sake, and I suggest you do as well.

Full Outer Join

A Full Outer Join returns all rows from both tables regardless of whether there is a match. In a way, it combines a Left and Right Outer Join. It’s not all that commonly used. The most common instance in which I use it is to compare the results of two queries. For example, let’s compare simplified versions of the Inner Join and Outer Join queries I listed previously.

SELECT First.CustomerID FirstCustID,
       First.OrderID FirstOrderId,
       Second.CustomerID SecondCustID,
       Second.OrderID SecondorderID
  FROM    (SELECT CUST.CustomerID, OM.OrderID
             FROM    dbo.Customers CUST
                  INNER JOIN
                     dbo.OrderMaster OM
                  ON CUST.CustomerID = OM.CustomerID) First
       FULL OUTER JOIN
          (SELECT CUST.CustomerID, OM.OrderID
             FROM    dbo.Customers CUST
                  LEFT JOIN
                     dbo.OrderMaster OM
                  ON CUST.CustomerID = OM.CustomerID) Second
       ON First.OrderID = Second.OrderID
 WHERE    First.CustomerID <> Second.CustomerID
       OR Second.CustomerID IS NULL
       OR First.CustomerID IS NULL


I know it looks complicated, and the results are underwhelming, but we’re working with a small, simplistic data set. Basically I took the first query and creatively aliased it as “First”, the second as “Second”, put them together with a Full Outer Join, and then looked for any differences in their CustomerID fields. Don’t be thrown off by the “Is Null” statement as we’ll be covering Nulls in detail in a future article.

Another real world example where I use Full Outer Joins is if I’m changing report logic and need to prove that the results are the same. I’ll join the results of the first query, with my new query, and look for differences. There are also some 3rd party products which do that kind of comparison automatically.

Non Equi Joins

All of the examples we’ve used so far are Equi Joins because the join conditions have always been equality: Table A Inner Join Table B on Field A = Field B. Non Equi Joins use not equal, greater than, less than, etc. One common use for Non Equi Joins is joining dates to a table of fiscal periods. I’ll cover dates and calendar tables in detail later, but let’s assume that we had a table with a list of fiscal periods with a begin and end date for each period (like M2M’s GLRULE table). In pseudo-code you would join them like this:

SELECT *
  FROM    dbo.OrderMaster OM
       INNER JOIN
          FiscalPeriods FIS
       ON OM.OrderDate >= Fis.BeginDate AND OM.OrderDate <= Fis.EndDate

SQL Statements in Review

So far in this series, we have discussed the basics of T-SQL syntax, the importance of coding consistently, the elements of the Where clause, and summarizing and grouping data. Let’s take a moment to review the order of statements in a SQL query as a way of closing out this portion of the T-SQL Basics Series.

  1. Select – followed by a list of fields, functions, etc. and should be properly aliased.
  2. From – followed by the tables to be used in the query and how they are to be joined.
  3. Where – followed by the conditions by which individual records are to be filtered out.
  4. Group By – followed by fields by which records are to be grouped. Remember that order is important.
  5. Having – followed by the conditions by which the groups are to be filtered out.
  6. Order By – followed by the fields by which the final result set is to be ordered.

This is not the end of our lessons in T-SQL. In future installments I’ll discuss in more depth topics like sub-queries, the challenges associated with Nulls, common functions and their usage, and working with dates and date tables, as well as many others. Patience, Grasshopper, we’ll get there.

What other topics would you like me to cover?

T-SQL Basics - Ordering, Summarizing, and Grouping Data the Kung Fu Way

Today Grasshopper, I will teach you the basics of ordering, summarizing, and grouping your data to give it meaning. Let’s say that the Kung Fu Temple Gift Shop has had poor sales. Master Po wants a list of sales details from you, but he wants them ordered by the part number.

Note for the non Made2Manage people, I am using the M2M Educational Database. Sorels is the Sales Order Release table. I’m using it because it is the sales table which has dollar values. I’ve attempted to document the code with comments so students from other temples can follow the lesson.

Also, all of the screenshots are the data grids produced by Toad for SQL Server which recently released a new beta. Anyone can use the full beta until March 3rd.

Order By

The Order By clause does just what the name suggests: it orders records. The default order is ascending (ASC), but you can also return them in descending (DESC) order.

SELECT
  SOR.fsono                             SalesNo
 ,SOR.fenumber                          ItemNo
 ,SOR.frelease                          Release
 ,SOR.fpartno                           PartNo
 ,(SOR.forderqty * SOR.funetprice)      NetAmount -- Order Quantity * Net Unit Price 
FROM
  dbo.sorels SOR
ORDER BY
  SOR.fpartno -- ASC is default and optional, DESC needs to be specified.

Master Po now decides that he wants the records sorted by part number in descending (DESC) order and then by part revision. Note that I’ve used the aliased field names as well; more on that later.

SELECT
  SOR.fsono                             SalesNo
 ,SOR.fenumber                          ItemNo
 ,SOR.frelease                          Release
 ,SOR.fpartno                           PartNo
 ,SOR.fpartrev                          PartRev
 ,(SOR.forderqty * SOR.funetprice)      NetAmount -- Order Quantity * Net Unit Price 
FROM
  dbo.sorels SOR
ORDER BY
  PartNo Desc, PartRev

Master Po likes the report, but he would like to have the details summarized for him. How do we accomplish this?

Using Aggregate Functions

The most commonly used aggregate functions are listed below. The complete list and definitions can be found here.

  • Sum(SomeField)
  • Avg(SomeField)
  • Count(SomeField)
  • Max(SomeField)
  • Min(SomeField)

So Grasshopper, rather than tediously explaining what you can already guess, let’s look at the following examples.

SELECT
  sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount -- distinct removes duplicates
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR


This works, but Master Po is not happy. The data has little meaning because it doesn’t differentiate by Part Number, Customer, Date, or any other useful attributes. So, how can you summarize the data by part number for him? Well, you might think that you could do this:

SELECT SOR.fpartno                      PartNumber
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR

However, SQL Server will return an error like “Column ‘dbo.sorels.fpartno’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

Using Group By

Grouping returns a single row for each unique value in the selected column with the corresponding aggregate value. To appease Master Po, let’s add a Group By clause to the above query.

SELECT SOR.fpartno                      PartNumber
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR
  Group By SOR.fpartno


So, what have we returned? Well, now the information has more meaning. Look at it again, after I’ve adjusted the alias names.

We can now see important patterns in the data. For example, why is there such a large discrepancy between the minimum and maximum unit price for part number CP2010 and CP0500? Could it be that we’ve been swindled by Ninjas? Group by can also be used for more than one column. In the following example, we are grouping first by the Part Number and then by the Sales Order Number.

SELECT SOR.fpartno                      PartNumber
 ,SOR.fsono                             SalesNo
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR
    Group By SOR.fpartno, SOR.fsono
    order by SOR.fpartno, SOR.fsono


Granted, the results are not earth shattering because I have a very limited data set. Also, since we have not joined to other tables, I can’t group by less specific attributes like Customer or Product Class, but you get the idea.

Let’s say that Master Po wants to know the total quantity due to ship per month per part number in the year 2010. In the future, I’ll show you a much easier way to do this using a date table, for now, we’ll use the following code.

SELECT
  SOR.fpartno           PartNumber
 ,month(SOR.fduedate)   MonthDue
 ,Sum(forderqty)        OrderQty
FROM
  dbo.sorels SOR
WHERE
  SOR.fduedate >= '1/1/2010' AND
  SOR.fduedate < '1/1/2011'
GROUP BY
  SOR.fpartno
 ,month(SOR.fduedate)
ORDER BY
  SOR.fpartno
 ,month(SOR.fduedate)


The Month function returns the month number of a specific date. Master Po can now scan the list, or better yet, you could prepare a good-looking report for him using SQL Server Reporting Services (SSRS). Using it he’ll know when certain parts are due and can schedule his monks’ time accordingly. Before any of you geeks out there call me on this, yes, I know Master Po was blind, but he was also magical and I’m sure he could “read” SSRS reports using those abilities.

Master Po is a demanding Sifu (he must be considering what Caine had to do with his forearms before leaving the temple), and he suspects you have some duplicate records in the sorels table because your figures seem a little too high. Bitter experience has shown Master Po that M2M was designed by the Drunken Master and lacks primary and foreign keys. How can we verify that we do not have duplicates in sorels and be allowed to leave the temple?

The Having Clause

If Master Po had designed M2M, the sorels table would have a primary key consisting of Sales Order Number, Sales Order Line Item, and Sales Order Release. You can learn more about Primary Keys and Referential Integrity problems in M2M here.

Having is nearly identical to the Where clause. Having filters groups in the same way that Where filters records. Since we want to know if we have duplicates, we’ll group by the columns which should be unique in each record.

SELECT   count(*)
FROM     dbo.sorels SOR
GROUP BY SOR.fsono, SOR.finumber, SOR.frelease
HAVING   count(*) > 1

Zero records are returned, meaning you have no duplicates. After telling Master Po, he wants another variation of the report. He wants to know the total dollar amount per sales order, but only wants to see those with a total more than $10,000. Yes, I know he would want it in Yen but my data set is limited to USD so just play along.

So, how would we do this?

Select SOR.fsono
,sum(SOR.forderqty * SOR.funetprice) TotalPerOrder
From dbo.sorels SOR
Group by SOR.fsono
Having sum(SOR.forderqty * SOR.funetprice) > 10000
Order by sum(SOR.forderqty * SOR.funetprice) desc

To impress Master Po further, we ordered them from highest to lowest dollar amount.

Field Name Alias Usage

Notice that the aliased field names cannot be used in the Group By or Where clauses, but they are permissible in the Order By clause.

 Group by PartNumber -- not legal
Order by PartNumber --Legal

Why? The reason field aliases cannot be used in those clauses is the order in which SQL Server evaluates the parts of a statement. Basically, SQL Server evaluates a statement in the following steps.

  1. The From clause including the Joins (more on Joins later).
  2. The Where clause filters out any unnecessary records.
  3. The Group By is then applied.
  4. Having filters out unneeded groups.
  5. The Select is evaluated including the aliased field names assuming they’ve been assigned.
  6. The Order By then sorts the records.

Since the Select portion of the SQL statement is evaluated after the Where, Group By, and Having clauses, none of them can use aliased field names; SQL Server really has no idea that sor.fpartno is PartNumber at that point. The table alias can be used throughout because it is assigned in the first step. Make sense?

The Demanding Master Po

Anyway, back to our story. You’ve accomplished what Master Po asked, verified that your table has no duplicates, and you are sure he will be pleased. However, when you bring it to him, he smiles and says, “Very good Grasshopper… but which customers placed these orders? Oh, and be sure to practice your pebble snatching as well.”

Well Grasshoppers, I’ll show you how to get the customer information in the next lesson. As far as snatching pebbles, you’re on your own.

Does anyone else have pebble snatching tips to share?

Ooh! Ooh! Pick Me!


Pick Me Mr. Kotter!

SQL Rally is a brand new event for this year. It’s kind of a hybrid between the PASS Summit and a SQL Saturday. It’s fairly low cost and being held May 11-13 in Orlando, Florida.

Anyway, the reason for this post is completely self serving. I really want to present at this event. Please register to vote and vote for my sessions. Mine are in the 10th section in the SSRS category. The abstracts I submitted are:

Title: Choose Your Own (Reporting) Adventure – SSRS Survivor Series
Abstract: Create a story of guided analytics using Reporting Services drill down, drill through, and expressions. Sure, you may be using drill down already, but I’ll show you how to make the most of Reporting Services interactions, and more importantly how to avoid becoming dragon chow at the end.

Title: The Power of Parameters – SSRS Survivor Series
Abstract: Want to know the best techniques to help your user narrow their search in the most efficient ways possible? Come watch me demonstrate which parameter techniques are best for certain situations and field types. Real world problems and applications are stressed, and I’ll demonstrate why some commonly accepted methods simply do not work and should be avoided.

–My name is Made2Mentor, and I approved this message.

T-SQL Basics - Filtering Records with the Where Clause

All of the queries we’ve looked at so far will return all of the records in a table. This isn’t very efficient. The Where clause filters records out of your result set that you do not want.

For example, suppose we’re only interested in Sales Order Master records for “Acme Rockets Inc.” The following select statement will work:

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM dbo.somast SOM
  where SOM.fcompany = 'Acme Rockets Inc.'

Notice that I have used single quotes around the company name; they are required for character values, but not for numerics. Please note that many fields that M2M refers to as numbers are actually not numeric fields. Most record number fields such as Sales Order Number, Purchase Order Number, and Invoice Number are 6-digit character fields even though they may contain only numeric values. Therefore, this is how to write the statement if you only want to return information for Sales Order “Number” 000001.

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM dbo.somast SOM
  where SOM.fsono = '000001'

And/Or Operators

What if we took the above example and wanted to retrieve 2 sales order records? An Or statement will return as true if either expression is true. Therefore, an Or statement like this will retrieve both records.

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM dbo.somast SOM
  where SOM.fsono = '000001'
  or SOM.fsono = '000002'

Let’s say you wanted to check for records for Acme Rockets Inc. with a particular Sales Order Number. The And statement will only evaluate to true if both conditions are true.

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM dbo.somast SOM
  where SOM.fsono = '000001'
  and SOM.fcompany =  'Acme Rockets Inc.'

I recorded a quick video to explain the gotchas of using And with Or in a Where clause.

Parenthesis Use

As I mentioned in the video, parentheses can be used to modify expression evaluation in a Where clause. Suppose you wanted to run the previous query for two separate sales order numbers. In English, we are looking for any orders with an order number of ‘000001’ or ‘000002’ that are for Acme Rockets Inc.

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM dbo.somast SOM
  where (SOM.fsono = '000001' or SOM.fsono = '000002') -- evaluated as a group
  and SOM.fcompany =  'Acme Rockets Inc.'

Greater Than, Less Than, Not Equal To

Where clauses are not limited to equality conditions. Not Equal To (<>, !=), Greater Than or Equal To (>=), Less Than or Equal To (<=), and simple Greater Than or Less Than are all supported. So, the previous query could also be written:

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SalesNo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM
  dbo.somast SOM
where SOM.fsono <= '000002'
and SOM.fcompany =  'Acme Rockets Inc.'

It’s important to note that since somast.fsono is a character field, the less than or equal to comparison is treated like an alphabetical lookup rather than a numeric one. Typically this isn’t a problem because the numbers consistently include preceding zeros, but it is something to keep in mind.
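A tiny illustration with hypothetical values of why the comparison only behaves “numerically” when the values are zero-padded:

-- Character comparison is positional, so without leading zeros '11' sorts before '2'.
SELECT
  CASE WHEN '11' <= '2' THEN 'true' ELSE 'false' END          UnpaddedCompare -- 'true'
 ,CASE WHEN '000011' <= '000002' THEN 'true' ELSE 'false' END PaddedCompare   -- 'false'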

The Between Statement

Suppose we wanted to return all of the Sales Order Numbers between ‘000001’ and ‘000100’. Well, we’d use the Between statement of course.

Where fsono between '000001' and '000100'

Keep in mind that the statement is inclusive, meaning that the first and last values are also returned. If you didn’t want to include ‘000100’ you would need to edit the last value or use a less than instead. The most common use of Between is with date comparison, and there are gotchas with that. I’ll explain date manipulation and comparison in a later blog article.

The IN Operator

Let’s say that you want retrieve Sales Orders from several of Acme’s subsidiaries. In this example, you want to extract records for “Acme Rockets Inc.” and “Acme Rocket Skates” as well. You could use the following Where clause:

Where SOM.fcompany = 'Acme Rockets Inc.' or SOM.fcompany = 'Acme Rocket Skates'

However, suppose Acme has 20 subsidiary companies all with some version of Acme in the name. Using Or would be unwieldy. Therefore, you can use the In statement like this:

WHERE
  SOM.fcompany IN
    ('Acme Rockets Inc.'
    ,'Acme Rocket Skates'
    ,'The Acme Company'
    ,'Acme Inc'
    ,'A Company Making Everything'
    ,'Acme'
    ,'Acme Rackets Inc.'
    ,'Acme Rickets Inc.')
    -- etc.

While this is better, listing 20 companies this way would still be a pain. So, for this case, we’re going to use the Like statement.

Using Like and Pattern Matching

The most commonly used wild card character is %. % matches any string of characters or no characters at all. For the sake of argument, let’s say that all of the variations of Acme actually started with the word Acme. They could be returned like this:

WHERE
  SOM.fcompany LIKE ('Acme%')

Returns:
CustName
Acme Rockets Inc.
Acme Rocket Skates
Acme Inc
Acme
Acme Rackets Inc.

Notice that we lost “The Acme Company”. If you wanted to include them the following statement will work.

WHERE
  SOM.fcompany LIKE ('%Acme%')

Returns:
CustName
Acme Rockets Inc.
Acme Rocket Skates
The Acme Company
Acme Inc
Acme
Acme Rackets Inc.

Use a leading % sparingly because the statement becomes non-SARGable. Basically, the query will be unable to use an index placed on the field, which may cause performance problems.

The astute reader will notice that we missed the Acme company named “A Company Making Everything”. How can we make sure they are returned as well? More than one wildcard character can be used.

WHERE
  SOM.fcompany LIKE 'a%c%m%e%'

Returns:
CustName
Acme Rockets Inc.
Acme Rocket Skates
Acme Inc
A Company Making Everything
Acme
Acme Rackets Inc.
Acme Rickets Inc.

One important note: you will notice that my last wildcard statement is in a different case than the records in the table. Depending on the collation setting of your database, your results may vary, as some collations are case sensitive.

An Underscore (_) replaces a single character.

WHERE
  SOM.fcompany LIKE 'Acme R_ckets Inc.'

Returns:
CustName
Acme Rockets Inc.
Acme Rackets Inc.
Acme Rickets Inc.

Let’s raise the complexity a notch. Brackets [] can also be used to match a single character in a range.

WHERE
  SOM.fcompany LIKE 'Acme R[a-i]ckets Inc.'

Returns:
CustName
Acme Rackets Inc.
Acme Rickets Inc.

Brackets can also list distinct letters rather than a range.

WHERE
  SOM.fcompany LIKE 'Acme R[ao]ckets Inc.'

CustName
Acme Rockets Inc.
Acme Rackets Inc.

Brackets can also exclude characters in a range. The symbol ^ (typically over the 6 key) represents Not.

WHERE
  SOM.fcompany LIKE 'Acme R[^a-i]ckets Inc.'

Only Acme Rockets Inc is returned. Brackets and underscores are pretty powerful, but aren’t used nearly as frequently as the general wildcard %.

Using Not

I’ve already alluded to the use of Not with wildcards. However, it’s easy to use in non-wildcard clauses as well. Not negates any expression that it precedes. Using the sample database, the following query:

SELECT
 SOM.fsono SalesNo
FROM
  dbo.somast SOM
WHERE
  SOM.fcustno = '000104' AND
  SOM.fcompany NOT LIKE ('Acme Rockets Inc.')

Returns:
SalesNo
000003
000020
000052

That about covers the basics of filtering records using a Where clause. In future articles I’ll discuss Joins, Grouping, and the use of Sub Queries.

Any questions? Is there anything in particular you’d like me to cover?

How to Code with Toad

I mentioned in my last article that I would be posting a video showing you the basics of working with Toad for SQL Server.

What do you think of my first effort?

My Way of Coding T-SQL Consistently

As I mentioned in the last article, you need to develop or adopt a standardized way of writing T-SQL code. In this article, I’m going to share the habits I’ve developed, “my way” if you will.

Vizzini: Finish him. Finish him, your way.
Fezzik: Oh good, my way. Thank you Vizzini… what’s my way?
Vizzini: Pick up one of those rocks, get behind a boulder, in a few minutes the man in black will come running around the bend, the minute his head is in view, hit it with the rock.
Fezzik: My way’s not very sportsman-like.
The Princess Bride

Just like Fezzik, you may be asking, “Well, which way is my way?” The best way for me to help you with that is to share the method(s) I’ve developed and my reasons for adopting them. Keep in mind that many people will disagree with my way, and that’s alright. This article, however, is all about me, and since these methods work for me, I think they’ll work for you too. Adopting a 3rd party query tool also helps keep your code consistent.

Capitalization Usage

I write all of the T-SQL functions in ALL CAPS and typically CamelCase field and database names. More on this later.

Field Order

I list the fields in an order which makes sense to me. Consider the script I shared with you in the last article:

SELECT  fcompany, fcustno, fcustpono, forderdate, fsono, fsorev FROM somast

While the computer doesn’t care, I list my fields in roughly most general to most specific order. Therefore, I would order those in the following manner:

SELECT  fcustno, fcompany, fsono, fsorev, fcustpono, forderdate  FROM somast

This order just “feels” right to me. Customers “are larger,” or less specific, than Sales Orders, so the fields are listed in that order. This will become a lot more important when we discuss joining tables.

I use a lot of vertical space in my queries. The query above becomes this:

SELECT fcustno,
       fcompany,
       fsono,
       fsorev,
       fcustpono,
       forderdate
  FROM somast

Once again, the computer couldn’t care less, it just makes everything easier for me to read.

Leading vs. Trailing Commas

I prefer to use leading commas. The query now looks like this with leading commas:

SELECT fcustno
 ,fcompany
 ,fsono
 ,fsorev
 ,fcustpono
 ,forderdate
FROM
  somast

It may take you a while to get used to seeing the query like that, but there’s a good reason for doing so. If you are editing a long query with trailing commas and you need to remove the last entry in the list, you will invariably forget to remove the final comma and receive an error.

SELECT fcustno,
       fcompany,
       fsono,
       fsorev,
       fcustpono,
  --     forderdate  **** Using Double dashes comments out this line. This code now ends with a comma and will error out. 
  FROM somast

Leading comma queries are easier to edit in my opinion.

Aliasing

Butch: Would you hand me a dry towel, miss beautiful tulip?
Fabienne: Oh, I like that. I like being called a tulip. Tulip is much better than mongoloid.
Pulp Fiction

Aliasing is used in two places in a SQL Statement, the Select clause and the From clause. The From clause alias renames the database object and I typically use 3 or 4 character alias names. I also include the database owner or schema as well. Alias names can be assigned following the AS keyword, but it is optional. For example, the leading comma query becomes this:

SELECT SOM.fcustno
 ,SOM.fcompany
 ,SOM.fsono
 ,SOM.fsorev
 ,SOM.fcustpono
 ,SOM.forderdate
FROM
  dbo.somast AS SOM  -- Or dbo.somast SOM

I know that seems like extra work, but it’s absolutely necessary when you write multiple-table queries. Some people alias their tables with meaningless single-letter values like A, B, and C. I can’t stand this because when I read a SQL statement aliased that way, I have to constantly look back at the From clause to remind myself what each alias represents.

Once again, consistency is key. I always use the following alias names:

somast = SOM
soitem = SOI
sorels = SOR
slcdpm = SLC
etc.

Since I am consistent, I always know that a field aliased with ARI represents Aritem, and I can cut and paste between queries as well. Incidentally, FastForms adds _EXT to the name of its extension tables, so I add a lowercase e to any extension table alias: Soitem_EXT becomes SOIe. Keep it simple.

The second alias type applies to the field name. What the heck does fsono mean to a user? Therefore, I alias my queries like this:

SELECT SOM.fcustno CustNo
 ,SOM.fcompany CustName
 ,SOM.fsono SONo
 ,SOM.fsorev SORev
 ,SOM.fcustpono CustPONo
 ,SOM.forderdate OrderDate
FROM
  dbo.somast SOM

Much Easier to Read


Also, because I’ve aliased my field names in CamelCase, many query tools such as SSRS will automatically format them as two words: “OrderDate” becomes “Order Date”. You can include spaces in your alias names, but you must put brackets around them. [Order Date] will render properly in almost any reporting tool, but I dislike all the extra typing.

Tomorrow, I’ll post my first video which will be a short demonstration of how I follow these “rules” with Toad for SQL Server.
