Archives

T-SQL Basics: Different Types of Joins

In this seventh article of the T-SQL Basics Series, we will be covering both the basics as well as some specific issues with SQL Joins. If you’ll recall our last lesson, we wrote a sales query for Master Po. He then requested that we add customer information to the report.

Note, after receiving e-mails from non-M2M users complaining that they couldn’t follow along, I’ve decided to switch to using a database that everyone can use. So, I’ve created a quick database for our Kung Fu Temple Gift Shop. If you want to follow along, please run the following code on a test server: Kung Fu Database Script. It will create a small and imperfect database for us to use in this series.

Prerequisite Reading

Before we begin this lesson, I ask that you read an ancient manuscript by Master Jeff Atwood (Blog/Twitter). Yes, I know he belongs to the developer’s temple, but he’s already written the excellent primer, A Visual Explanation of SQL Joins. Please read the text and come back when you’re finished so I can fill in the details. Go ahead Grasshopper, I’ll wait.

Additional Wisdom about Joins

I love the primer because of his use of Venn Diagrams. The first time I worked with SQL Joins, memories of middle school math lessons about Venn Diagrams popped into my mind. While nothing he wrote is incorrect, I wish to add to it and flesh out your knowledge of SQL Joins.

Let me give you a brief run down of the tables in the Kung Fu Temple Gift Shop Database, and how they are related. The tables we will be using are listed below. I’ve listed the Primary Keys, but have not yet set up Foreign Keys relationships. We’ll discuss those in a later lesson.

Table NamePrimary Key
CustomersCustomerID
ProductsProductID
OrderMasterOrderID
OrderDetailsOrderID, LineNo

Inner Join

The Inner Join is most commonly used and is in fact the default join type. This type of join only returns records which match from both tables. There are two basic ways to write it. Let’s link the Order Master and Order Items tables and return a few fields from each.

Select
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
From
  dbo.OrderMaster OM
 ,dbo.OrderDetails OD
where
  OM.OrderID = OD.OrderID

Does that code look strange to you? Well, if you’ve been following my writings, it should. Some of the M2M Reports, which I’ll cover in a later lesson, use this old syntax and there is technically nothing wrong with writing it. The optimizer processes it the same way as the syntax I will cover in a minute and it runs just as fast. However, Master Po has forbidden it as an affront to the laws of nature, so you will never see me use it again. Why? I’m glad you asked.

  1. It’s hard to read. This example looks simple because it only includes two tables. However, when more tables are involved its just too easy to forget one of the Where clause conditions. Therefore, using this joining method is…..
  2. Dangerous. If you forget to include a table join in the Where clause, you will create a cartesian product with that join. A cartesian product returns every combination of each table where the join is either missing or invalid. For example, in our Kung Fu Gift Shop database there are 4 records in Order Master and 5 in Order Items, if we neglect to add the Where clause, the recordset jumps to 20 (4 * 5) records. In a production database with millions of rows in each table, the result set becomes ridiculously large. Also, every additional join you miss exponentially increases your recordset.

For these reasons we will explicitly the Join this way:

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.OrderMaster OM INNER JOIN dbo.OrderDetails OD ON OM.OrderID = OD.OrderID

In T-SQL the use of the word Inner is optional as well. I personally add it, but many don’t.

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.OrderMaster OM JOIN dbo.OrderDetails OD ON OM.OrderID = OD.OrderID

Please note that you can mix both styles in the same query, but I don’t recommend it for the same reasons.

SELECT
  OM.OrderID
 ,OM.CustomerID
 ,OM.OrderDate
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.OrderQty
FROM
  dbo.Products PRO, dbo.OrderMaster OM -- Do Not do this. 
  JOIN
    dbo.OrderDetails OD
  ON OM.OrderID = OD.OrderID
WHERE
  PRO.ProductID = OD.ProductID -- Join between Products and Order Details.

Outer Joins – Left, Right, and Full

Left Outer Join

The Left Outer Join returns all of the records (not filtered by a where clause) from the field to the left of the Join and then any corresponding records from the table to the right of the Join. The records from the first (left) table are returned even if there are no corresponding records in the other table. As with other Joins the word Outer is optional as well. Left Outer Join works the same as Left Join.

For example, lets say that Master Po wants a list of all of the Kung Fu Temple Gift Shop’s customers and any of the orders they have placed. You could write this query two ways. First, let’s use a Left Outer Join.
Option 1 Left Outer Join

SELECT
  CUST.CustomerID
 ,CUST.Title
 ,CUST.CustFirstName
 ,CUST.CustLastName
 ,OM.OrderID
 ,OM.OrderDate
 ,OM.DueDate
 ,OM.PONo
FROM
  dbo.Customers CUST
  Left JOIN dbo.OrderMaster OM
    ON CUST.CustomerID = OM.CustomerID

Take a look at the first couple rows of the result set for this query. 5 Total Records were returned.

My name is listed twice because I placed 2 orders. Also, notice the Null values for Master Kan? The values for Order Master fields is null because there is a customer record for him but he hasn’t placed any sales orders.

Let’s look at the same query with an Inner Join. 61 Total Records were returned, one less than the previous query because we only have 1 customer which hasn’t placed orders.
Option 2 – Inner Join

SELECT
  CUST.CustomerID
 ,CUST.Title
 ,CUST.CustFirstName
 ,CUST.CustLastName
 ,OM.OrderID
 ,OM.OrderDate
 ,OM.DueDate
 ,OM.PONo
FROM
    dbo.Customers CUST
  INNER JOIN
    dbo.OrderMaster OM
  ON CUST.CustomerID = OM.CustomerID

When run, the record for Master Kan is removed. So, which is right? Well, it depends on what Master Po meant. Does he want to include customers which have no orders? Option 1 includes all customers whereas option 2 omits customers without orders. I used this particular example to emphasize that you need to learn to ask the right questions in order to get the right answers. In this case, we’ll stick with the Left Join.

What if Master Po wants a list of only those customers who never placed an order? We can achieve this using a Left Join and adding “WHERE OM.OrderID IS NULL”. In this case, only one record shows. Do you remember seeing that type of join before? I used it to find orphan records in your M2M Database. M2M Support still hasn’t forgiven me for posting that as so many of you called them asking them to fix your database. I’ll cover more about Left Join usage and problems with Nulls in a future lesson.

Right Outer Join

Take everything I just said for Left Outer Join and reverse it. Period. In fact, I have NEVER written a Right Outer Join. There isn’t a reason that why one would need to use one. There’s a rumor that persists that Left Joins are faster than Right Joins, but that isn’t true either. I only use Left Joins for consistency sake, and I suggest you do as well.

Full Outer Join

A Full Outer Join is returns all rows from both tables regardless of whether there is a match. In a way, it combines a Left and Right Outer Join. It’s not all that commonly used. The most common instance in which I use it is to compare the results of two queries. For example, let’s compare simplified versions the Inner Join and Outer Join queries I listed previously.

SELECT First.CustomerID FirstCustID,
       First.OrderID FirstOrderId,
       Second.CustomerID SecondCustID,
       Second.OrderID SecondorderID
  FROM    (SELECT CUST.CustomerID, OM.OrderID
             FROM    dbo.Customers CUST
                  INNER JOIN
                     dbo.OrderMaster OM
                  ON CUST.CustomerID = OM.CustomerID) First
       FULL OUTER JOIN
          (SELECT CUST.CustomerID, OM.OrderID
             FROM    dbo.Customers CUST
                  LEFT JOIN
                     dbo.OrderMaster OM
                  ON CUST.CustomerID = OM.CustomerID) Second
       ON First.OrderID = Second.OrderID
 WHERE    First.CustomerID <> Second.CustomerID
       OR Second.CustomerID IS NULL
       OR First.CustomerID IS NULL


I know it looks complicated, and the results are underwhelming, but we’re working with a small, simplistic data set. Basically I took the first query and creatively aliased it as “First”, the second as “Second”, put them together with a Full Outer Join, and then looked for any differences in their CustomerID fields. Don’t be thrown off by the “Is Null” statement as we’ll be covering Nulls in detail in a future article.

Another real world example where I use Full Outer Joins is if I’m changing report logic and need to prove that the results are the same. I’ll join the results of the first query, with my new query, and look for differences. There are also some 3rd party products which do that kind of comparison automatically.

Non Equi Joins

All of the examples we’ve used so far are Equi Joins because the join conditions have always been equal. Table A Inner Join Table B on Field A = Field B. Non Equi Joins are not equal, greater than, less than, etc. One common use for Non Equi Joins is joining dates to a table of fiscal periods. I’ll cover dates and calendar tables in detail later, but let’s assume that we had a table with a list of fiscal periods with a begin and end date for each period (Like M2M’s GLRULE table). In psudo-code you would join them like this:

SELECT *
  FROM    dbo.OrderMaster OM
       INNER JOIN
          FiscalPeriods FIS
       ON OM.OrderDate >= Fis.BeginDate AND OM.OrderDate < Fis.BeginDate

SQL Statements in Review

So far in this series, we have discussed the basics of T-SQL Syntax, importance of coding consistently, elements of the Where clause, and summarizing and grouping data. Let’s take a moment to review the order of statements in a SQL clause as a way of closing this series of lessons in the T-SQL Basics Series.

  1. Select – followed by a list of fields, functions, etc. and should be properly aliased.
  2. From – followed by the tables to be used in the query and how they are to be joined.
  3. Where – followed by the conditions by which individual records are to be filtered out.
  4. Group By – followed by fields by which records are to be grouped. Remember that order is important.
  5. Having – followed by the conditions by which the groups are to be filtered out.
  6. Order By – followed by the fields by which the final result set is to be ordered.

This is not the end of our lessons in T-SQL. In future installments I’ll discuss in more depth topics like sub-queries, the challenges associated with Nulls, common functions and their usage, working with dates and date tables, as many others. Patience Grasshopper, we’ll get there.

What other topics would you like me to cover?

Related posts:

5 comments to T-SQL Basics: Different Types of Joins

  • Might want to briefly mention the existence of self-joins…not that there’s anything inherently “ooooh! aaah!” about them. Oh, have you talked about merges before?

  • Jen, thanks for the suggestion. Self Joins are one of those things I mentioned in the review as a future topic. As far as Merges, I mentioned them here:

    http://www.made2mentor.com/2010/01/using-insert-update-and-merge/

  • Joshua

    I swear you must be a mind reader, last few of your posts have always related to something I’ve just been working on, weird!

    So far this is making sense to me, especially with the use of the Venn Diagrams, but I realize it takes time and experience before it’s second nature.

    Without going into too many specifics, the last few joins I’ve had to do for different reports have been dumb luck I suspected, I created a join, it returned what I anticipated, I wasn’t getting duplicate entries, so I assumed it was right.

    Any dangers in that way of reasoning when it comes to joins?

    Joshua

  • Well, there are a few ways to pull the SQL Statements out of M2M which I’ll cover soon. I’ll also cover things like watching your grain to make sure that you haven’t created a bad join.

    However, the most foolproof way of making sure your query joins are correct in M2M is by dissecting the reports to get the SQL Statements out.

    Next week I will be teaching something else, I’ll announce specifics tomorrow, but will come back to T-SQL after that. So, if you have specific join questions you can e-mail me or stay tuned for more info.

  • […] because you’ll “lose” records. That’s why I always link to INProd with a Left Outer Join. Would you like to see the invoice records that you’d lose in an Inner Join? Run the […]

Leave a Reply to David

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>