Archives

T-SQL Basics - Ordering, Summarizing, and Grouping Data the Kung Fu Way

Today Grasshopper, I will teach you the basics of ordering, summarizing, and grouping your data to give it meaning. Let’s say that the Kung Fu Temple Gift Shop has had poor sales. Master Po wants a list of sales details from you, but he wants them ordered by the part number.

Note for the non Made2Manage people, I am using the M2M Educational Database. Sorels is the Sales Order Release table. I’m using it because it is the sales table which has dollar values. I’ve attempted to document the code with comments so students from other temples can follow the lesson.

Also, all of the screenshots are the data grids produced by Toad for SQL Server which recently released a new beta. Anyone can use the full beta until March 3rd.

Order By

The Order By clause does just what the name suggests, orders records. The Default order is ascending (ASC), but you can also return them in descending (DESC) order as well.

SELECT
  SOR.fsono                             SalesNo
 ,SOR.fenumber                          ItemNo
 ,SOR.frelease                          Release
 ,SOR.fpartno                           PartNo
 ,(SOR.forderqty * SOR.funetprice)      NetAmount -- Order Quantity * Net Unit Price 
FROM
  dbo.sorels SOR
ORDER BY
  SOR.partno -- ASC is default and optional, DESC needs to be specified.

Master Po now decides that he wants the records sorted in descending (DESC) order and then sorted by part revision. Note that I’ve used the aliased field names as well, more on that later.

SELECT
  SOR.fsono                             SalesNo
 ,SOR.fenumber                          ItemNo
 ,SOR.frelease                          Release
 ,SOR.fpartno                           PartNo
 ,SOR.fpartrev                          PartRev
 ,(SOR.forderqty * SOR.funetprice)      NetAmount -- Order Quantity * Net Unit Price 
FROM
  dbo.sorels SOR
ORDER BY
  PartNo Desc, PartRev

Master Po likes the report, but he would like to have the details summarized for him. How do we accomplish this?

Using Aggregate Functions

The most commonly used aggregate functions are listed below. The complete list and definitions can be found here.

  • Sum(SomeField)
  • Avg(SomeField)
  • Count(SomeField)
  • Max(SomeField)
  • Min(SomeField)

So Grasshopper, rather than tediously explaining what you can already guess, let’s look at the following examples.

SELECT
  sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount -- distinct removes duplicates
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR


This works, but Master Po is not happy. The data has little meaning because it doesn’t differentiate by Part Number, Customer, Date, or any other useful attributes. So, how can you summarize the data by part number for him? Well, you might think that you could do this:

SELECT SOR.fpartno                      PartNumber
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR

However, SQL Server will return an error like “Column ‘dbo.sorels.fpartno’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.”

Using Group By

Grouping returns a single row for each unique value in the selected column with the corresponding aggregate value. To appease Master Po, let’s add a Group By clause to the above query.

SELECT SOR.fpartno                      PartNumber
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR
  Group By SOR.fpartno


So, what have we returned? Well, now the information has more meaning. Look at it again, after I’ve adjusted the alias names.

We can now see important patterns in the data. For example, why is there such a large discrepancy between the minimum and maximum unit price for part number CP2010 and CP0500? Could it be that we’ve been swindled by Ninjas? Group by can also be used for more than one column. In the following example, we are grouping first by the Part Number and then by the Sales Order Number.

SELECT SOR.fpartno                      PartNumber
 ,SOR.fsono                             SalesNo
 ,sum(SOR.forderqty * SOR.funetprice)   TotalAmountOrdered
 ,Avg(SOR.funetprice)                   AverageUnitPrice
 ,count(*)                              SorelsRecordCount
 ,count(distinct SOR.fsono)             SalesOrderCount
 ,Max(SOR.funetprice)                   MaxUnitPrice
 ,MIN(SOR.funetprice)                   MinUnitPrice
FROM
  dbo.sorels SOR
    Group By SOR.fpartno, SOR.fsono
    order by SOR.fpartno, SOR.fsono


Granted, the results are not earth shattering because I have a very limited data set. Also, since we have not joined to other tables, I can’t group by less specific attributes like Customer or Product Class, but you get the idea.

Let’s say that Master Po wants to know the total quantity due to ship per month per part number in the year 2010. In the future, I’ll show you a much easier way to do this using a date table, for now, we’ll use the following code.

SELECT
  SOR.fpartno           PartNumber
 ,month(SOR.fduedate)   MonthDue
 ,Sum(forderqty)        OrderQty
FROM
  dbo.sorels SOR
WHERE
  SOR.fduedate >= '1/1/2010' AND
  SOR.fduedate < '1/1/2011'
GROUP BY
  SOR.fpartno
 ,month(SOR.fduedate)
ORDER BY
  SOR.fpartno
 ,month(SOR.fduedate)


The Month function returns the month number of a specific date as a number. Master Po can now scan the list, or better yet you could prepare a good looking report for him using SQL Server Reporting Services (SSRS). Using it he’ll know when certain parts are due and can schedule his monk’s time accordingly. Before any of you geeks out there call me on this, yes I know Master Po was blind, but he was also magical and I’m sure he could “read” SSRS reports using those abilities.

Master Po is a demanding Sifu, he must be considering what Caine had to do with his forearms before leaving the temple, and suspects you have some duplicate records in the sorels table because your figures seem a little too high. Bitter experience has shown Master Po that M2M was designed by the Drunken Master and lacks primary and foreign keys. How can we verify that we do not have duplicates in sorels and be allowed to leave the temple?

The Having Clause

If Master Po had designed M2M the sorels would have a primary key consisting of Sales Order Number, Sales Order Line Item, and Sales Order Release. You can learn more about Primary Keys and Referential Integrity problems in M2M here.

Having is nearly identical to the Where clause. Having filters groups in the same way that Where filters records. Since we want to know if we have duplicates, we’ll group by the columns which should be unique in each record.

SELECT   count(*)
FROM     dbo.sorels SOR
GROUP BY SOR.fsono, SOR.finumber, SOR.frelease
HAVING   count(*) > 1

Zero records are returned, meaning you have no duplicates. After telling Master Po, he wants another variation of the report. He wants to know the total dollar amount per sales order, but only wants to see those with a total more than $10,000. Yes, I know he would want it in Yen but my data set is limited to USD so just play along.

So, how would we do this?

Select SOR.fsono
,sum(SOR.forderqty * SOR.funetprice) TotalPerOrder
From dbo.sorels SOR
Group by SOR.fsono
Having sum(SOR.forderqty * SOR.funetprice) > 10000
Order by sum(SOR.forderqty * SOR.funetprice) desc

To impress Master Po further, we ordered them from highest to lowest dollar amount.

Field Name Alias Usage

Notice that the aliased field name cannot be used in the Group By or Where clause , but they are permissable in the Order By clause.

 Group by PartNumber -- not legal
Order by PartNumber --Legal

Why? The reason field aliases cannot be used in these clauses statements is due to the order in which SQL Server evaluates statements. Basically, SQL Server evaluates the statement in the following steps.

  1. The From clause including the Joins (more on Joins later).
  2. The Where clause filters out any unnecessary records.
  3. The Group By is then applied.
  4. Having filters out unneeded groups.
  5. The Select is evaluated including the aliased field names assuming they’ve been assigned.
  6. The Order By then sorts the records.

Since the Select portion of the SQL statement is evaluated after the Where, Group By, and Having clauses, none of them can use aliased field names. SQL Server really has no idea that sor.fpartno is PartNumber at that point. The table alias can be used throughout because it is assigned in the first step. Make Sense?

The Demanding Master Po

Anyway, back to our story. You’ve accomplished what Master Po asked, verified that your table has no duplicates, and you are sure he will be pleased. However, when you bring it to him, he smiles and says, “Very good Grasshopper… but which customers placed these orders? Oh, and be sure to practice your pebble snatching as well.

Well Grasshoppers, I’ll show you how to get the customer information in the next lesson. As far as snatching pebbles, you’re on your own.

Does anyone else have pebble snatching tips to share?

Related posts:

3 comments to T-SQL Basics – Ordering, Summarizing, and Grouping Data the Kung Fu Way

  • Joshua

    I actually just finished a report using the sum feature but didn’t declare sum…

    (jodrtg.fuprodtime * jodrtg.foperqty) as [Prod Hrs],
    instead of
    sum(jodrtg.fuprodtime * jodrtg.foperqty) as [Prod Hrs],

    yet it still worked. Anything to watch out for?

    Also, can’t believe Master Po was so forgiving on all those trailing zero’s…

  • Joshua, If I’m not mistaken, those are Job Order Routings. If you multiple two numbers to get Prod Hrs, then you have received the value at the grain of that table. If you needed to summarize the data by a higher entity, then you’d need to sum.

    For example, if we have OrderMaster and OrderItems tables, let’s assume that( OrderItems.OrderQty * OrderItems.ListPrice ) returns your amount per line. That’s great, but you haven’t summed anything. If your boss wants to know the total per Order then you’d use something like this:

    select sum( OrderItems.OrderQty * OrderItems.ListPrice ) as TotalAmount
    ,OrderMaster.OrderID
    from OrderMaster inner join OrderItem on blah.
    group by OrderMaster.OrderID

    Now, you’ve summed by OrderMaster (The Higher Entity). Make sense?

    As far as all the trailing zeros, I haven’t gotten to formatting the data, and I’d never hand Master Po a report that actually looked like that. 🙂

  • Fred Crawford

    David, I wish I had known about the “having” statement 6 or 7 years ago, that would have eliminated my need to create temporary cursors in several queries. Another thing I didn’t know about was the “count(*)”. Thanks again for all of your hard work, Fred

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>