Archives

I Don’t Know! Deal With It! (Nulls)

This topic has been done to death in many places, but I still get frequent questions about Null field issues in T-SQL. My purpose in this article is not to debate the definition of Null. Smarter men than I have already done so.

SQL Server Books Online indicates “A value of NULL indicates that the value is unknown.” However, some disagree, most recently Joe Webb – Null is not Unknown. Some argue that there should be several different types of Null.

Why Ask Why?

As I said before, I don’t really care about the various interpretations of the Null value. I just know that they’re a pain to deal with. I have never received questions like this:

  • What exactly is a Null?
  • What’s the definition of Null?

The questions I get follow along these lines.

  • Why am I losing records in this join?
  • Why is my total (or average) off in this query?
  • Why does my SSRS parameter list look funny?
  • Why won’t my parameters default like they should?

So dear reader, as part of the T-SQL Basics Series, I am going to share what you need to know about dealing with Nulls rather than defining them. If you’d like to follow along, please download the Kung Fu Database Script.

Where Clause

To start with, while people disagree about the definition of Null, everyone agrees what Null is not. It is not any of the following:

  • 0
  • blank
  • ” (Empty String)
  • 1/1/1900

When most of us look at a Where clause, we only consider two options – True and False. However, SQL Server uses three value logic, True, False, and Unknown. Since Null represents an Unknown value any comparison to it results in Unknown, not True or False. Consider the following pseudo-code:

Select * from Employees where Salary > 500

If the Salary field doesn’t allow nulls then it’s very easy to return the list of Employees which earn more than $500 per some time period. However, if Salary is a Nullable field, then any Employee without a Salary specification will not be returned. How could it be? You simply do not know whether they make more than $500 or not.

I find the easiest way for me to learn these concepts is to look at examples. Let’s start with the following base query from the Kung Fu Database.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD

Null values are hightlighted.


Returns: 6 Records. This is the total data set.

Now consider the following:

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
 
WHERE  OD.OrderQty >= 1

Returns: 4 Records. Those with Null Values for OrderQty are ignored.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
 
WHERE  OD.OrderQty < 1

Returns: 0 Records. Comparing any value to Null results in Unknown and those those records are not returned.

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
 
WHERE  OD.OrderQty = NULL

Well, logically this should return 2 records, right? After all, you are asking for those records with Null values for OrderQty. However, the correct answer is 0 records. The reason is simple. Comparing any value to Null (even another Null) returns Unknown. This may seem confusing, but logically it makes sense. Consider my Salary example above. If Bob and Dave both have Null Salary values, do they earn the same amount? Obviously, there’s no way to know. Therefore, the query evaluates to Unknown for those records and they aren’t returned. So, how can we return the Null records?

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
 
WHERE  OD.OrderQty IS NULL

Returns: 2 Records. For the same reason, we could have used “IS NOT NULL” and 4 records would return.

So, what if we want to return any records with a quantity less than 1 and include Null Values? Well, you could write that query like this:

SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice
FROM   dbo.OrderDetails OD
 
WHERE  OD.OrderQty < 1 
OR OD.OrderQty IS NULL

Returns: 2 Records.

Replacing Null Values with ISNULL or COALESCE


So that works, but it’s wordy. How can we more easily deal with nulls? Well, we can replace them with an actual value by using COALESCE or ISNULL. Each function returns the first Non-Null value found in the list of arguments. For example, SELECT ISNULL(NULL, 25) returns 25. However, SELECT ISNULL(NULL, NULL) returns Null.

Some people prefer IsNULL because the name of the function is more understable to them and it has been shown to have a very slight performance advantage. However, I use Coalesce because IsNull is limited to 2 arguments, whereas COALESE doesn’t have any realistic limit. SELECT COALESCE(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 7) still returns 7. Remember that it returns the first non-null value. Therefore, SELECT COALESCE(NULL, 3, NULL, NULL, NULL, 7) only returns the first value which is 3.

So, you may use Coalesce to simply these comparisons as well as for aesthetic reasons. Consider the previous Kung Fu Database query re-written with Coalesce. I’ve formatted it vertically to emphasize the difference.

SELECT
  OD.OrderID
 ,OD.[LineNo]
 ,OD.ProductID
 ,OD.ItemDescription
 ,COALESCE(OD.OrderQty,0) OrderQty -- COALESCE AND ALIASED
 ,OD.Discount
 ,OD.ListPrice
FROM
  dbo.OrderDetails OD
WHERE
  COALESCE(OD.OrderQty,0) < 1


Who determines whether or not you should replace the field like this? For the most part, the business user. So, if the business user provides the proper substitution values, the query may be written like this:

SELECT
  OD.OrderID
 ,OD.[LineNo]
 ,COALESCE(OD.ProductID, 0) ProductID
 ,COALESCE(OD.ItemDescription, 'None') ItemDescription
 ,COALESCE(OD.OrderQty, 0) OrderQty
 ,COALESCE(OD.Discount, 0) Discount
 ,COALESCE(OD.ListPrice, 0) ListPrice
FROM dbo.OrderDetails OD

Joining on NULL-able Columns

So, I’ve demonstrated how Null values can cause problems in the Where clause. For the same reasons, joining on Null-able columns is problematic as well. Consider my example Order Details table which allows a Null value for ProductID. Consider the following script:

SELECT
  OD.OrderID
 ,OD.[LineNo]
 ,OD.ProductID
 ,P.ProductID
 ,P.ProdName
 ,OD.ItemDescription
 ,OD.OrderQty
 ,OD.Discount
 ,OD.ListPrice
FROM
    dbo.OrderDetails OD
  FULL OUTER JOIN
    dbo.Products P
  ON P.ProductID = OD.ProductID

If that were written as your standard inner join, the recordset would lose the indicated record.

Grouping, Sorting, and Aggregates

In regards to Group By and Order By, Null values are considered equal. Meaning that all of the Null values are treated as one entity and are grouped and ordered as such.

Consider the following queries and result sets:

SELECT OD.OrderQty
FROM   dbo.OrderDetails OD
 
ORDER BY OD.OrderQty

SELECT   OD.OrderQty
FROM     dbo.OrderDetails OD
 
GROUP BY OD.OrderQty

Aggregate Functions
Aggregate functions such as COUNT, SUM, and AVG ignore Null values. However, one can use COUNT(*) which will count all records. Consider the following query:

SELECT
  SUM(OD.OrderQty) QtySum
  ,Avg(cast(OD.OrderQty as decimal (4,2))) NullAvg
  ,Avg(cast(COALESCE(OD.OrderQty,0) as decimal (4,2))) ZeroAvg 
 ,COUNT(OD.OrderQty) QtyCount
 ,COUNT(*) TotalCount
FROM
  dbo.OrderDetails OD


I had to Cast the quantities as decimals, because otherwise the values would have been rounded to 1. If you do the math, you’ll see that aggregate functions only considered the four records with non-null values. This is particularly noticeable when you compare the NullAvg and the ZeroAvg. The ZeroAvg actually divides the total by 7 instead of 4.

In this article, I’ve attempted through the use of many examples to describe the issues when working with Null values and how to deal with them. In the next article, I’ll discuss how to deal with blank values in Joins, Where clauses, and when creating parameter lists for SQL Server Reporting Services.

Related posts:

3 comments to I Don’t Know! Deal With It! (Nulls)

Leave a Reply to Writing T-SQL Merge Statements the Right Way | Made2Mentor

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>