This topic has been done to death in many places, but I still get frequent questions about Null field issues in T-SQL. My purpose in this article is not to debate the definition of Null. Smarter men than I have already done so.
SQL Server Books Online indicates “A value of NULL indicates that the value is unknown.” However, some disagree, most recently Joe Webb – Null is not Unknown. Some argue that there should be several different types of Null.
Why Ask Why?
As I said before, I don’t really care about the various interpretations of the Null value. I just know that they’re a pain to deal with. I have never received questions like this:
- What exactly is a Null?
- What’s the definition of Null?
The questions I get follow along these lines.
- Why am I losing records in this join?
- Why is my total (or average) off in this query?
- Why does my SSRS parameter list look funny?
- Why won’t my parameters default like they should?
So dear reader, as part of the T-SQL Basics Series, I am going to share what you need to know about dealing with Nulls rather than defining them. If you’d like to follow along, please download the Kung Fu Database Script.
Where Clause
To start with, while people disagree about the definition of Null, everyone agrees what Null is not. It is not any of the following:
- 0
- blank
- ” (Empty String)
- 1/1/1900
When most of us look at a Where clause, we only consider two options – True and False. However, SQL Server uses three value logic, True, False, and Unknown. Since Null represents an Unknown value any comparison to it results in Unknown, not True or False. Consider the following pseudo-code:
Select * from Employees where Salary > 500 |
If the Salary field doesn’t allow nulls then it’s very easy to return the list of Employees which earn more than $500 per some time period. However, if Salary is a Nullable field, then any Employee without a Salary specification will not be returned. How could it be? You simply do not know whether they make more than $500 or not.
I find the easiest way for me to learn these concepts is to look at examples. Let’s start with the following base query from the Kung Fu Database.
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD |
Returns: 6 Records. This is the total data set.
Now consider the following:
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD WHERE OD.OrderQty >= 1 |
Returns: 4 Records. Those with Null Values for OrderQty are ignored.
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD WHERE OD.OrderQty < 1 |
Returns: 0 Records. Comparing any value to Null results in Unknown and those those records are not returned.
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD WHERE OD.OrderQty = NULL |
Well, logically this should return 2 records, right? After all, you are asking for those records with Null values for OrderQty. However, the correct answer is 0 records. The reason is simple. Comparing any value to Null (even another Null) returns Unknown. This may seem confusing, but logically it makes sense. Consider my Salary example above. If Bob and Dave both have Null Salary values, do they earn the same amount? Obviously, there’s no way to know. Therefore, the query evaluates to Unknown for those records and they aren’t returned. So, how can we return the Null records?
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD WHERE OD.OrderQty IS NULL |
Returns: 2 Records. For the same reason, we could have used “IS NOT NULL” and 4 records would return.
So, what if we want to return any records with a quantity less than 1 and include Null Values? Well, you could write that query like this:
SELECT OD.OrderID, OD.[LineNo], OD.ProductID, OD.ItemDescription, OD.OrderQty, OD.Discount, OD.ListPrice FROM dbo.OrderDetails OD WHERE OD.OrderQty < 1 OR OD.OrderQty IS NULL |
Returns: 2 Records.
Replacing Null Values with ISNULL or COALESCE
So that works, but it’s wordy. How can we more easily deal with nulls? Well, we can replace them with an actual value by using COALESCE or ISNULL. Each function returns the first Non-Null value found in the list of arguments. For example, SELECT ISNULL(NULL, 25) returns 25. However, SELECT ISNULL(NULL, NULL) returns Null.
Some people prefer IsNULL because the name of the function is more understable to them and it has been shown to have a very slight performance advantage. However, I use Coalesce because IsNull is limited to 2 arguments, whereas COALESE doesn’t have any realistic limit. SELECT COALESCE(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 7) still returns 7. Remember that it returns the first non-null value. Therefore, SELECT COALESCE(NULL, 3, NULL, NULL, NULL, 7) only returns the first value which is 3.
So, you may use Coalesce to simply these comparisons as well as for aesthetic reasons. Consider the previous Kung Fu Database query re-written with Coalesce. I’ve formatted it vertically to emphasize the difference.
SELECT OD.OrderID ,OD.[LineNo] ,OD.ProductID ,OD.ItemDescription ,COALESCE(OD.OrderQty,0) OrderQty -- COALESCE AND ALIASED ,OD.Discount ,OD.ListPrice FROM dbo.OrderDetails OD WHERE COALESCE(OD.OrderQty,0) < 1 |
Who determines whether or not you should replace the field like this? For the most part, the business user. So, if the business user provides the proper substitution values, the query may be written like this:
SELECT OD.OrderID ,OD.[LineNo] ,COALESCE(OD.ProductID, 0) ProductID ,COALESCE(OD.ItemDescription, 'None') ItemDescription ,COALESCE(OD.OrderQty, 0) OrderQty ,COALESCE(OD.Discount, 0) Discount ,COALESCE(OD.ListPrice, 0) ListPrice FROM dbo.OrderDetails OD |
Joining on NULL-able Columns
So, I’ve demonstrated how Null values can cause problems in the Where clause. For the same reasons, joining on Null-able columns is problematic as well. Consider my example Order Details table which allows a Null value for ProductID. Consider the following script:
SELECT OD.OrderID ,OD.[LineNo] ,OD.ProductID ,P.ProductID ,P.ProdName ,OD.ItemDescription ,OD.OrderQty ,OD.Discount ,OD.ListPrice FROM dbo.OrderDetails OD FULL OUTER JOIN dbo.Products P ON P.ProductID = OD.ProductID |
If that were written as your standard inner join, the recordset would lose the indicated record.
Grouping, Sorting, and Aggregates
In regards to Group By and Order By, Null values are considered equal. Meaning that all of the Null values are treated as one entity and are grouped and ordered as such.
Consider the following queries and result sets:
SELECT OD.OrderQty FROM dbo.OrderDetails OD ORDER BY OD.OrderQty |
SELECT OD.OrderQty FROM dbo.OrderDetails OD GROUP BY OD.OrderQty |
Aggregate Functions
Aggregate functions such as COUNT, SUM, and AVG ignore Null values. However, one can use COUNT(*) which will count all records. Consider the following query:
SELECT SUM(OD.OrderQty) QtySum ,Avg(cast(OD.OrderQty as decimal (4,2))) NullAvg ,Avg(cast(COALESCE(OD.OrderQty,0) as decimal (4,2))) ZeroAvg ,COUNT(OD.OrderQty) QtyCount ,COUNT(*) TotalCount FROM dbo.OrderDetails OD |
I had to Cast the quantities as decimals, because otherwise the values would have been rounded to 1. If you do the math, you’ll see that aggregate functions only considered the four records with non-null values. This is particularly noticeable when you compare the NullAvg and the ZeroAvg. The ZeroAvg actually divides the total by 7 instead of 4.
In this article, I’ve attempted through the use of many examples to describe the issues when working with Null values and how to deal with them. In the next article, I’ll discuss how to deal with blank values in Joins, Where clauses, and when creating parameter lists for SQL Server Reporting Services.
[…] the previous article I discussed the issues when working with Null-able fields in Joins and Where clauses. This article […]
A good, high-level overview of NULL and the issues it might cause.
Two additions:
1. You link to my blog when stating that some people argue for different types of NULL. Which is strange, because in that blog post I prove that this idea is wrong, since it is based on a fallacy.
2. Your sample query with full outer join may lead people to think that you need full outer join to get a join to match on null values. That is not the case. It just causes rows without match in either table to be retained. (And in the case of this specific data, I would argue that a right outer join would have been sufficient (and maybe even more appropriate).
[…] to see if a value changed before I run my update statement. However, depending on your settings, Null values are not equal to each other. Therefore, any record with a Null value will not be updated if one of […]