When it comes to databases it seems that the hardest thing for many people to grasp is NULL. In it's simplest terms NULL represents a lack, an emptiness, a void, nothing. Perhaps too much importance is assigned to educating our youngsters with concrete examples when working on basic math skills.
Teacher: Now, Johnny. If you have two apples and you give one to Mary and one to Jack, how many do you have left?
Johnny: No apples
Teacher: That's right Johnny, you have zero apples.
But having zero apples and having NULL are two very different things. In the first case, you know that you have no apples, in the second you have nothing, which is different from having no apples because apples are something, see?
You're looking confused, so let's try this again, only this time using set theory, or at least a version of it. Let's look at two database tables. Both tables contain a number and a word:
Table 1
| ID |
Value |
| 1 |
The |
| 3 |
Quick |
| 4 |
Brown |
| 5 |
Fox |
|
Table 2
| ID |
Value |
| 2 |
Jumped |
| 3 |
Over |
| 4 |
The |
| 5 |
Lazy |
|
When we use JOIN, we are looking at an intersection of the data. So we get the following:
SELECT * FROM #table1 a INNER JOIN #table2 b ON a.ID = b.ID
| ID |
Value |
ID |
Value |
| 3 |
Quick |
3 |
Over |
| 4 |
Brown |
4 |
The |
| 5 |
Fox |
5 |
Lazy |
This is easy enough to understand. We take the row from table 1 and match it to the row in table 2. If there is no match (as in ID=1 form Table1 or ID=2 from Table2), we ignore them. But, when we do an OUTER JOIN, we start getting NULLs:
SELECT * FROM #table1 a LEFT OUTER JOIN #table2 b ON a.ID = b.ID
| ID |
Value |
ID |
Value |
| 1 |
The |
NULL |
NULL |
| 3 |
Quick |
3 |
Over |
| 4 |
Brown |
4 |
The |
| 5 |
Fox |
5 |
Lazy |
In this example, there is no corresponding value in Table2 where ID=1, so we show NULL, a lack of value. This is neither good nor bad, it simply means there is no known value. And this isn't too hard to grasp since it makes sense. But things get complicated when we start to define whether a column can be NULL.
When you define a table and say that a column can be defined as NULLable, what you're saying is that we are allowed to have an absence of data. We aren't required to know a person's middle initial, for example. Unfortunately, the subtle difference between a middle initial of NULL and ''. The former says "we don't know what the person's middle initial is" while the latter says, "the person doesn't have a middle initial." The problem comes about because many programs abhor a vacuum and thus will submit empty strings or 0 when what they really should submit is a NULL. That's because it takes an extra step to handle NULL from user input. The most common version looks something like this:
cmd.Parameter["@pName"].Value = (txtBox.Text == string.Empty ? DBNull.Value : (object)txtBox.Text);
In C#, the hardest part is that you have to recast the string as an Object if you want to use DBNull.Value, which is an object. But, you say, if NULL is just the absence of a value, how important can it be?
To answer that question, look at the following two rows of data:
| Id |
DateCreated |
DateUpdated |
| 1 |
2009-02-18 11:30:00 AM |
2009-02-18 11:30:00 AM |
| 2 |
2009-02-18 11:30:00 AM |
NULL |
The table defines DateUpdated as a NULL-able column and the second row has never been updated, this we know. The first row, however, we don't know if it was inserted by someone who specified the DateUpdated at the time of creation or whether it was updated within the same second that it was added.
To wrap up, if you are creating a table, determine whether you have columns that can legitimately be defined as having no data and make them NULL-able. If the columns are always going to have data defined for them, make them non-NULL, with a default value if necessary. But take time to understand NULL.
create table #table1 (ID int, Value char(20))
create table #table2 (ID int, Value char(20))
INSERT INTO #table1 values(1, 'The')
INSERT INTO #table1 values(3, 'Quick')
INSERT INTO #table1 values(4, 'Brown')
INSERT INTO #table1 values(5, 'Fox')
INSERT INTO #table2 values(2, 'Jumped')
INSERT INTO #table2 values(3, 'Over')
INSERT INTO #table2 values(4, 'The')
INSERT INTO #table2 values(5, 'Lazy')
SELECT * FROM #table1 a LEFT OUTER JOIN #table2 b ON a.ID = b.ID
SELECT * FROM #table1 a INNER JOIN #table2 b ON a.ID = b.ID
drop table #table1
drop table #table2
Technorati Tags:
NULL,
SQL