Register
Sunday, February 05, 2012
 
 DBAs And ProgrammersBlog
  
News! Minimize
   
 
 Print   
 
Misc Blog Stuff Minimize
   
 
 Print   
 
The Reluctant DBA Minimize
 
 
 
 Print   
 
Reluctant DBA Minimize
   
 
  
 
Reluctant DBA Minimize
   
 
  
 
The Reluctant DBA Minimize
 
Oct27

Written by:CarpDeus
10/27/2008 11:50 AM 

 Jeff Atwood recently twitted that he was "starting to really dread the words "INNER JOIN". I read it now as REDUCE PERFORMANCE". So I went looking and I have to say he is both right and wrong. 

To begin with, I added a list of State names to the database I'm using for GeoSpatial testing and then kicked out three queries:

 

DECLARE @stateCode char(2)
SELECT @stateCode =StateCode FROM StateList WHERE StateName='Ohio'
SELECT * FROM GeoSpatialTesting WHERE StateCode = @stateCode 
SELECT gs.* FROM GeoSpatialTesting gs INNER JOIN StateList sl
ON gs.StateCode = sl.StateCode WHERE sl.StateName='Ohio'

The first two mimic the third one, basically looking up the StateCode from StateList that matches and then using that to get the data out of the database. And, in this instance, Jeff is correct. If you run these with an execution plan turned on, you'll find that the first two statements take 36% of the execution cost and the INNER JOIN takes 64%. But this is for an absurdly simple case of INNER JOIN, where we are using the JOIN as a FILTER of a single bit of information. In this case, it makes much more sense to get the filter out of StateList and use it to filter the data from GeoSpatialTesting. The cost of the JOIN is in the Hash Match of the two tables.

 

With that information then I would agree that, when you are using a simple filter of data, it would be a best practice to get the value and then filter in your code. But let's look at a slightly different version. I've added a Statistical Weight column to the StateList. For the importance of this discussion, let's look the three states weighted highest (by random numbering):

 

StateCode StateName StatisticalWeight
MO Missouri 97
KY Kentucky 98
SC South Carolina 98

 

Now, if we wanted the data where the StatiscalWeight was 97, that we could easily mimic with our earlier code:

DECLARE @stateCode char(2)
SELECT @stateCode =StateCode FROM StateList WHERE StatisticalWeight =97
SELECT * FROM GeoSpatialTesting WHERE StateCode = @stateCode 
SELECT gs.* FROM GeoSpatialTesting gs INNER JOIN StateList sl
ON gs.StateCode = sl.StateCode WHERE sl.StatisticalWeight =97

Again, we get the same distribution of cost. But, when you change the StatisticalWeight to 98, you get a different thing. The Query Cost remains the same, but the result sets are vastly different. The first result set contains KY only, because the variable @StateCode is set to just one value with 342 rows. The second result set, which contains both KY and SC, contains 813 rows. And it costs the same amount as the JOIN for a weight of 97.

So, yes, JOINS are more expensive than filtering data when using as a one value filter. But, when used for multi-value filters they are very cost efficient.

In addition, if you need to have additional data, they are very efficient workhorses, enabling normalization to make data cleaner.

But that's a tale for another day.

Tags:

Your name:
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Security Code
Enter the code shown above in the box below
Add Comment  Cancel 
 
 
  
 
Privacy Statement | Terms Of Use Copyright 2001-2008 by ReluctantDBA.com