Register
Tuesday, March 16, 2010
 
 DBAs And ProgrammersBlog
  
News! Minimize
   
 
 Print   
 
Misc Blog Stuff Minimize
   
 
 Print   
 
The Reluctant DBA Minimize
 
 
 
 Print   
 
Reluctant DBA Minimize
   
 
  
 
Reluctant DBA Minimize
   
 
  
 
The Reluctant DBA Minimize
 
Jun10

Written by:CarpDeus
6/10/2009 6:28 PM 

Update: Aaron responded to my post (please see comments below) so I've made a minor edit to the post.

At this morning's first Azure talk at VSLive Las Vegas, I thought I heard Aaron Skonnard state that Azure Table Storage was irrelevant, to be replaced by SQL Data Services (his views are in the comments). I have to say, that probably didn't really help with attendance at my after lunch talk on ATS, but it did get me thinking even more about why ATS is important and why SQL Server has scalability issues.

Let me state, first and for the record, SQL Server scales. There's a limit to how much it can scale before you need to add clustering (which is a whole different can of worms) but SQL Server scalability is all related to cost, throwing more and faster hardware at the problem. The number of transactions you can process in a second, the number of disk reads, etc. SQL Server can handle a lot of transactions if it is properly designed. But SQL Server scales through hardware. Azure Table Storage scales through simplicity, meaning you can do more transactions on with ATS than SQL Server on equivalent hardware.

All or most of the management that's a part of the SQL Server RDBMS is done away with. Even if they are using SQL Server on the back end, they are still using a lot less of the management that accompanies a traditional SQL Server architecture.

Consider, if you will, the true cost of an insert into a table in a well-normalized database. Insert a new row of data into a well-normalized database and you have executed a transaction. You only need to use BEGIN TRANS when you're executing a set of SQL Statements. Even without BEGIN TRANS, SQL Server treats that single INSERT as a transaction that begins when you execute the insert and ends when SQL Server has finished. Have you ever considered what goes into that single transaction? If we dig beneath the covers, you'll find something like this:

  • Verify the Primary Key is unique
  • Check the physical page where you need to insert the row and insert the data
    • If there isn't enough room on the page, allocate a new page, possibly splitting the data across the old and new pages
  • For each index associated with the table, add a new entry in the appropriate location
    • If there isn't enough room on the page where the index entry is going to go, allocate a new page, possibly splitting data across the old and new pages
  • For each foreign key relationship defined for the table, verify that the foreign key information is correct

At any stage in this process, for any point of failure, SQL Server needs to ROLLBACK the transaction. According to Microsoft, this means:

ROLLBACK TRANSACTION erases all data modifications made from the start of the transaction or to a savepoint. It also frees resources held by the transaction.

In other words, erase the insert as though it never happened. Which really means that you can't do anything with the table that I'm inserting to (and possibly some tables that I have foreign key relationships with) until SQL Server either commits or rolls back the transaction. Which is to say, SQL Server blocks other processes. Now, SQL Server can do all of the above steps blazingly quickly and DBAs and SQL Developers know a lot of tricks for tweaking the last bit of work out of SQL Server. But that's indicative of why SQL Server, which may work fine for 80% of the cases, can have scalability issues. Sure, it will scale, but the cost in hardware and extremely talented consultants and DBAs who have to monitor the server cluster like a mother hen brooding over her eggs can be exorbitant.

Azure Table Storage, on the other hand, is simpler. It doesn't have the management overhead. It puts data in, updates data, deletes data, merges data. But it doesn't index, it doesn't validate, it doesn't handle many of the things that SQL Server does so well for us. Not doing that, however, allows it to scale so that the same hardware does more. If I can get more out of less hardware, there's a cost component involved. Granted, there is a cost component for development that's not required for SQL Server, but I suspect that there will be a set of classes that folks develop to handle that as best practices are discovered.

I don't think ATS will replace SQL Server, but I also don't think SQL Data Services will replace the need for ATS. I think both have their place in the proper situations and the near future will help determine which is appropriate for different scenarios.

Copyright ©2009 Carp Deus

Tags:

2 comment(s) so far...

Re: Why Aaron Skonnard is Wrong About Azure Table Storage

You are misquoting me. I didn't say "ATS was being replaced by SDS". I said I believe most developers will probably choose to use SQL Data Services over ATS given the simplicity of its relational model. I'm pretty sure I also said that ATS is still applicable in scenarios that require huge degrees of scalability. Given the various pros and cons, I'm simply of the opinion that *most* developers will opt for the simplicity of the SDS relational model especially since 90-95% of SQL Server databases aren't bigger than a few GBs in size.

By Aaron Skonnard on  6/12/2009 3:56 PM

Re: Why Aaron Skonnard is Wrong About Azure Table Storage

Hey there Brent/Aaron - I see an interesting thing to chat about:

Azure Table Storage, on the other hand, is simpler. It doesn't have the management overhead. It puts data in, updates data, deletes data, merges data. But it doesn't index, it doesn't validate, it doesn't handle many of the things that SQL Server does so well for us. Not doing that, however, allows it to scale so that the same hardware does more.

OK, but I really need to see the internals of this... keys mean data integrity - does Azure not have anything that manages uniqueness? And, indexes are often what provides scalability to data - so, without those, how is ATS doing a lookup. There's got to be SOMETHING under the hood. I wanna see the engine! :)

Having said all of that - there's a time/place for smaller, simplier databases and when the DBs are measured in single-digit GB (not 100s of GB or TB...) then this might be an interesting solution. Can't directly address that because I still think that a minimal amount of work in understanding SQL Server (even Express) is often worth the long term payoffs. If you want to scale, you easily have a path to do it. If you go to ATS and then your DB starts hitting 10-20GB - what happens? So, this is definitely a GREAT discussion! Maybe we need to find a common city - and then find a pub. :)

Cheers you two!
kt

By Kimberly L. Tripp on  6/21/2009 8:27 PM

Your name:
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Add Comment  Cancel 
 
 
  
 
Privacy Statement | Terms Of Use Copyright 2001-2008 by ReluctantDBA.com