Register
Thursday, September 09, 2010
 
 DBAs And ProgrammersBlog
  
News! Minimize
   
 
 Print   
 
Misc Blog Stuff Minimize
   
 
 Print   
 
The Reluctant DBA Minimize
 
 
 
 Print   
 
Reluctant DBA Minimize
   
 
  
 
Reluctant DBA Minimize
   
 
  
 
The Reluctant DBA Minimize
 
Mar11

Written by:CarpDeus
3/11/2009 9:04 AM 

Ok, maybe it's not exactly a myth, but have you ever thought of why transactions handled by the server are necessary? We have had drilled into us from our very first database lessons that ACID is a requirement. And it's very possible that you've probably seen demonstrations where they talk about having two accounts and moving money from one account to another and how that should be wrapped in a transaction to prevent the loss of data between the two. But there's a problem with that; well, two problems with that.

The first has to do with scalability. When you use BEGIN TRANSACTION, you're really creating a snapshot of the database as it exists before you execute whatever you're going to do. That way you can issue ROLLBACK and have the database reset to where it was before you began, sort of a no harm, no foul method. But, for the duration of your transaction, you are blocking other processes from writing to the database until you either COMMIT or ROLLBACK. And if any of those processes are wrapped in a transaction you can quickly get a logjam that leads to deadlocks and unhappy clients and other problems. And this leads to the title of this post.

SQL Server implements transactions on every request you make where you don't specify a transaction. Take the following example:

UPDATE Employee SET LastName = 'Smith' WHERE EmployeeID = 1

That is functionally equivalent to:

BEGIN TRANSACTION
UPDATE Employee SET LastName = 'Smith' WHERE EmployeeID = 1
COMMIT

Where we encounter the myth of transaction necessity is when we attempt to create a block of SQL Statements within a transaction. My favorite example (i.e., one I used as a demo to explain transactions) is the simple transfer of funds that I used in the introduction of this post, the gist of which can be summed up as:

  • Transferring money shouldn’t be too hard; you just remove $25 from savings and put it into checking.  But what if…
    • The money is removed from savings but, before added into checking, the server goes down. Result: savings has $75, but checking only has $100.
    • If $25 is added into the checking account before being removed from the savings account and the server goes down then the checking account has $125, but the savings account still has $100.
    • We can't remove $25 for some reason? Where do we stop the process?
  • Solution? Make the two transactions one!
  • Once a transaction begins, one of two things needs to happen:
    • If the transaction finishes without running into any problems, then it needs to be committed.
    • If the transaction encounters an error that prevents it from finishing, it needs to be rolled back and the entire transaction is erased, just as though it never happened.

In the traditional, single database model, that's exactly how transactions work and it fails miserably for many reasons. While you're busy processing the transaction, as I said above, the system has to wait for you to conclude the transaction, which makes transactions unfriendly toward scaling. More importantly, this doesn't work if you need to step outside of this server in any way.

What if I am transferring money to another account in another bank? Most banks process that over the course of 72 hours, and you can't keep a transaction open that long. Not only that, but what if, for whatever reason, that process fails?

At this point, it's time to do away with Transactions for blocks of SQL Statements and to change our mode of thinking. Rather than committing two or more commands wrapped within a single transaction and forcing SQL Server to maintain state for us, we need to design our applications to be able to handle this ability to keep and reset state for ourselves. The key to doing this so it doesn't drive us crazy is to realize that we don't need the ability to roll back an entire database, just a part of it. So, in the example of transferring money to an account in another bank:

  • Create a transaction type and ID
  • Withdraw the money from our account
  • Send deposit to other bank

At some point the other bank should send us a status with our transaction id as a reference id and tell us that it either completed successfully (where we just mark the transaction as completed) or that it failed for some reason. In that case, we just redeposit the money originally withdrawn, perhaps with a note to say why it was deposited (failed transfer). This differs from an internal SQL Transaction in that SQL Transactions make it appear as though nothing has happened on a rollback while the distributed transaction rollback will show both the withdrawal and the redeposit of the withdrawal.

And that scales because our SQL transaction lasted only for the duration of the updates to the account. Our transaction, regardless of how long it takes before we call it complete or roll it back, has no blocking effect on the database.

It's more work for us because we have to design, implement, test and debug our own transactional system and some people might say that it's reinventing the wheel because SQL Server already does this but it isn't, it's far more like taking the basic design of wheels for a car and redesigning them for use on the moon. The basic principle is the same, but the requirements are vastly different. And anyone who wants to build scalable, cloud based applications is going to need to start looking at changing how they think about transactions.

Copyright ©2009 Carp Deus

Tags:

Your name:
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Security Code
Enter the code shown above in the box below
Add Comment  Cancel 
 
 
  
 
Privacy Statement | Terms Of Use Copyright 2001-2008 by ReluctantDBA.com