Register
Friday, September 03, 2010
 
 DBAs And ProgrammersBlog
  
News! Minimize
   
 
 Print   
 
Misc Blog Stuff Minimize
   
 
 Print   
 
The Reluctant DBA Minimize
 
 
 
 Print   
 
Reluctant DBA Minimize
   
 
  
 
Reluctant DBA Minimize
   
 
  
 
The Reluctant DBA Minimize
 
Nov20

Written by:CarpDeus
11/20/2008 6:28 PM 

So, I'm playing around with SQL Data Services (SDS), one of Microsoft's Cloud Databases. (see more here, signup for Public CTP here). And it requires some shifts from the way that things are normally done. I've got a demo that I talk about here, but I've made some changes since then. The original demo only had one "table", a calendar of events. I've updated that Zip file to now include another "table".

If you're wondering why I put "table" in quotes, it's because SDS doesn't use tables, it uses... No, let's begin at the beginning.

SDS starts with Authorities, which are like servers. Each Authority can have multiple Containers, which are like databases. And each Container holds Entities, which is roughly like a table. Now, I've been joking for years that we waste too much time on database design and we should just create one table with an XML block and store everything in there. That's not exactly what SDS is doing, but it helps to start thinking that way to grok how to use the data.

And the first change you need to make to your mindset is to remove Normal Forms. When I first wrote this demo, I was trying to use a basic, normalized structure. I had two tables: CalendarEntry and WebSite. The integer Identity column in WebSiteID was used form a foreign key dependency for CalendarEntry. That would work in SDS as long as you're not using integers for your Ids, because you can't have two entities that share the same key. So I had to denormalize and store the WebSiteID in both the CalendarEntry and the WebSite and not use WebSiteID as the PrimaryKey in the WebSite entity/table. But that's not really that big a stretch. Anyone who has ever heard Kimberly Tripp speak about SQL Server knows about how your Clustered Index should be on a monotonically increasing value. Which means that you might actually use a UUID for your primary key to uniquely identify the data but the clustered index would be on an identity column that you never actually use.

That's how to think about the Id column in SDS. Make it whatever you'd like, it doesn't really matter. And use the column you want to as the key. At the moment, this is how the two Entities in my sample look in SXML (SDS XML, which isn't real XML but that's another story... try opening the XML file in the zip with IE and listen to it complain until you put a top level element in). Here is the Calendar entity:

http://schemas.microsoft.com/sitka/2008/03/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:x="http://www.w3.org/2001/XMLSchema">
  1
  2008-01-01T00:00:00
  New Years Day
  New Years Day
  W1

And here is the WebSite entity:

http://schemas.microsoft.com/sitka/2008/03/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:x="http://www.w3.org/2001/XMLSchema">
  70E6B8E4-6A0C-4BF9-8542-5C01CE673FC3
  Future Technology
  Gravid.ReluctantDBA.com
  W8

If you've created an SDS account and loaded the data from the demo, you can run some queries against it. I highly recommend using the SDS SDK tool to load the data and Mike Amundsen's Provisioning Client for all of the other things you want to do, like run queries against it. Let's start with two queries to confirm the data was loaded. First, to verify that you have WebSite entities loaded:

from e in entities where e.Kind=="WebSite" select e

Next, to very that you have CalendarEntry entities loaded:

from e in entities where e.Kind=="CalendarEntry" select e

So, now that we have data, let's get a list of dates in 2009 for each web site. The traditional method would be to use a JOIN, so let's start with that (SDS Query Language Grammar):

from c in entities
where c.Kind == "CalendarEntry"
from w in entities
where w.Kind == "WebSite"
where c["WebSiteID"] ==w["WebSiteID"] &&
(w["WebSiteID"] == "W1" || w["WebSiteID"]=="W2")
&& (c["EventDate"] > DateTime("2009-01-01T00:00:00") &&  c["EventDate"] < DateTime("2010-01-01T00:00:00"))

Now, SDS uses a LINQ like query language, but you when you join you can only select one entity. So, while this does return the CalendarEntry entities we are looking for, it doesn't have any of the WebSite data. Which makes it a bit more problematic. But, since we have included the WebSiteID in both entities, we can do a select without the JOIN and come closer to what we want:

from c in entities
where (c["WebSiteID"] == "W1" || c["WebSiteID"]=="W2")
orderby c["WebSiteID"]
select c

This gives us all of the correct entities, but it includes all of the calendar entries, not just the ones for the year we are interested in. Oh, pay attention to that OrderBy, it will be important tomorrow. It makes logical sense, coming from the database world, to then attempt to modify the query as follows:

from c in entities
where (c["WebSiteID"] == "W1" || c["WebSiteID"]=="W2")
&& (c["EventDate"] > DateTime("2009-01-01T00:00:00") &&  c["EventDate"] < DateTime("2010-01-01T00:00:00"))
orderby c["WebSiteID"]
select c

Unfortunately, the additional filter removed the WebSite entities. Why? Because they don't have an EventDate property. However, all is not lost (though I'm grateful that I have a guide who's been through this process to help me out). What we really need to do is add a bit more filtering and remember that parentheses count. Here is the query and then we'll walk through it:

from c in entities
where (c["WebSiteID"] == "W1" || c["WebSiteID"]=="W2")
&& (((c["EventDate"] > DateTime("2009-01-01T00:00:00") &&  c["EventDate"] < DateTime("2010-01-01T00:00:00") ) && c.Kind=="CalendarEntry")
|| c.Kind=="WebSite")
orderby c["WebSiteID"]
select c

Ok, the first WHERE section filters both entities by WebSiteID property. Then we have a parenthetical fest which translates into:

  1. EventDate must be greater than 1/1/2009 and EventDate must be less than 1/1/20010 and the Kind of entity is CalendarEntry,
  2. OR Kind of entity is WebSite.

Now, that may seem complex but really the only addition is the fact that TSQL wouldn't require us to explicitly state that a table that doesn't contain the column shouldn't be excluded from the query. Well, that and the fact that we still end up with something that looks like this (whole file here):

http://schemas.microsoft.com/sitka/2008/03/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:x="http://www.w3.org/2001/XMLSchema">

  04AF565A-157E-4679-BE90-C15AED3D01DF
  364612
  Happy Blues
  HappyBlues.ReluctantDBA.com
  W1
 


  19
  361286
  2009-02-02T00:00:00
  Groundhog Day
  Groundhog Day
  W1
 

It's not indented because CalendarEntry isn't actually tied to the WebSite. But that's ok. We'll talk about things you can do with the raw data either tomorrow or over the weekend.

For now, enjoy!

Tags:

1 comment(s) so far...

Re: The End of JOINs?

Thanks for this informative post. it's well written and useful. This is a nice blog and will be looking forward to read more from you. I found your blog on Google and read a few of your other posts. I just added you to my Google Reader. Keep up the good work!

By Torrents Search on  2/11/2010 10:08 AM

Your name:
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Security Code
Enter the code shown above in the box below
Add Comment  Cancel 
 
 
  
 
Privacy Statement | Terms Of Use Copyright 2001-2008 by ReluctantDBA.com