Today we're continuing our look at Microsoft's Cloud Computing initiative with the next step, Azure Table Storage. In previous posts I covered an Introduction to Microsoft Azure and the way that Azure Storage handles Security. If you want to follow along, download a copy of my AzureCommand class. You might also want to create a Microsoft Azure Account and load in some data. I should state that I am only looking at the cloud-hosted storage, not the locally hosted development storage.
As Microsoft uses a REST-ful API, I'll be talking about what HTTP Method needs to be called, what special headers (if any) need to be included, and what you can expect to get back. If you have an account with data, you can follow along at my Azure Table Storage Web Interface, where you can enter your own account information and see what happens.
Also, before we dig too deeply, ATS uses an Atom format. If you can read XML, you should be able to follow the data. You may want to brush up on XML with Namespaces but the code will cover everything we need.
Azure Table Storage (ATS)
Azure Table Storage (ATS) is a slightly misnamed tool. It's really Microsoft's take on a schemaless database. All of the activity for your account takes place at http://{account}.table.core.windows.net/, and the first thing we are going to look at is getting a list of tables. To do this, execute a GET against http://{account}.table.core.windows.net/Tables (CanonicalUrl is /{account}/Tables). A successful call will return 200 (OK), and you'll get an Atom block back that looks like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="http://{account}.table.core.windows.net/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Tables</title>
  <id>http://{account}.table.core.windows.net/Tables</id>
  <updated>2009-06-01T12:29:50Z</updated>
  <link rel="self" title="Tables" href="Tables" />
  <entry>
    <id>http://{account}.table.core.windows.net/Tables('demonstrations')</id>
    <title type="text"></title>
    <updated>2009-06-01T12:29:50Z</updated>
    <author><name /></author>
    <link rel="edit" title="Tables" href="Tables('demonstrations')" />
    <category term="{account}.Tables" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:TableName>demonstrations</d:TableName>
      </m:properties>
    </content>
  </entry>
</feed>

As you can see, we get a list of elements that contain an ID (the url), when it was updated, a link you can use to edit it (sort of) and, finally, the properties block with a table name. If you've downloaded the sample code, then you can look at the WinForm app and see that I've got a ComboBox that I'm using to store table lists in. I make a GET call to http://{account}.table.core.windows.net/Tables and then process the result set with the following code:
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(ar.Body);

// Instantiate an XmlNamespaceManager object.
System.Xml.XmlNamespaceManager xmlnsManager = new System.Xml.XmlNamespaceManager(xdoc.NameTable);

// Add the namespaces used in the Atom feed to the XmlNamespaceManager.
xmlnsManager.AddNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
xmlnsManager.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

// Each d:TableName element holds the name of one table.
XmlNodeList nodes = xdoc.SelectNodes("//d:TableName", xmlnsManager);
foreach (XmlNode node in nodes)
{
    cbTables.Items.Add(node.InnerText);
}

While it is fairly self-explanatory if you know anything about XmlDocument, most people don't have to deal with Namespaces on a daily basis, so I'll touch briefly on those. XML Namespaces are used to uniquely define elements and attributes within XML. Any time you see an element or attribute name with a ":" (colon) in it, the part before the colon is a namespace prefix, and that prefix must be declared on the element or one of its ancestors, usually at the top of the document. If you want to programmatically query the data in .NET, you'll need to define a namespace manager that maps the prefixes you use in your XPath to the same namespace URIs. From that point on, it's fairly straightforward.
With that little digression out of the way, let's talk about...
Tables
This is what puts the T in ATS. Unfortunately, too many people think of SQL-like objects with well-defined data columns when they hear the word table. ATS, however, is schemaless. Microsoft's own description of ATS states:
The Table service does not enforce any schema. A developer may choose to implement and enforce a schema on the client side. (http://msdn.microsoft.com/en-us/library/dd573356.aspx)
This is where major differences between ATS and a Relational Database Management System (RDBMS) start to show. In a traditional RDBMS like SQL Server, there is a lot of data validation that occurs on the server side; that's the Management part. ATS does very little management and almost no validation of data (more on this when we talk about Entities). On the plus side, it means you can easily add a new column just by including it on every new insert/update. On the negative side, an Entity that was stored without that column won't give you a NULL value back; the element will simply be missing, which can throw errors if your code assumes it is there. But we'll talk more about that when we discuss entities as well. For now, back to the misnamed "tables", which are really URIs used to access the data.
Table names have some minor requirements (table name Regex "^[A-Za-z][A-Za-z0-9]{2,62}$"):
- Table names may contain only alphanumeric characters.
- A table name may not begin with a numeric character.
- Table names are case-insensitive.
- Table names must be from 3 through 63 characters long.
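The four rules above are easy to check on the client before you ever make a call. Here is a minimal sketch; the class and method names (TableNameRules, IsValid, SameTable) are my own for illustration and are not part of the AzureCommand class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableNameRules
{
    // One leading letter plus 2 to 62 letters or digits: 3 to 63 characters in total.
    // \z anchors at the very end of the string, so a trailing newline is rejected too.
    private static readonly Regex ValidName =
        new Regex(@"^[A-Za-z][A-Za-z0-9]{2,62}\z", RegexOptions.CultureInvariant);

    public static bool IsValid(string name)
    {
        return name != null && ValidName.IsMatch(name);
    }

    // Table names are case-insensitive, so compare them without regard to case.
    public static bool SameTable(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this in place, TableNameRules.IsValid("demonstrations") passes, while "1table", "ab" and "my-table" are all rejected locally instead of by the service.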
There are three things you can do to tables: Create a new table (POST), delete an existing table (DELETE) and query a table (GET).
Create a new table
Creating a new table is simple. Execute a POST against http://{account}.table.core.windows.net/Tables (CanonicalUrl is /{account}/Tables) with a properly formatted HttpRequestBody containing the name of the table. I have no idea why MS is using Atom; I don't see what it adds (especially since MS throws away the title and author elements), but below is the string constant that the class uses to create a properly formatted body to send. It uses a date and table name to fill in the two placeholders.

@"<?xml version=""1.0"" encoding=""utf-8"" standalone=""yes""?>
<entry xmlns:d=""http://schemas.microsoft.com/ado/2007/08/dataservices"" xmlns:m=""http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"" xmlns=""http://www.w3.org/2005/Atom"">
  <title />
  <updated>{0:yyyy-MM-ddTHH:mm:ss.fffffffZ}</updated>
  <author><name /></author>
  <id />
  <content type=""application/xml"">
    <m:properties>
      <d:TableName>{1}</d:TableName>
    </m:properties>
  </content>
</entry>"

A successful table creation will return a 201 (Created) and will include the Rel Link for editing the table, which is strange because you never actually use this link for editing, merely deleting.
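If you would rather not maintain a format string, the same body can be built with LINQ to XML. This is only a sketch of an alternative, not what the AzureCommand class does; the class name CreateTableBody and its Build method are made up for this example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class CreateTableBody
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";
    static readonly XNamespace D = "http://schemas.microsoft.com/ado/2007/08/dataservices";
    static readonly XNamespace M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

    // Builds the Atom entry for a create-table POST (hypothetical helper).
    public static string Build(string tableName, DateTime utcNow)
    {
        var entry = new XElement(Atom + "entry",
            // Declare the d: and m: prefixes so the output matches the format string.
            new XAttribute(XNamespace.Xmlns + "d", D.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "m", M.NamespaceName),
            new XElement(Atom + "title"),
            new XElement(Atom + "updated",
                utcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffffffZ", CultureInfo.InvariantCulture)),
            new XElement(Atom + "author", new XElement(Atom + "name")),
            new XElement(Atom + "id"),
            new XElement(Atom + "content",
                new XAttribute("type", "application/xml"),
                new XElement(M + "properties",
                    new XElement(D + "TableName", tableName))));

        var declaration = new XDeclaration("1.0", "utf-8", "yes");
        return declaration.ToString() + Environment.NewLine
            + entry.ToString(SaveOptions.DisableFormatting);
    }
}
```

The upside of this approach is that the table name is escaped for you and the closing tags can't drift out of sync with the opening ones.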
Delete an Existing Table
Deleting an existing table is almost as easy as creating one. Execute a DELETE against http://{account}.table.core.windows.net/Tables('{tablename}') (CanonicalUrl is /{accountname}/Tables('{tablename}')). If successful, you'll get a 204 (NoContent). You could also get a 404 (NotFound), which indicates the table doesn't exist to begin with.
Getting an Existing Table
Executing a GET against a table returns all of the Entities in the table. Use the URL http://{account}.table.core.windows.net/{tablename}() (CanonicalUrl is /{accountname}/{tablename}()). This will return an Atom block containing Entities:

<entry m:etag="W/&quot;datetime'2009-04-13T22%3A24%3A44.8491603Z'&quot;">
  <id>http://{account}.table.core.windows.net/{tablename}(PartitionKey='CalendarEntry',RowKey='1')</id>
  <title type="text"></title>
  <updated>2009-06-01T16:14:39Z</updated>
  <author><name /></author>
  <link rel="edit" title="demonstrations" href="{tablename}(PartitionKey='CalendarEntry',RowKey='1')" />
  <category term="{account}.demonstrations" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="application/xml">
    <m:properties>
      <d:PartitionKey>CalendarEntry</d:PartitionKey>
      <d:RowKey>1</d:RowKey>
      <d:Timestamp m:type="Edm.DateTime">2009-04-13T22:24:44.8491603Z</d:Timestamp>
      <d:eventdescription>New Years Day</d:eventdescription>
      <d:eventdate m:type="Edm.DateTime">2008-01-01T00:00:00Z</d:eventdate>
      <d:eventdetails></d:eventdetails>
      <d:websiteid>W1</d:websiteid>
    </m:properties>
  </content>
</entry>

In the above example, you can see the entry gives you some useful information, including the id (which is a link to this entry), an etag that can be used as the HTTP ETag, and the full Entity. This is another difference from an RDBMS: you always get all of the information the Entity contains, and you can't select a subset of the "columns".
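Because every Entity comes back whole, and a property an Entity never stored is simply absent, it can help to read the properties into a dictionary and ask for values with a fallback. The following is a rough sketch along those lines; EntityReader, ReadProperties and GetOrDefault are names I've invented for the example, and it only reads the first m:properties block it finds.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class EntityReader
{
    const string D = "http://schemas.microsoft.com/ado/2007/08/dataservices";
    const string M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

    // Maps property name -> text value for the first m:properties block in the XML.
    public static Dictionary<string, string> ReadProperties(string atomXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(atomXml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("d", D);
        ns.AddNamespace("m", M);

        // The default comparer is case-sensitive, just like ATS property names.
        var result = new Dictionary<string, string>();
        XmlNode props = doc.SelectSingleNode("//m:properties", ns);
        if (props == null) return result;

        foreach (XmlNode child in props.ChildNodes)
        {
            // Skip whitespace and anything outside the d: namespace.
            if (child.NodeType == XmlNodeType.Element && child.NamespaceURI == D)
                result[child.LocalName] = child.InnerText;
        }
        return result;
    }

    // A property the entity never stored yields the fallback instead of an exception.
    public static string GetOrDefault(Dictionary<string, string> props, string name, string fallback)
    {
        string value;
        return props.TryGetValue(name, out value) ? value : fallback;
    }
}
```

Calling GetOrDefault(props, "newdata", null) on the entry above would give you the NULL that ATS won't hand you on its own.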
And that wraps up basic Table functionality, so now let's turn our attention to...
Entities/Properties
Each table in ATS is really a URI where you can access and modify data. Each chunk of data is called an Entity and is made up of Properties. An Entity is basically XML that is addressable by a PartitionKey and a RowKey. The PartitionKey is a means by which Microsoft determines how best to load balance data across nodes. RowKeys are unique within a PartitionKey. Given that, it can be handy to think of the PartitionKey as a traditional table name and the RowKey as the PrimaryKey, as they serve similar functions. Since the Entity is really nothing more than a bucket for storing XML, we'll dig right into properties, starting with the required ones.
All entities have at least three properties: the PartitionKey, the RowKey and a Timestamp. If you look above, you'll find those are the first three properties in the example listed under Getting an Existing Table. The Timestamp property cannot be changed and is, in fact, ignored if you attempt to send it in an update. The PartitionKey and RowKey are both strings and can be up to 1KB long. Of course, being strings can complicate things a bit. If you want to use an old-fashioned monotonically increasing value for your RowKey, you'll need to do some changing around before it's useful. If you just stick in the number, you'll quickly find that numeric strings sort {1,10,100,101,102,103,104,105,106,107,108,109,11,110}, which isn't very useful. So you'll need to zero pad the strings on the left to use them for sorting.
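Zero padding is a one-liner in .NET. Here is a small sketch; the RowKeys class and the width of 10 used in the comment are arbitrary choices of mine, so pick a width that covers the largest number you ever expect to store.

```csharp
using System;
using System.Globalization;

public static class RowKeys
{
    // Left-pads a non-negative number with zeros so string order matches numeric order.
    public static string FromNumber(long value, int width)
    {
        if (value < 0) throw new ArgumentOutOfRangeException("value");
        return value.ToString(CultureInfo.InvariantCulture).PadLeft(width, '0');
    }
}

// RowKeys.FromNumber(7, 10) gives "0000000007"
```

Unpadded, "10" sorts before "2"; padded to the same width, "0000000002" sorts before "0000000010", which is what you wanted in the first place. Note that PadLeft never truncates, so a number with more digits than the width will break the ordering again.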
Beyond these three properties, you can have up to 252 more. Not only that, but the Namespace associated with the entity means you can actually define the type of data the property contains. You can find the complete list of supported property types here, but let's take a minute to talk about some of them.
First, if you submit a property with a specific data type and the data doesn't match the data type, you'll get a 400 (BadRequest). The following example will throw such an error because we're attempting to pass a string in with a definition of Boolean. Boolean can be true, false, 0 (false) or 1 (true). The string versions are case sensitive: False will throw a 400.
<m:properties>
  <d:PartitionKey>CalendarEntry</d:PartitionKey>
  <d:RowKey>1</d:RowKey>
  <d:newdata m:type="Edm.Boolean">Josef</d:newdata>
</m:properties>
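The case-sensitivity point is an easy one to trip over in .NET, because bool.ToString() produces "True" and "False" with a capital letter. A small sketch of a client-side guard follows; the EdmBoolean class is a hypothetical helper, and the accepted values are just the four listed above.

```csharp
using System.Xml;

public static class EdmBoolean
{
    // True only for the four spellings the post lists as acceptable.
    public static bool IsAcceptable(string text)
    {
        return text == "true" || text == "false" || text == "0" || text == "1";
    }

    // XmlConvert writes the lowercase XML form, unlike bool.ToString().
    public static string Format(bool value)
    {
        return XmlConvert.ToString(value);
    }
}
```

EdmBoolean.IsAcceptable("Josef") and EdmBoolean.IsAcceptable("False") both come back false, so you can catch the problem before the service answers with a 400.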
If you don't specify anything, the property defaults to being a string. And case counts, on both the property name and the value. If you have a property named eventdescription containing the value "Christmas Bank Holiday", then the only filter that will return that information is "eventdescription eq 'Christmas Bank Holiday'". "Eventdescription eq 'Christmas Bank Holiday'" will fail because there are no properties matching that case-specific spelling. "eventdescription eq 'christmas Bank Holiday'" will fail because the values don't match. What this really means to developers is that we need to pay careful attention to detail and enforce data validation ourselves, because ATS will allow three different Entities in the same partition, one with "eventdescription", one with "EventDescription" and one with "eventDescription", without batting an eye. Even if all three contain the value "Christmas Bank Holiday", no single filter on that property will return all three Entities.
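One way to enforce that kind of discipline on the client is to keep a list of the property names you intend to use and run every name through it before building a request. This is only a sketch of the idea, with a made-up PropertyNameSchema class; how strict you want to be is up to you.

```csharp
using System;
using System.Collections.Generic;

public class PropertyNameSchema
{
    // Case-insensitive lookup that hands back the one canonical spelling.
    private readonly Dictionary<string, string> canonical =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public PropertyNameSchema(IEnumerable<string> names)
    {
        foreach (string name in names) canonical[name] = name;
    }

    // Returns the canonical spelling, or throws for a name that isn't in the schema.
    public string Normalize(string name)
    {
        string known;
        if (canonical.TryGetValue(name, out known)) return known;
        throw new ArgumentException("Unknown property name: " + name, "name");
    }
}
```

With a schema of { "eventdescription", "eventdate" }, Normalize("EventDescription") returns "eventdescription", and a misspelling throws locally instead of quietly creating a new property on the Entity.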
So, having said that, let's look at an example of an entry.
<entry m:etag="W/&quot;datetime'2009-04-13T22%3A24%3A44.8491603Z'&quot;">
  <id>http://{account}.table.core.windows.net/{tablename}(PartitionKey='CalendarEntry',RowKey='1')</id>
  <title type="text"></title>
  <updated>2009-06-01T16:14:39Z</updated>
  <author><name /></author>
  <link rel="edit" title="demonstrations" href="{tablename}(PartitionKey='CalendarEntry',RowKey='1')" />
  <category term="{account}.demonstrations" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
  <content type="application/xml">
    <m:properties>
      <d:PartitionKey>CalendarEntry</d:PartitionKey>
      <d:RowKey>1</d:RowKey>
      <d:Timestamp m:type="Edm.DateTime">2009-04-13T22:24:44.8491603Z</d:Timestamp>
      <d:eventdescription>New Years Day</d:eventdescription>
      <d:eventdate m:type="Edm.DateTime">2008-01-01T00:00:00Z</d:eventdate>
      <d:eventdetails></d:eventdetails>
      <d:websiteid>W1</d:websiteid>
    </m:properties>
  </content>
</entry>

If you think that looks familiar, it's because it's the same data that came back when we did a GET on the table. ATS is simply returning to us what we stored. We'll use the above data as the starting point for all of our examples, but first, let's talk about what's important in that data.
Azure actually wants a block that looks very much like the one above, but the important information is the part inside the m:properties element. The class I wrote has a string format that takes some parameters, including that m:properties section, and creates a properly formatted body. That string is:
<?xml version=""1.0"" encoding=""utf-8"" standalone=""yes""?>
<entry xml:base=""http://mamund.table.core.windows.net/"" xmlns:d=""http://schemas.microsoft.com/ado/2007/08/dataservices"" xmlns:m=""http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"" m:etag=""{etag}"" xmlns=""http://www.w3.org/2005/Atom"">
  <id>http://{account}.table.core.windows.net/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}')</id>
  <title type=""text""></title>
  <updated>{requestedate:yyyy-MM-ddTHH:mm:ss.fffffffZ}</updated>
  <author><name /></author>
  <link rel=""edit"" href=""{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}')"" />
  <category term=""{account}.{tablename}"" scheme=""http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"" />
  <content type=""application/xml"">
    {data}
  </content>
</entry>

We've talked about most of these replacement placeholders before. The only really new one is {data}, which contains the XML that defines the entity.
Creating an Entity
Creating an Entity is almost as easy as creating a Table. It requires using a POST against http://{account}.table.core.windows.net/{tablename} (CanonicalUri is /{account}/{tablename}) and a properly formatted Atom body (see above). One of the interesting things is that the id that we need to define includes the PartitionKey and RowKey but the Uri we are posting does not. And, if the PartitionKey and RowKey in the {data} replacement token don't match the ones in the ID, you'll receive a 409 (Conflict). If you are successful in creating a new entity, you'll get 201 (Created) back along with a copy of the submitted data, complete with a proper e-tag.
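Since a mismatch between the keys in the id and the keys in the data earns you a 409, one simple precaution is to generate both from the same two variables. The sketch below shows the idea for string properties only; NewEntity, Id and Properties are names I've made up, and the fragment redeclares the d: and m: prefixes on m:properties so that it is well-formed on its own.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class NewEntity
{
    static readonly XNamespace D = "http://schemas.microsoft.com/ado/2007/08/dataservices";
    static readonly XNamespace M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

    // The entity address that goes in the Atom id; the POST itself still targets the table URL.
    public static string Id(string account, string table, string partitionKey, string rowKey)
    {
        return string.Format(
            "http://{0}.table.core.windows.net/{1}(PartitionKey='{2}',RowKey='{3}')",
            account, table, partitionKey, rowKey);
    }

    // Builds the m:properties block from the same keys, followed by plain string properties.
    public static XElement Properties(string partitionKey, string rowKey,
        IEnumerable<KeyValuePair<string, string>> stringProperties)
    {
        var props = new XElement(M + "properties",
            new XAttribute(XNamespace.Xmlns + "d", D.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "m", M.NamespaceName),
            new XElement(D + "PartitionKey", partitionKey),
            new XElement(D + "RowKey", rowKey));

        foreach (var property in stringProperties)
            props.Add(new XElement(D + property.Key, property.Value));

        return props;
    }
}
```

Calling Properties(...).ToString() gives you something you could drop into the {data} placeholder, with the values XML-escaped along the way.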
Merging an Entity
Merging an entity uses a non-standard HTTP Command, MERGE. What this command does is add any properties that you send that don't already exist and update any properties that match. So, if we take the CalendarEntry with a RowKey of 1 that I displayed up above and send the following information in the body using the MERGE verb against http://{account}.table.core.windows.net/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}') (CanonicalUri= /{account}/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}')), what would you expect to happen?

<m:properties>
  <d:PartitionKey>CalendarEntry</d:PartitionKey>
  <d:RowKey>1</d:RowKey>
  <d:eventdescription>New Years Day Las Vegas!</d:eventdescription>
  <d:newdata>What happens in Vegas...</d:newdata>
</m:properties>

If it was successful, you'll get back a 204 (NoContent). And, if you said that the next query of that data would get back a version where eventdescription has changed and that includes a new property named newdata, you're correct.

<m:properties>
  <d:PartitionKey>CalendarEntry</d:PartitionKey>
  <d:RowKey>1</d:RowKey>
  <d:eventdate m:type="Edm.DateTime">2008-01-01T00:00:00Z</d:eventdate>
  <d:eventdescription>New Years Day Las Vegas!</d:eventdescription>
  <d:eventdetails></d:eventdetails>
  <d:newdata>What happens in Vegas...</d:newdata>
  <d:websiteid>W1</d:websiteid>
</m:properties>
Updating an Entity
The difference between updating and merging an entity is that update replaces all of the data. If you choose to use the PUT method against http://{account}.table.core.windows.net/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}') (CanonicalUri= /{account}/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}')) using the same data sent in the MERGE, you'll still get a 204 (NoContent), but the next time you query the data you'd find that, apart from the keys, your data contains only 2 properties: eventdescription and newdata:

<m:properties>
  <d:PartitionKey>CalendarEntry</d:PartitionKey>
  <d:RowKey>1</d:RowKey>
  <d:eventdescription>New Years Day Las Vegas!</d:eventdescription>
  <d:newdata>What happens in Vegas...</d:newdata>
</m:properties>
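If the difference between the two verbs is still fuzzy, it may help to model it locally with plain dictionaries. To be clear, this is an in-memory illustration of the behaviour described above, not a call to the service, and it leaves out the PartitionKey, RowKey and Timestamp; EntitySemantics is a made-up name.

```csharp
using System.Collections.Generic;

public static class EntitySemantics
{
    // MERGE: keep what is stored, overwrite matching names, add the new ones.
    public static Dictionary<string, string> Merge(
        IDictionary<string, string> stored, IDictionary<string, string> sent)
    {
        var result = new Dictionary<string, string>(stored);
        foreach (var pair in sent) result[pair.Key] = pair.Value;
        return result;
    }

    // PUT: whatever was sent replaces the stored properties entirely.
    public static Dictionary<string, string> Update(
        IDictionary<string, string> stored, IDictionary<string, string> sent)
    {
        return new Dictionary<string, string>(sent);
    }
}
```

Start with the four custom properties of the CalendarEntry (eventdescription, eventdate, eventdetails, websiteid) and send eventdescription plus newdata: Merge leaves you with five properties, Update leaves you with two.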
Deleting an Entity
Deleting an entity is simple. Execute a DELETE against http://{account}.table.core.windows.net/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}') (CanonicalUri= /{account}/{tablename}(PartitionKey='{partitionkey}',RowKey='{rowkey}')) with no body sent. A successful delete also returns a 204 (NoContent).
There are really several items left to talk about, but today's entry is long enough. Come back tomorrow and we'll wrap up the basics of Azure Table Storage by talking about eTags and Query, along with some more thoughts on PartitionKeys and RowKeys.