An introduction to Umbraco Hive and to creating a Hive provider that stores media in Azure blob storage.
Jupiter is just around the corner
With its first beta release on November 11th, the road to Umbraco 5 (aka Jupiter) has brought us very close to its destination. The members of the Umbraco core team have been working really hard on stabilizing the API, and many of our beloved (and improved) CMS features are taking shape nicely.
As Umbraco 5 remains open source, all current and future releases are made available through Codeplex, which allows developers in the Umbraco community to explore and work with the source code, to contribute and to provide the core team with valuable feedback.
After attending the awesome Umbraco v5 hackathon organized by BUUG and hosted at our offices in Amsterdam, and very much inspired by Alex Norcliffe’s talk at the Microsoft offices in Brussels, we really wanted to get our hands dirty and take a deep dive into Hive!
Many of the recent Umbraco 5 talks and presentations involve Hive, a completely new and extremely powerful concept in the upcoming version. Unlike the traditional data layer in most n-tier or n-layer (web) applications, Hive is an abstraction layer into which developers can plug multiple, stackable data providers. These Hive providers can read from and/or write to nearly any data source, and they allow for smooth integration within the Umbraco back office as well as a transparent way of querying the data in the front-end.
The first beta release comes with three Hive providers, created by the core team:
- A persistence provider which uses NHibernate as its ORM and already supports SQL Server and SQL CE
- An Examine provider for data indexation (which uses the Lucene.NET indexer)
- An IO provider which stores data on the local file system
Basically, this means Hive already covers the storage needs of the vast majority of .NET-based websites.
By the way, did I mention these providers are stackable? In a recent demo (video not yet available) Alex quickly swapped out the persistence provider for the Examine provider, having Umbraco run entirely on a file index without using a database at all. Although that’s a neat trick by itself, the true power of Hive is the ability to query multiple, stacked providers: imagine having data stored in both a database and a file index, while data is always retrieved from the file index in order to gain a performance boost… Hive will make it work.
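To make the idea of stacking a bit more concrete, here is a minimal conceptual sketch in Python. It is not Hive's API: the `DictProvider` and `ProviderStack` classes and their methods are hypothetical names made up for illustration. The sketch assumes that reads are served by the first provider in the stack that holds the requested key, and that writes go to every writable provider.

```python
from typing import Dict, List, Optional


class DictProvider:
    """Hypothetical stand-in for a data provider, backed by an in-memory dict."""

    def __init__(self, name: str, data: Optional[Dict[str, str]] = None,
                 writable: bool = True) -> None:
        self.name = name
        self.data = dict(data or {})
        self.writable = writable

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value


class ProviderStack:
    """Reads from the first provider that has the key; writes to all writable ones."""

    def __init__(self, providers: List[DictProvider]) -> None:
        self.providers = providers

    def get(self, key: str) -> Optional[str]:
        # Providers are consulted in order, so the fast one should come first.
        for provider in self.providers:
            value = provider.get(key)
            if value is not None:
                return value
        return None

    def put(self, key: str, value: str) -> None:
        for provider in self.providers:
            if provider.writable:
                provider.put(key, value)


# The (assumed) fast file index sits in front of the database.
index = DictProvider("file-index")
database = DictProvider("database", {"home": "Welcome"})
stack = ProviderStack([index, database])
```

In this sketch a lookup for `home` misses the index and falls through to the database, while anything written through the stack ends up in both stores, so later reads are answered by the index.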
In many countries the ongoing v5 hackathons allow Umbraco developers to come together, share experiences, learn and contribute. These sessions are the perfect opportunity to experiment with the various new concepts in v5 such as Hive, Hive providers, property editors, surface controllers, tree controllers and much more.
During the BUUG hackathon we were able to play around with the WordPress Hive provider created by the Karminator. Basically, the provider connects to a WordPress blog and fetches categories and posts, which are then loaded into the Umbraco back office. And even though our efforts to make the provider writable were somewhat fruitless (#h5is), it did allow us to get familiar with Hive and understand how to create and configure a custom Hive provider.
Since we weren’t able to attend the Umbraco UK festival and we didn’t want to wait until the next hackathon to further explore v5, we started a somewhat experimental side project. Whether or not the websites we create are hosted on Windows Azure, we often want to store resources on the Windows Azure CDN or in blob storage. In Umbraco v5, style sheets, scripts, templates and media uploads are stored on the local file system. All of this is handled by the IO Hive provider, so we decided to create a Hive provider that saves these files to blob storage instead, and it works!
As for the more technical part of this post, let’s have a closer look at the setup.
Configuring the Blob Storage Hive provider
When you have a closer look at the Hive configuration file you will notice that a Hive provider can be configured separately for each of the storage types known by Umbraco.
The configuration file allows you to configure the type of the Factory class that Hive uses to instantiate the provider’s entity repository (more on this later), as well as a reference to the provider configuration section, which is located in the web.config. As shown below, this configuration section allows users to further configure the Hive provider.
For the blob storage provider, we allow users to configure the name of the connection string for the blob storage account (which must be added to the connection strings section) as well as the name of the blob container.
Creating a custom configuration section is fully supported by the Umbraco v5 Framework and is just a matter of creating/inheriting some classes that allow the settings to be injected into the entity repository (handled by IoC, very sweet). Make sure to have a look at the source code to find out exactly how easy it is ;-)
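As a rough illustration of these two settings, the Python sketch below models them as a small settings object that looks up the actual connection string by name. The class, its field names and the `BlobStorage` key are hypothetical and do not come from the provider's source code; `UseDevelopmentStorage=true` is the connection string for the local Azure storage emulator.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BlobProviderSettings:
    """Hypothetical settings object; the field names are illustrative only."""

    connection_string_name: str
    container_name: str

    def resolve_connection_string(self, connection_strings: Mapping[str, str]) -> str:
        # The provider section only stores the *name*; the value itself lives
        # in the separate connection strings section.
        try:
            return connection_strings[self.connection_string_name]
        except KeyError:
            raise KeyError(
                f"No connection string named '{self.connection_string_name}' is configured"
            ) from None


# Stand-in for the connection strings section of the web.config.
connection_strings = {"BlobStorage": "UseDevelopmentStorage=true"}
settings = BlobProviderSettings(connection_string_name="BlobStorage",
                                container_name="media")
```

Keeping only the name in the provider section means the account credentials stay in one place and can be changed per environment without touching the provider configuration.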
Creating the Blob Storage Hive provider
Although we haven’t gotten around to testing the provider for all storage types just yet, the first thing we had in mind was obviously having the provider handle media file uploads. The one thing that distinguishes media from other storage types is that it uses both a persistence provider and a storage provider. This approach provides a (much needed) separation between the concept of media on the one hand and the actual physical storage on the other (I will come back to this in a moment).
As we won’t be replacing the persistence provider we were able to reuse plenty of the code written for the IO provider. So instead of walking through all the code, let’s just have a look at the key differences (and imperfections).
When you create a custom Hive provider you start out with creating an Entity model and a schema, allowing you to convert an arbitrary data model into a model which can be interpreted by Hive. Hive provides base entity classes for both model and schema (currently TypedEntity and EntitySchema) and base repository classes (currently AbstractEntityRepository and AbstractSchemaRepository) which allow Hive to query against model, schema and relations.
The main problem we faced when creating a model and schema for blob storage is that the core property editor used for uploading files in the back office creates a File model (the model of the IO provider). Since we did not want to create a custom upload property editor just yet, we decided to have our Blob model derive from the File model instead.
This is definitely something we will need to reconsider in the future, but for the sake of getting the project off the ground quickly it seems like a fair solution. It is also the reason why we didn’t put too much effort into tailoring the Blob schema just yet (a simple schema with a node name attribute seems to work just fine).
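The trade-off can be sketched in a few lines of Python. The classes below are hypothetical stand-ins rather than the real Hive models, and the assumption is simply that the upload editor only accepts entities of the File type.

```python
from dataclasses import dataclass


@dataclass
class File:
    """Hypothetical stand-in for the IO provider's File model."""

    name: str
    content: bytes = b""


@dataclass
class Blob(File):
    """Blob model that derives from File so existing File-based code accepts it."""

    container: str = ""


def upload_editor_accepts(entity: object) -> bool:
    # Assumption: the existing upload editor only knows about File entities.
    return isinstance(entity, File)


thumbnail_source = Blob(name="photo.jpg", content=b"...", container="media")
```

Because `Blob` is a `File`, the check passes without touching the editor. The price is that the blob model inherits everything from the file model, whether it makes sense for blob storage or not.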
Next we copy/pasted the entity repository used by the IO provider and implemented most of the key logic for storing data in blob storage. This all went pretty smoothly, and the one thing worth mentioning here is probably the use of Hive IDs. The IO provider assembles Hive IDs by normalizing the file path, and although I’m sure it makes perfect sense, we didn’t get it straight away and decided to implement our own strategy, which basically comes down to: blob – container – blob name (which includes the GUID of the persisted media).
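The following Python sketch shows one way such an ID could be composed and taken apart again. The exact string format (`blob://container/guid/file name`) and the helper names are assumptions made for illustration; the post only states that an ID consists of "blob", the container and a blob name that includes the GUID of the persisted media.

```python
import uuid
from typing import NamedTuple

SCHEME = "blob"  # assumed prefix identifying the blob storage provider


class BlobId(NamedTuple):
    container: str
    blob_name: str


def build_id(container: str, media_id: uuid.UUID, file_name: str) -> str:
    """Compose an ID from the scheme, the container and the blob name."""
    blob_name = f"{media_id}/{file_name}"  # the blob name carries the media GUID
    return f"{SCHEME}://{container}/{blob_name}"


def parse_id(value: str) -> BlobId:
    """Split an ID back into its container and blob name."""
    prefix = f"{SCHEME}://"
    if not value.startswith(prefix):
        raise ValueError(f"Not a blob ID: {value!r}")
    container, separator, blob_name = value[len(prefix):].partition("/")
    if not separator or not container or not blob_name:
        raise ValueError(f"Incomplete blob ID: {value!r}")
    return BlobId(container, blob_name)
```

Since the container is the first segment after the prefix, everything after the next slash can be treated as the blob name, which leaves room for slashes inside the blob name itself.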
Although we haven’t yet fully implemented all of the entity/relation operations (and revision support for that matter) we did implement some of the relation operations which basically allow for the creation of thumbnails (AddRelation and PerformGetChildRelations). Having these implemented was amazingly straightforward by the way, so a big thumbs up to the core team! And again, you should really check out the source code to find out just how easy it was ;-)
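Conceptually, these two operations boil down to recording a link from the uploaded file to each of its thumbnails and listing those links again later. The Python sketch below illustrates that idea with an in-memory store; the class, method and relation type names are hypothetical and only loosely mirror AddRelation and PerformGetChildRelations.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Optional


class Relation(NamedTuple):
    source_id: str       # e.g. the ID of the original upload
    destination_id: str  # e.g. the ID of a generated thumbnail
    relation_type: str


class RelationStore:
    """Hypothetical in-memory store for parent/child relations."""

    def __init__(self) -> None:
        self._children: DefaultDict[str, List[Relation]] = defaultdict(list)

    def add_relation(self, relation: Relation) -> None:
        self._children[relation.source_id].append(relation)

    def get_child_relations(self, parent_id: str,
                            relation_type: Optional[str] = None) -> List[Relation]:
        # .get avoids creating empty entries for unknown parents.
        relations = self._children.get(parent_id, [])
        return [r for r in relations
                if relation_type is None or r.relation_type == relation_type]


store = RelationStore()
store.add_relation(Relation("media/photo.jpg", "media/photo_100.jpg", "thumbnail"))
store.add_relation(Relation("media/photo.jpg", "media/photo_500.jpg", "thumbnail"))
```

Asking the store for the children of `media/photo.jpg` then returns both thumbnail relations in the order they were added.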
That’s basically it, and now we seriously wanted to test-drive this baby! Unfortunately, the upload property editor did require a small fix in order to make it work with our blob storage provider: the code used to create a thumbnail creates a bitmap by passing in a file path, which assumes the file lives on the local file system. To fix this we just had to replace 1 line of code (maybe 20 characters?), which I think is still impressive considering we are reusing the existing property editor ;-)
And it just works!
We were actually somewhat surprised to see it work immediately, as we had figured we would still have to make some changes to have the editor display the thumbnails (they’re on blob storage, so what about the URL?). This is when we found out Umbraco 5 uses a custom mechanism for rendering media, i.e. the following URL will display an image uploaded as media:
Remember the separation of the concept of media and the actual physical storage I mentioned earlier? Right!
Although we realize this is just a very basic implementation, we would certainly love to see it further developed into a solid Jupiter Hive provider, and of course we would love to hear your feedback and suggestions.
The source code is available on the Umbraco 5 contrib project (forked), or you can just download it here or download a full web application which has the provider already configured (it uses SQL CE, so there is no need to set up a database).
At These Days we are already looking forward to the next release of Jupiter!