A blob storage hive provider for Umbraco 5 (beta)

An introduction to Umbraco Hive and to creating a Hive provider which stores media on Azure blob storage.

Related files: download the source code - download the web application

Jupiter is just around the corner

With its first beta release on November 11th, the road to Umbraco 5 (aka Jupiter) has brought us very close to its destination. The members of the Umbraco core team have been working really hard on stabilizing the API, and many of our beloved (and improved) CMS features are taking shape nicely.

As Umbraco 5 remains open source, all current and future releases are made available through Codeplex, which allows developers in the Umbraco community to explore and work with the source code, contribute and provide the core team with valuable feedback.

After attending the awesome Umbraco v5 hackathon organized by BUUG and hosted at our offices in Amsterdam (and being very much inspired by Alex Norcliffe’s talk at the Microsoft offices in Brussels), we really wanted to get our hands dirty and deep dive into Hive!

Umbraco Hive

Many of the recent Umbraco 5 talks and presentations involve Hive, a completely new and extremely powerful concept in the upcoming version. Unlike the traditional data layer in most n-tier or n-layer (web) applications, Hive is an abstraction layer into which developers can plug multiple, stackable data providers. These Hive providers can read (and optionally write) data from nearly any data source, integrate smoothly with the Umbraco back office, and let the front end query that data transparently.

The first beta release comes with three Hive providers, created by the core team:

  • A persistence provider that already supports SQL Server and SQL CE (using NHibernate as the ORM)
  • An Examine provider for data indexing (using the Lucene.NET indexer)
  • An IO provider which stores data on the local file system

Basically this means Hive already covers the data storage needs of the vast majority of .NET-based websites.

By the way, didn’t I mention these providers are stackable? In a recent demo (the video is not yet available), Alex quickly replaced the persistence provider with the Examine provider, running Umbraco entirely on a file index without any database at all. Although that’s a neat trick by itself, the true power of Hive is the ability to query multiple, stacked providers: imagine storing data in both a database and a file index while always retrieving it from the file index for a performance boost… Hive will make it work.
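
To make the idea of stacked providers a little more concrete, here is a minimal C# sketch of reading through an ordered stack: each provider is asked in turn and the first one that has the item wins. The IReadProvider interface, DictionaryProvider and StackedReader are hypothetical stand-ins written for illustration only; they are not the Hive API, whose query pipeline is considerably richer.

```csharp
using System.Collections.Generic;

// Hypothetical, minimal read-only provider contract (NOT the Hive API).
public interface IReadProvider
{
    string Name { get; }
    bool TryGet(string id, out string value);
}

// In-memory stand-in for a data source such as a database or a file index.
public sealed class DictionaryProvider : IReadProvider
{
    private readonly Dictionary<string, string> _items;

    public DictionaryProvider(string name, Dictionary<string, string> items)
    {
        Name = name;
        _items = items;
    }

    public string Name { get; }

    public bool TryGet(string id, out string value) => _items.TryGetValue(id, out value);
}

// Queries providers in stack order and returns the first hit.
public sealed class StackedReader
{
    private readonly IReadOnlyList<IReadProvider> _stack;

    public StackedReader(params IReadProvider[] stack)
    {
        _stack = stack;
    }

    // Returns the value plus the name of the provider that served it,
    // or null when no provider in the stack knows the id.
    public (string Data, string Source)? Get(string id)
    {
        foreach (IReadProvider provider in _stack)
        {
            if (provider.TryGet(id, out string value))
                return (value, provider.Name);
        }
        return null;
    }
}
```

Putting a file index first and a database second mirrors the scenario above: reads are served from the index when possible and fall back to the database otherwise.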

Community hackathons

In many countries the ongoing v5 hackathons allow Umbraco developers to come together, share experiences, learn and contribute. These sessions are the perfect opportunity to experiment with the various new concepts in v5, such as Hive, Hive providers, property editors, surface controllers, tree controllers and much more.

During the BUUG hackathon we were able to play around with the WordPress Hive provider created by the Karminator. The provider connects to a WordPress blog and fetches categories and posts, which are then loaded into the Umbraco back office. And even though our efforts to make the provider writable were somewhat fruitless (#h5is), it did allow us to get familiar with Hive and understand how to create and configure a custom Hive provider.

The source code of the WordPress Hive provider can be found on Bitbucket and has also been added to the Umbraco 5 Contrib project on Codeplex, which was introduced during the recent UK hackathon.

Since we weren’t able to attend the Umbraco UK festival and didn’t want to wait until the next hackathon to explore v5 further, we started a somewhat experimental side project. Whether or not the websites we build are hosted on Windows Azure, we often want to store resources on the Windows Azure CDN or blob storage. In Umbraco v5, style sheets, scripts, templates and media uploads are stored on the local file system, all handled by the IO Hive provider. So we decided to create a Hive provider that saves these files to blob storage instead, and it works!

As for the more technical part of this post, let’s have a closer look at the setup.

Configuring the Blob Storage Hive provider

If you have a closer look at the Hive configuration file, you will notice that a Hive provider can be configured separately for each of the storage types known to Umbraco.

The hive configuration file, located in App_Data\Umbraco\Config

For each storage type, the configuration file specifies the type of the factory class Hive uses to instantiate the provider’s entity repository (more on this later), as well as a reference to the provider’s configuration section in web.config. As shown below, this section allows users to configure the Hive provider further.

The umbraco configuration section in web.config

For the blob storage provider, users can configure the name of the connection string for the blob storage account (to be added to the connectionStrings section) as well as the name of the blob container.
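
As an illustration of what the provider has to do with these two settings, here is a small C# sketch that reads them from a config element and validates them. One of the reader comments below mentions a file-uploader element with rootContainer and connectionStringName attributes, so the sketch assumes that shape; the BlobProviderSettings class is hypothetical and is not Umbraco's actual configuration plumbing. The container check follows the Azure naming rules for blob containers (3 to 63 characters, lowercase letters, digits and single hyphens, starting with a letter or digit).

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical settings object for the blob storage provider (not Umbraco's config classes).
public sealed class BlobProviderSettings
{
    // 3-63 chars; lowercase letters, digits and hyphens; starts with a letter or digit;
    // every hyphen must be followed by a letter or digit (so no "--" and no trailing "-").
    private static readonly Regex ContainerNamePattern =
        new Regex("^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}$");

    public BlobProviderSettings(string connectionStringName, string rootContainer)
    {
        ConnectionStringName = connectionStringName;
        RootContainer = rootContainer;
    }

    public string ConnectionStringName { get; }
    public string RootContainer { get; }

    public static bool IsValidContainerName(string name) =>
        name != null && ContainerNamePattern.IsMatch(name);

    // Reads both settings from an element such as
    // <file-uploader connectionStringName="..." rootContainer="..." />
    // (element and attribute names are assumptions taken from a reader comment).
    public static BlobProviderSettings FromXml(string xml)
    {
        XElement element = XElement.Parse(xml);
        string connectionStringName = Required(element, "connectionStringName");
        string rootContainer = Required(element, "rootContainer");
        if (!IsValidContainerName(rootContainer))
            throw new InvalidOperationException($"'{rootContainer}' is not a valid blob container name.");
        return new BlobProviderSettings(connectionStringName, rootContainer);
    }

    private static string Required(XElement element, string attributeName)
    {
        // The explicit string conversion yields null when the attribute is absent.
        string value = (string)element.Attribute(attributeName);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException(
                $"Missing required attribute '{attributeName}' on <{element.Name}>.");
        return value;
    }
}
```

The connection string itself would then be looked up by name in the connectionStrings section (on .NET Framework, for example, via ConfigurationManager.ConnectionStrings).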

Creating a custom configuration section is fully supported by the Umbraco v5 framework and is just a matter of creating or inheriting a few classes that allow the settings to be injected into the entity repository (handled by the IoC container, very sweet). Make sure to have a look at the source code to find out exactly how easy it is ;-)
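
The general shape of that injection can be sketched in plain C#: the settings travel in a dependency helper, the factory hands the helper to the repository, and the repository receives it through its constructor, the way an IoC container would supply it. All class names below are hypothetical and no IoC container is involved; the null check simply makes a missing registration or config setting fail loudly at startup rather than later with a NullReferenceException.

```csharp
using System;

// Hypothetical settings resolved from the provider's deep config.
public sealed class BlobRepositorySettings
{
    public BlobRepositorySettings(string connectionStringName, string rootContainer)
    {
        ConnectionStringName = connectionStringName;
        RootContainer = rootContainer;
    }

    public string ConnectionStringName { get; }
    public string RootContainer { get; }
}

// Hypothetical dependency helper carrying provider-specific settings.
public sealed class BlobDependencyHelper
{
    public BlobDependencyHelper(BlobRepositorySettings settings)
    {
        Settings = settings ?? throw new ArgumentNullException(nameof(settings));
    }

    public BlobRepositorySettings Settings { get; }
}

// Hypothetical repository: receives its dependencies through the constructor.
public sealed class BlobEntityRepository
{
    public BlobEntityRepository(BlobDependencyHelper dependencyHelper)
    {
        // Fail fast if the helper was never created or registered.
        DependencyHelper = dependencyHelper ?? throw new ArgumentNullException(nameof(dependencyHelper));
    }

    public BlobDependencyHelper DependencyHelper { get; }

    public string Container => DependencyHelper.Settings.RootContainer;
}

// Hypothetical factory: plays the role of the Factory type named in the Hive config.
public sealed class BlobRepositoryFactory
{
    private readonly BlobDependencyHelper _helper;

    public BlobRepositoryFactory(BlobDependencyHelper helper)
    {
        _helper = helper;
    }

    public BlobEntityRepository GetRepository() => new BlobEntityRepository(_helper);
}
```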

The provider dependency helper is a property of the abstract entity repository

Creating the Blob Storage Hive provider

Although we haven’t gotten around to testing the provider for all storage types just yet, the first thing we had in mind was obviously having the provider handle media file uploads. What distinguishes media from the other storage types is that it uses both a persistence provider and a storage provider. This provides a (much needed) separation between the concept of media on the one hand and the actual physical storage on the other (I will come back to this in a moment).

As we won’t be replacing the persistence provider we were able to reuse plenty of the code written for the IO provider. So instead of walking through all the code, let’s just have a look at the key differences (and imperfections).

When you create a custom Hive provider, you start by creating an entity model and a schema, which let you convert an arbitrary data model into a model Hive can interpret. Hive provides base entity classes for both model and schema (currently TypedEntity and EntitySchema) and base repository classes (currently AbstractEntityRepository and AbstractSchemaRepository) that allow Hive to query against model, schema and relations.

The main problem we faced when creating a model and schema for blob storage was that the core property editor for uploading files in the back office creates a File model (from the IO provider). Since we did not want to create a custom upload property editor just yet, we decided to have our Blob model derive from the File model instead.
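
The reason the inheritance trick works is ordinary substitutability: code written against the base file model keeps working when it receives a Blob. The C# sketch below shows only that idea. FileModel is a simplified stand-in, not the real Umbraco.Persistence.Model.IO.File, and the blob name layout (media GUID, a slash, then the file name) is an assumption for illustration.

```csharp
using System;

// Simplified stand-in for the IO provider's File model (NOT Umbraco.Persistence.Model.IO.File).
public class FileModel
{
    public string Name { get; set; } = "";
    public byte[] ContentBytes { get; set; } = Array.Empty<byte>();
}

// Blob model deriving from the file model, so file-based code accepts it unchanged.
public class Blob : FileModel
{
    public string ContainerName { get; set; } = "";

    // GUID of the persisted media item the blob belongs to.
    public Guid MediaId { get; set; }

    // Assumed layout: "<media guid without dashes>/<file name>".
    public string BlobName => $"{MediaId:N}/{Name}";
}

// Stand-in for an upload editor that only knows about FileModel.
public static class UploadEditorStandIn
{
    public static string Describe(FileModel file) => $"{file.Name} ({file.ContentBytes.Length} bytes)";
}
```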

The Blob typed entity which derives from Umbraco.Persistence.Model.IO.File

This is definitely something we will need to reconsider in the future, but for the sake of getting the project off the ground quickly it seems like a fair solution. It is also why we didn’t put much effort into tailoring the Blob schema just yet (a simple schema with a node name attribute seems to work just fine).

Next we copied the entity repository used by the IO provider and implemented most of the key logic for storing data on blob storage. This all went pretty smoothly; the one thing probably worth mentioning is the use of Hive IDs. The IO provider assembles Hive IDs by normalizing the file path, and although I’m sure that makes perfect sense, we didn’t get it straight away and decided to implement our own strategy, which basically comes down to combining "blob", the container name and the blob name (which includes the GUID of the persisted media).
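
A sketch of such an ID strategy in C# could look like the following. It treats the ID as a plain string and assumes "/" as the separator; both are assumptions for illustration, since Umbraco's real HiveId type is richer and the exact format used in the provider is not shown here. Parsing splits into at most three parts, so a blob name may itself contain "/".

```csharp
using System;

// Hypothetical helper for a "blob / container / blob name" ID strategy.
// IDs are plain strings here; Umbraco's HiveId type is richer than this.
public static class BlobIdStrategy
{
    private const string Prefix = "blob";
    private const char Separator = '/'; // assumed separator

    public static string Compose(string container, string blobName)
    {
        if (string.IsNullOrEmpty(container) || container.IndexOf(Separator) >= 0)
            throw new ArgumentException("Container must be non-empty and contain no '/'.", nameof(container));
        if (string.IsNullOrEmpty(blobName))
            throw new ArgumentException("Blob name must be non-empty.", nameof(blobName));
        return Prefix + Separator + container + Separator + blobName;
    }

    // Splits into at most three parts so that the blob name may contain '/'.
    public static bool TryParse(string id, out string container, out string blobName)
    {
        container = string.Empty;
        blobName = string.Empty;
        if (string.IsNullOrEmpty(id))
            return false;

        string[] parts = id.Split(new[] { Separator }, 3);
        if (parts.Length != 3 || parts[0] != Prefix || parts[1].Length == 0 || parts[2].Length == 0)
            return false;

        container = parts[1];
        blobName = parts[2];
        return true;
    }
}
```

Round-tripping matters more than the exact format: whatever the repository hands out as an ID must map back to exactly one container and blob.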

Although we haven’t yet fully implemented all of the entity and relation operations (or revision support, for that matter), we did implement the relation operations needed to create thumbnails (AddRelation and PerformGetChildRelations). Implementing these was amazingly straightforward, by the way, so a big thumbs up to the core team! And again, you should really check out the source code to find out just how easy it was ;-)
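
One of the reader comments below quotes the relation blob name used by AddRelation: Settings.RelationsFolder + "/" + sourceMd5 + "-" + destMd5 + ".xml". The C# sketch below reproduces that naming scheme under two assumptions that the post does not confirm: the MD5 input is the UTF-8 string form of each ID, and the hash is written as lowercase hex. Because the name is deterministic, all relations of one source share a prefix, which is what makes a "list blobs by prefix" lookup for child relations (thumbnails) possible. MD5 is acceptable here because it only derives a stable, fixed-length file name, not a security property.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper mirroring the quoted naming scheme:
// RelationsFolder + "/" + sourceMd5 + "-" + destMd5 + ".xml"
public static class RelationBlobNames
{
    public static string For(string relationsFolder, string sourceId, string destinationId) =>
        ChildPrefix(relationsFolder, sourceId) + Md5Hex(destinationId) + ".xml";

    // All relation blobs for one source start with this prefix, so child relations
    // can be found by listing blobs whose names begin with it.
    public static string ChildPrefix(string relationsFolder, string sourceId) =>
        relationsFolder + "/" + Md5Hex(sourceId) + "-";

    // Assumption: UTF-8 input, lowercase hex output.
    private static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            // BitConverter yields "AB-CD-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```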

The AddRelation method of the Entity Repository

That’s basically it, and now we seriously wanted to test-drive this baby! Unfortunately the upload property editor did require a small fix to work with our blob storage provider: the code that creates a thumbnail builds a bitmap from a file path, and a blob has no local file path. To fix this we just had to replace one line of code (maybe 20 characters?), which I think is still impressive considering we are reusing the existing property editor ;-)
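
The underlying idea is to read media content from a stream instead of a path. As Lennart’s reply in the comments below describes for Morten’s later fix, the model can carry a handler that opens a stream over the content, set when the repository hydrates the entity. The C# sketch below shows that pattern with hypothetical names (MediaFile, Hydration, ThumbnailSource); it deliberately avoids System.Drawing, but in the real editor the stream would be handed to something like Image.FromStream instead of calling Image.FromFile with a path.

```csharp
using System;
using System.IO;

// Hypothetical media model: exposes a handler that opens the content as a stream,
// wherever that content physically lives (local disk, blob storage, CDN).
public sealed class MediaFile
{
    public string Name { get; set; } = "";
    public Func<Stream> OpenContent { get; set; } = () => Stream.Null;
}

public static class Hydration
{
    // Hypothetical hydrate step: wire the handler to the downloaded blob bytes.
    // A real repository might download lazily inside the lambda instead.
    public static MediaFile Hydrate(string name, byte[] blobBytes) =>
        new MediaFile { Name = name, OpenContent = () => new MemoryStream(blobBytes, writable: false) };
}

public static class ThumbnailSource
{
    // Consumer side: reads from the stream, never from a local file path.
    public static byte[] ReadAll(MediaFile file)
    {
        using (Stream stream = file.OpenContent())
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

Because each call to OpenContent returns a fresh stream, the editor can read the content more than once (for example, once for the thumbnail and once for the preview).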

A small fix in the core Upload property editor

And it just works!

Media section in a local Umbraco v5 beta installation

Cloudberry explorer showing the media files on blob storage

We were actually somewhat surprised to see it work immediately. We had figured we would still need to make some changes to get the editor to display the thumbnails (they’re on blob storage, so what about the URL?), and that is when we found out that Umbraco 5 uses a custom mechanism for rendering media. For example, the following URL displays an image uploaded as media:

http://umb5beta.local/Umbraco/Media/Proxy/media%24empty_root%24%24_p__nhibernate%24_v__guid%24_28a171dda97048f98e6b9fa501792a5b?propertyAlias=uploadedFile
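
To see what this proxy URL carries, a small C# sketch can decode it. %24 is the percent-encoding of $, so the last path segment decodes to media$empty_root$$_p__nhibernate$_v__guid$_28a171dda97048f98e6b9fa501792a5b. It appears to name the media root, the NHibernate persistence provider and the GUID of the media item, but that reading is our interpretation rather than documented Umbraco behaviour. The MediaProxyUrl class and its methods are hypothetical.

```csharp
using System;

public static class MediaProxyUrl
{
    // Returns the percent-decoded last path segment of the URL.
    public static string DecodeIdSegment(string url)
    {
        string path = new Uri(url).AbsolutePath;
        string lastSegment = path.Substring(path.LastIndexOf('/') + 1);
        return Uri.UnescapeDataString(lastSegment);
    }

    // Assumes the segment ends with a GUID in "N" format (32 hex digits, no dashes).
    public static Guid ExtractGuid(string decodedSegment)
    {
        const int length = 32;
        if (decodedSegment.Length < length)
            throw new FormatException("Segment too short to contain a GUID.");
        return Guid.ParseExact(decodedSegment.Substring(decodedSegment.Length - length), "N");
    }

    // Minimal query-string lookup; returns null when the key is absent.
    public static string GetQueryValue(string url, string key)
    {
        string query = new Uri(url).Query.TrimStart('?');
        foreach (string pair in query.Split('&'))
        {
            string[] kv = pair.Split(new[] { '=' }, 2);
            if (kv.Length == 2 && Uri.UnescapeDataString(kv[0]) == key)
                return Uri.UnescapeDataString(kv[1]);
        }
        return null;
    }
}
```

Note that nothing in the URL points at blob storage: it identifies the media item through the persistence provider, and the proxy resolves the physical file behind the scenes.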

Remember the separation of the concept of media and the actual physical storage I mentioned earlier? Right!

What’s next?

Although we realize this is just a very basic implementation, we would certainly love to see it developed further into a solid Jupiter Hive provider, and of course we would love to hear your feedback and suggestions.

The source code is available on the Umbraco 5 contrib project (forked), or you can just download it here or download a full web application which has the provider already configured (it uses SQL CE, so there is no need to set up a database).

At These Days we are already looking forward to the next release of Jupiter!

This entry was tagged with ASP.NET, Open source, Umbraco, Web development, Windows Azure.

Comments

  1. Hi guys,

    Great blog post and very cool to see the Azure hive provider, have you done any more work on this since?

    Cheers,

    Chris

  2. Hi, I am completely new to this Umbraco (5) thing. It is pretty easy to work with, and now I want to add some stuff to the Sections part.
    I read your post about the Hive provider. I'm not sure if that's what I need.
    I want to add some pages (in a treeview would be nice) to insert/update data in a different database.
    Could you guys give me some pointers on where to start?

    • By Lennart Stoop

      Hi Marco,

      In Umbraco (5) you can easily create a custom tree in the content section or in a custom section. Hive providers make it especially easy to integrate a legacy or custom database into the backoffice, and I would indeed recommend you create one. For example @netaddicts has recently created a Hive provider for the Adventure Works database:
      https://bitbucket.org/netaddicts/adventureworks

      I understand documentation on these topics is somewhat scarce at the moment, but if you get stuck during development, I would really recommend posting on our.umbraco.org for quick answers and suggestions.

      Grtz
      L

    • By Lennart Stoop

      By the way, I just noticed the wiki has been updated with a collection of useful Umbraco 5 resources

  3. Hi,

    Thank you very much for this post.

    I am quite new to Umbraco and to version 5. So, please bear with me if my questions are elementary.

    I am working on hooking up the Hive Provider with our CDN. So basically the objective is when we upload media using the media picker in Umbraco, the file is uploaded directly to the CDN.
    I have downloaded the source code (for the Hive Provider project as well as the full web app). However I am coming across some obstacles with regards to implementing the code for our CDN. I wonder if you can help me please.

    1. The Initialise() method in the Umbraco.Hive.Providers.AzureBlobStorage.ProviderDemandBuilder class uses the deep config manager to retrieve various repository settings for the blob. The code here refers to the web.config in the App_Plugins folder. However, I am unable to locate these settings in the website config files. Could you point me in the right direction, please?

    2. What is the usage of the Relations folder? The AddRelation() passes a string in the format of Settings.RelationsFolder + "/" + sourceMd5 + "-" + destMd5 + ".xml" as the blob name. However in the picture on your post, you have shown the end result

    3. After looking at the Hive provider source code, I have come to the conclusion that the following are the most important methods I need to worry about (since they are invoked in the overridden methods of the Repository). Am I missing anything?

    bool BlobExists(string containerName, string blobName)
    bool GetBlob(string containerName, string blobName, out byte[] content)
    IList ListBlobs(string containerName, string blobPrefix)
    bool CreateBlob(string containerName, string blobName, byte[] content)
    bool GetBlobProperties(string containerName, string blobName, out SortedList properties)
    HiveId GenerateId()
    HiveId NormaliseId()
    string GetBlobName()
    string GetBlobContainer()
    Blob Hydrate()
    IEnumerable GetChildRelations()

    Thank you very much for your help.

    Regards,
    Shwetha

    • By Lennart Stoop

      Hi Shwetha,

      Sounds like a cool project! I think creating the Hive provider to hook up with your CDN should be pretty straightforward. To answer your questions:

      1) If you open the web application, you will find the main Hive config file in \App_Data\Umbraco\Config\umbraco.hive.config. The provider deep config can be found in the web.config in the root of the web application.

      2) The relations XML files are used to keep track of the relations between media files (e.g. a source media file and its thumbnail) and they should also be stored on the CDN (In a "Relations" folder in the root of the media folder). If you run Umbraco 5 locally with the default IO provider configured, you should notice this folder and XML files being created in \App_Data\Umbraco\Media on the local file system.

      3) Most of the magic really happens in the Repository class, by overriding the methods you mentioned (I recommend you also look at the implementation of the default IO provider) and making calls to the API provided by your CDN for uploading and downloading the media files. Please keep in mind that the code I've provided in this post was implemented against a beta version, and the Umbraco 5 RTM source code may have changed since. For a complete Azure Hive provider implemented against Umbraco 5 RTM, I recommend you also look at the Azure Hive solution by Morten.

      If you have any more questions, please don't hesitate to ask them here or on our.umbraco.org

      Grtz
      L

  4. Hi Lennart,

    Thanks for your reply.

    Following your suggestion to download and run Umbraco locally, I tried to download the Umbraco 5 source code and reference this with my custom Hive provider. I have downloaded the source code from http://umbraco.codeplex.com/SourceControl/list/changesets by selecting 'All' from the Branch drop down. However, I cannot locate the solution to run in VS. The downloaded zip contains

    - Notes
    - Resources
    - Sandboxes
    - Source

    I have looked at the 'Source' folder and cannot find the 'Umbraco.Sln'.

    What am I doing wrong?

    Thanks again,
    Shwetha

  5. Hi,

    Sorry to bombard you with questions like this. I sorted out my previous issue with finding the source code. I have hooked up my custom Hive provider class library with the Umbraco 5 source solution. I have made all the required config changes in the root web.config file as well as in ~\App_Data\Umbraco\Config\umbraco.hive.config.

    However, when I run the application, I receive an 'Object reference not set to an instance of an object.' error. This error occurs in Umbraco.Hive.Providers.GoGridStorage\Umbraco.Hive.Providers.GoGridStorage\EntityRepositoryFactory.cs. The method of interest is GetReadonlyRepository(). When I debug the solution, I can see that IoDependencyHelper is null. I don't know if I am missing any config.

    Could you please shed some light on this.

    Thanks,
    Shwetha

    • By Lennart Stoop

      Hi Shwetha,

      I'm guessing you're either missing a config setting or the dependency was not registered correctly in the Umbraco IoC container.
      Have you created a ProviderDemandBuilder (a class inheriting Umbraco.Hive.AbstractProviderDependencyBuilder)? I remember this class being used to read the deep config settings and to create the dependency helper. If you have, can you add a breakpoint to the Initialise() method and verify it's being called when you start up the application?

  6. Hi Lennart,

    Thank you for your continued support with this.

    Yes, you were right that I was missing config. What was happening was that I updated the web.config file directly instead of the web.template.config file. Once I updated the web.template.config file, the changes were carried forward to the web.config file :)

    Now, to my latest issue. I am able to upload content to our CDN (yipeee), but after the upload, in UploadEditorModel.cs (Source/Libraries/Umbraco.CMS.Web.PropertyEditors/Upload/UploadEditorModel.cs), an error saying the file is not found is thrown when the image is rendered to the editor with Image.FromFile(file.RootedPath).

    Within the Repository class of the custom Hive provider, I have set the RootedPath to the CDN site, something like this:

    if (BlobHelper.CreateBlob(blob.ContentBytes, containingFolder, blob.Name))
    {
        blob.RootedPath = ">" + containingFolder + "/" + blob.Name;
    }

    In the code example project that you have provided, the File object had a Location attribute that is no longer present. The File object now has Name, RootedPath, RootedRelativePath.

    I would be grateful for any help.

    Thanks,
    Shwetha

    • By Lennart Stoop

      Glad to hear you are making some progress! I also remember struggling with the file paths a bit, and having the path concatenated within the Hive Id. Since the File model has changed, I recommend you have a look at how Morten handles file paths (RootedPath & RootedRelativePath) in his EntityRepository for blob storage (especially the implementation of GetFiles() and its helper methods).

      The problem with the core Upload editor is that it tries to create the image from a file path, which it shouldn't. Check out this change set by Morten in which the upload editor has been fixed in order to read the image from a stream (for which a handler is set in the repository's Hydrate() method).

      Hope this helps!

  7. Hi Lennart,

    Hope you are well. Sorry for the long hiatus. I have only just picked this up again.

    I have managed to get all the core functionality working, i.e. uploading, removing, choosing another file for a media item and so on.

    However, my current obstacle is removing the relation XML when a file is removed. In Morten's changeset, I can see a PerformRemoveRelation(), but this is where I am stuck:

    How do I get the IRelationById that is required by PerformRemoveRelation()? PerformFindRelation() returns it, but I have no idea how to implement that method.

    I am thinking the ideal fix would be to call this.FindRelation() and then this.RemoveRelation() from within the PerformDelete() method.

    Thanks for your help.

    Regards,
    Shwetha

  8. Great blog, looking forward for its release!

  9. Great post! I've been trying to get this working for a while with no luck, but I've messed up some config. The ProviderDemandBuilder runs Initialize(), however localConfig is always null. I tried bypassing it by hardcoding, i.e. ( _settings = new RepositorySettings("AzureBlobStorage", "orbitalhive", "*.jpg;*.gif;*.pdf;*.docx", "", true); ), but when I try to upload media I get: Declaration referenced in a method implementation cannot be a final method. Type: 'Umbraco.Hive.Providers.AzureBlobStorage.Entity.Repository'. Assembly: 'Umbraco.Hive.Providers.AzureBlobStorage, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.

    Interestingly, the Initialize method fails with errors for these 2 attributes on the file-uploader element: rootContainer and connectionStringName. I'm using 5.1 btw.

    • By Lennart Stoop

      Hi Mike,

      If the deep config settings aren't loaded properly, you might be missing a setting in web.config (Shwetha also ran into this issue, check the conversation above).
      The error message you ran into when uploading media seems to be version related though, and most probably the code I provided here no longer works against v5.1.
      Have you tried using the hive provider maintained by Morten?
      I am planning to experiment a bit with that one myself soon, and I will update this post (or create a new one) with my findings.

      Hope that helps
