Monday, May 14, 2012

Choosing the cloud for your database

Some time ago, I was asked to give a lecture about relational databases in a cloud environment as part of Telerik Academy. While strengthening my knowledge and skills in this area, I researched the solutions currently on offer and asked myself "Which one is better?"... And then this post was born :).

Keep in mind that this guide looks only at the database cloud options and ignores where the corresponding application is hosted. I will try to extract some advice that is not vendor (or technology) specific, since this is a rapidly changing environment. Even while I was writing this post and preparing for my session, Amazon announced support for SQL Server databases. This is big news in my opinion, as Amazon is starting direct competition with Microsoft and its SQL Azure offering, trying to beat Microsoft on its own field. Whether Amazon succeeds depends on what your project is.

In order to choose the right cloud solution for your project, you first need to know your project's requirements, priorities and roadmap. There are several areas where Amazon and Microsoft differ widely, giving you different advantages and disadvantages to choose from:
  • Budget
  • High availability
  • High scalability
  • Ease of use
  • Database features
  • Performance
Before choosing your cloud vendor, you first need to figure out how each of these aspects applies to your project and prioritize them. Give them priorities from 1 to 6, one for each aspect, and avoid giving equal priorities. Every choice that you make from this list is a trade-off, and often you won't be able to satisfy two or more aspects equally.
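One way to make this prioritization concrete is a small weighted-scoring sketch. Everything in it is an assumption made for illustration: I treat 1 as the most important rank, use ranks 1 to 6 so that no two of the six aspects share a priority, and the vendor names and ratings are made up rather than measured.

```python
# Hypothetical sketch: turn a strict ranking of the six aspects into weights
# and use them to compare vendors. All ratings below are invented.

def weights_from_ranks(ranks):
    """Convert distinct ranks (1 = most important) into weights summing to 1."""
    n = len(ranks)
    if sorted(ranks.values()) != list(range(1, n + 1)):
        raise ValueError("ranks must be 1..N with no two aspects sharing a rank")
    total = n * (n + 1) / 2
    return {aspect: (n + 1 - rank) / total for aspect, rank in ranks.items()}


def weighted_score(ratings, weights):
    """Weighted sum of one vendor's ratings (0 to 10) across all aspects."""
    return sum(weights[aspect] * ratings[aspect] for aspect in weights)


ranks = {
    "budget": 1,
    "high availability": 2,
    "high scalability": 3,
    "ease of use": 4,
    "database features": 5,
    "performance": 6,
}
weights = weights_from_ranks(ranks)

# Placeholder ratings for two imaginary vendors, not real measurements.
vendors = {
    "vendor_a": {"budget": 7, "high availability": 8, "high scalability": 6,
                 "ease of use": 7, "database features": 5, "performance": 6},
    "vendor_b": {"budget": 5, "high availability": 6, "high scalability": 7,
                 "ease of use": 6, "database features": 9, "performance": 7},
}
for name, ratings in vendors.items():
    print(name, round(weighted_score(ratings, weights), 2))
```

The strict ranking is what forces the trade-off: raising the weight of one aspect always lowers the weight of another.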


Budget

Budget is usually the strongest limitation for every project. Thorough research into the pricing model of every cloud vendor is required in order to make a good decision here. In my opinion, the SQL Azure offering is easier to understand and predict, even though it might not be as flexible with respect to the actual resources used. Keep in mind that the final cost of your cloud account might be a function of several parameters, such as disk storage used, inbound and outbound data traffic, CPU utilization, etc. Amazon RDS, for example, has a more sophisticated pricing plan, which is why it might be difficult to estimate your monthly expenses. A good understanding of the provided service is also a must when choosing your cloud solution. In SQL Azure, the "high availability" feature comes out of the box at no extra cost, while in Amazon RDS it might double or triple your monthly expenses.
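To see how such a bill adds up, here is a minimal estimator sketch. The rates, the field names and the single high-availability multiplier are all hypothetical; real vendor pricing has more dimensions and changes often, so plug in numbers from the current price lists.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; these are NOT real vendor prices."""
    per_gb_storage: float      # per GB stored per month
    per_gb_outbound: float     # per GB of outbound traffic
    per_instance_hour: float   # per hour the instance is running
    ha_multiplier: float = 1.0 # 1.0 if high availability is included


def monthly_cost(rates, storage_gb, outbound_gb, hours=730):
    """Estimate one month's bill as a simple function of usage."""
    base = (rates.per_gb_storage * storage_gb
            + rates.per_gb_outbound * outbound_gb
            + rates.per_instance_hour * hours)
    # Simplification: model high availability as a flat multiplier.
    return base * rates.ha_multiplier


# Same usage, with and without a paid high-availability option.
plain = Rates(per_gb_storage=0.10, per_gb_outbound=0.12, per_instance_hour=0.20)
with_ha = Rates(per_gb_storage=0.10, per_gb_outbound=0.12,
                per_instance_hour=0.20, ha_multiplier=2.0)
print(monthly_cost(plain, storage_gb=50, outbound_gb=100))
print(monthly_cost(with_ha, storage_gb=50, outbound_gb=100))
```

Even a rough model like this shows quickly which usage parameter dominates your bill.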

High availability

Speaking of high availability, be sure to read all the agreements that are part of your contract. Amazon RDS reserves the right to up to 4 hours of downtime a week for maintenance and upgrade procedures. This might be a deal breaker for your project, or it might force you to subscribe to a higher-cost plan (in this particular case, the Multi-AZ plans provide you with a failover instance when maintenance occurs on your active one).
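To put the 4 hours a week into perspective, here is the arithmetic as a tiny sketch. It assumes the worst case, where the whole window is used every week; in practice the actual downtime may be much shorter.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def worst_case_availability(downtime_hours_per_week):
    """Fraction of the week the database is up if the full window is used."""
    return 1 - downtime_hours_per_week / HOURS_PER_WEEK


def yearly_downtime_hours(downtime_hours_per_week, weeks=52):
    """Total possible downtime over a year of weekly windows."""
    return downtime_hours_per_week * weeks


print(f"{worst_case_availability(4):.2%}")  # 97.62%
print(yearly_downtime_hours(4))             # 208
```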


High scalability

If you have a good perspective on the future of your project, and more specifically on your database needs, you may find that you require a cloud solution that scales well. Most of the projects I've worked on so far have had a constantly increasing database size, and this is usually a key business requirement that cannot be overlooked. For most businesses, gathering more data means offering a better product or service. Lately, serving personalized content according to customers' needs and tastes means a better user experience, and all this means gathering data, loads of data.

SQL Azure and Amazon RDS have different scaling options. Amazon RDS has really good scale-up possibilities, but they are still limited by the hardware used and the cost of maintenance. On the other hand, scaling up comes at zero development and administrative effort, and it is a good solution for smaller projects without very demanding requirements. SQL Azure's real strength is in scaling out. Using federations, you can go beyond the limits of today's hardware and lower your costs for high-end hardware and infrastructure maintenance. This, on the other hand, will cost you some development effort and might not be applicable in some cases. Without knowing your business plan and the future roadmap of your project, you won't be able to pick the best option here.
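The development effort of scaling out comes mostly from routing: the application has to know which database holds which rows. Below is a minimal range-based routing sketch. It is not the SQL Azure federations API, only an illustration of the idea; the class name, the boundaries and the shard names are all hypothetical.

```python
import bisect


class RangeRouter:
    """Map a numeric key (e.g. a customer id) to the shard that owns its range."""

    def __init__(self, boundaries, shard_names):
        # boundaries[i] is the lowest key stored on shard_names[i + 1].
        if len(shard_names) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted in ascending order")
        self.boundaries = list(boundaries)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right counts how many boundaries are <= key,
        # which is exactly the index of the owning shard.
        return self.shard_names[bisect.bisect_right(self.boundaries, key)]


router = RangeRouter([1000, 2000], ["shard_0", "shard_1", "shard_2"])
print(router.shard_for(999))   # shard_0
print(router.shard_for(1000))  # shard_1
print(router.shard_for(2500))  # shard_2
```

Queries that touch a single key stay cheap, while anything that spans ranges has to fan out over several shards. That is the part that might not be applicable to every schema.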

Ease of use

Sometimes fast development and initial deployment of your project is crucial for success. You all know those time-sensitive tasks that aim to get the most out of a specific hype, where a delay of just a few days or a week makes everything pointless. That's why the better tooling environment might be the key feature that gives you the answer. If you are used to the MySQL tooling environment, starting a new project on SQL Azure might not be the best call for you. On the other hand, Microsoft has developed a whole cloud-based development platform for interacting with your database instances. This way, all you need is your Azure subscription and you get all the services needed: the database itself and all the database client tools that you need to finish your project.

Database features

According to Amazon, the database instances that RDS provides are pretty much the same as the ones you know outside the cloud. On the other hand, SQL Azure is written from scratch for the cloud and currently lacks some of the features you know from SQL Server: CLR, Service Broker, Analysis Services, Replication, etc. This might make the transition of your existing application to the cloud a little bit harder, but it will bring some really cloudy features to your toolbox.


Performance

Last, but certainly not least: performance. It has been shown that the performance of your web application is tightly correlated with the bounce rate and exit rate in your analytics reports. Again, knowing your customers and the business requirements will tell you how important performance is for your application. Comparing the performance of different cloud vendors is, again, difficult. It depends on your location when running the test and on your connectivity to the vendors' data centres. It also depends on the hardware you have subscribed for and on your specific database implementation. You might get better results for one specific database schema on SQL Azure and totally different results when using another database for the tests. I guess that the best approach here is to run performance tests against your real project. Keep in mind that better performance usually means higher costs, which directly conflicts with the first consideration, budget. Usually, this ends up being a simple business decision: money vs. performance.
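A performance test against your real project can start from something as simple as the sketch below. It uses an in-memory SQLite database purely as a stand-in so that the example runs anywhere; the table, the query and the function name are hypothetical, and for a real comparison you would open connections to each vendor's database from where your application will actually run.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), runs=20):
    """Run the same query several times and summarize the wall-clock timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch so the work really happens
        timings.append(time.perf_counter() - start)
    return {"median_s": statistics.median(timings), "max_s": max(timings)}


# Stand-in database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(i % 100,) for i in range(10_000)],
)

result = time_query(conn, "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (42,))
print(result)
```

Reporting the median together with the maximum keeps one slow network round trip from distorting the comparison while still making it visible.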


What all this means is that you, as a technical person, probably won't be able to make this technical decision on your own. You will need to collaborate with all the stakeholders and business owners of your project in order to pick the right solution. No matter which solution you choose, keep in mind that the rules of this game change constantly: new vendors enter the market, existing vendors change their offerings, and this will happen even faster in the future as cloud solutions get more and more popular and attractive. Be prepared and design your application to be flexible with respect to the cloud solution, as you might need to change it in the future.
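One way to keep that flexibility is to hide the database behind a small interface and pick the implementation from configuration. The sketch below is only an illustration under my own assumptions: the interface, the class names and the config keys are hypothetical, and SQLite stands in for whatever cloud database you end up using.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """The only database surface the rest of the application depends on."""

    def add(self, name: str) -> int: ...
    def get(self, customer_id: int) -> Optional[str]: ...


class SqliteCustomerStore:
    """Stand-in backend; a cloud backend would implement the same two methods."""

    def __init__(self, dsn: str):
        self._conn = sqlite3.connect(dsn)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cur = self._conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        self._conn.commit()
        return cur.lastrowid

    def get(self, customer_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


# Switching vendors means registering another backend and changing the config.
BACKENDS = {"sqlite": SqliteCustomerStore}


def make_store(config: dict) -> CustomerStore:
    return BACKENDS[config["backend"]](config["dsn"])


store = make_store({"backend": "sqlite", "dsn": ":memory:"})
customer_id = store.add("Ada")
print(store.get(customer_id))
```

The abstraction will not hide every difference between vendors, but it keeps the vendor-specific code in one place when the time comes to move.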