Thursday, April 26, 2012

The most unappreciated feature in Sql Server 2008?

There is a feature in Sql Server 2008, that from my point of view has been hugely neglected. This is the Change Data Capture feature. Its main purpose is to keep historical data for all changes that happen in a given database.

How we did change data tracking so far?

Almost in all projects I have worked on so far, there have been some kind of custom implementation of this feature. Usually with history tables that are copy of the real one and database triggers, we are able to somewhat fulfil this need, but this approach has some big disadvantages. First, of all you should maintain this history tables and never forget to transfer the schema change that you make in the original table to the history one. Second, this inevitably adds a performance hit to you database operations, which in some cases could be significant. Last but not least, these history tables could end up being enormous part of your database. Because of this, on some projects we are not able to turn on the data tracking because the size of the database will hit the roof. 

How we can do it better using Change Data Capture?

The solution of all this problems is called Change Data Capture and is introduced in Sql Server 2008. You will need Enterprise, Developer or Evaluation edition to be able to use this feature and it has to been enabled for every database on your database that needs it. It is a great solution to all problems mentioned above: it automatically keep track for all new columns added, the changes in the history tables are applied asynchronously and you can set retention policy for how long your history records should be kept in the database. Probably, it doesn't make sense to keep you history data for more than a few days or a week most in your operational database. You can make (you should actually) periodically backups on your database that will keep all your historical data. What's more, you can turn on this feature only on the tables that you need and only on those columns that are interested to you. What's even more, all tables needed to support this feature are part of a custom scheme in the system tables. This way all history tables won't get on your way if your are using ORM tools or if your database has many tables. Finally, if you plan a bug data update and history tracking is not necessarily, you can easily turn it off and then switch on again, without loosing data. 

But is it really better?

That was what I asked myself, when first heard for this feature. I took the challenge and decided to make a little competition between our custom solution and the built in Change Data Capture feature. I took two copies of one of our databases and applied our history tables on the first one and turned on the CDC on the second one. The table in question has ~ 20 columns and ~ 200000 records. I also, prepared a big chunk of update queries in order to compare which will do a better job. See the results yourself:

Updated records Change Data Capture (in seconds) History triggers (in seconds)
2000 1 1
10000 2,5 17
20000 14 64
50000 18 403
100000 34 1475
150000 47 3334
200000 80 8500

That beats even my expectations. I ran every set of updates three times on each database - one with enabled CDC and the other with history triggers. These are the average times that every query took on my developer machine. You can only imagine what overhead is this for you production servers, especially if you frequently update or delete your data. Here is how this look graphically, I think you can imagine where this is going:

Of course, you probably don't update every day 200K records at once, but I'm sure that during the day you have at least 100K updated/deletes. This costs a lot of the server resources. This will also greatly reduce the chance of deadlocks occurring, especially when updating lots of records at once.

What is even better is that enabling this feature is a peace of cake. Check Introduction to Change Data Capture (CDC) in SQL Server 2008 from Pinal Dave, which was a great start from me. Just a heads up - be sure that your SQL Server Agent is enabled when turning on the CDC feature. And another good article in MSDN. Otherwise, your history tables will be empty and no changes will be tracked.

I plan to use this feature for the new projects that will come next (since the transition in existing project will not be very easy). I hope this helps you, feel free to use the comment form and share what solution you currently use.