Sunday, February 27, 2011

Data migration for dummies

For the last month I've been working on data migration between three different systems. This turned out to be quite hard task and now I learned some valuable lessons for me.

What seems to be our biggest mistake was to underestimate this task. Trying to migrate data from to distinct systems that are outside your control to a system that is just taking its first breath can be very frustrating for the all people and stakeholders involved. Plan it carefully with frequent milestones to meet. At every milestone try to communicate the progress with the future consumers of your system in order to catch possible incompatibilities as early as possible. Expect the unexpected and always keep a time buffer.

To start you need to research all the possible ways to get the data from the source database. In our case we couldn't get a dump of the source databases from our vendors, so we had to deal on our own. We had to deal with custom report functionalities in the web sites and a web services that was exposed for writing 3rd party tools. Its important to get as much data as you can, sometimes we could get some fields only through the reports and others only from the developer's API. It is always nice if you get to an unique identifier for every object in the source database. This way future fixes and data deduplication is much easier.

Before start writing your migration logic make sure that you have mapped all nomenclatures from the source system into the target system. Often, this means that a business decision should be taken. If you plan to do the migration using SQL scripts ( as we did ) - make sure to improve your skills in SQL. Some basic level might be enough when dealing with ORM and writing simple custom queries from time to time, but preparing a data migration scripts is not an easy job and you will make use of some additional knowledge for sure. 

Do not neglect the quality of the code you write. Sure it will be used only one time and probably deleted for ever after that, but writing and debugging spaghetti code for more than a month will be much easier if you follow your code policy as you do in your every day development. Do not hard code constant values if there is a business logic in them (statuses, types, etc...), use variable to review what is the meaning of the constants. Performance matters, too. It makes difference if your migration will take a minutes or a day. After all, you are dealing with live data and sometimes the business cannot be stopped for a day in order to perform a migration. Also, make sure that you have source controlled your scripts - you can freely deleted it after the migration is over.

In our case pair programming was a huge success during this month. The main reason is that the work is hard to be separated in smaller task that can be solved concurrently. Also, usually the logic of migration is harder to analyzed and debugged, so having additional pair of eyes to help you is much more productive than working separately. 

You will probably need SQL-abled tester in order to verify your work. Testing data migration without querying the database is like looking for a needle in a haystack. The tester could be another developer that was not involved in your migration process. Also, plan your scripts in order to be revertible if failure occurs in every step of the migration, as much as possible. All kind of errors are possible when merging to separate systems - data truncation errors, data cast errors etc. Use transactions and plan your script carefully in order to handle this kind of errors. What can help you here is to be proactive and test your migration script with live data both from source and the target system.

In our case when we had a fixed deadline to meet, making fixes in the last day or two turns out to be a disaster. Avoid introducing big changes in your logic in the last days. On contrary, try to plan your migration in a way that allows applying fixes later, after the migration is done.

Well, these are the tips that could saved me a lot of time and hopefully you can make use of them. And finally, prior starting your work evaluate the effort you are going to spend and consider migrating the data by hand. Some times you can spend more resources and time in writing scripts and code, than manually migrating your data. 

Furthermore, writing SQL script can be very frustrating for a .NET developer and this will of course affect the performance and quality of his work. I happened to be in this position and sometimes is critical to be able to motivate yourself and keep your spirit up.