Recently I've been required to insert 6 million database records into the latest version of one of our products.
Actually, it's a lot more than that, because it is 6 million business objects, each of which tends to bring related lookup table rows with it, so the real figure is considerably higher.
The problem is that, for the data to be reliable, it needs to be inserted through the application interface to guarantee it looks like a production environment, and this is slow... especially once you get over 1 million records in a table.
As an experienced database developer, my first thought was to look at SQL Server Profiler and gather some information on improving performance with code tweaks or better indexing. With the product being fairly mature there was little to do here, but a few changes to the indexes and stored procedures helped a bit. This gave me about a 50% speed improvement, but it didn't last for long.
Soon the process of inserting records, together with the background logic doing lookups to check for duplicates and so on, had slowed to three or four records per second, which was going to take forever. At 2 million records it was clear this needed a different approach.
To rub salt in the wound, there were other factors getting in the way of the task:
- I had a brilliant load-testing app I had written that could reduce call time/latency to the website by doing all of this multithreaded. Unfortunately the code wasn't very thread-safe, and performance got worse with multiple threads due to all the locking.
- My super-fast servers from last year had been hijacked by IT to run SharePoint.
- I could have run several databases at lower volumes and pieced them all together, but this involved a lot of manual manipulation of the data (a risk) and, besides, I didn't have time. At tables of this size, even SQL Server's management interface was struggling.
The solution was based on previous tests in which we had found that a super-fast SAN storage system sped up performance significantly. However, I didn't have access to one at such short notice.
I decided to improvise and create a RAM disk to store the SQL database on, and hopefully speed things up. In theory this should be much faster at seeking, reading and writing than a standard mechanical hard drive.
I used a PC with 12GB of RAM and created a RAM drive.
I downloaded a trial version of this software: http://www.superspeed.com/servers/ramdisk.php and created an 8GB drive within five minutes.
I then 'detached' my database using the SQL Server interface and moved the .mdf and .ldf files to the RAM drive.
I then re-attached the database, making sure SQL Server had permission to the new files and their storage location.
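If you'd rather not click through the GUI, the same detach/move/re-attach can be done in T-SQL. This is only a minimal sketch, assuming a database called MyBigDb and a RAM drive mounted as R:\ (both names are mine, not from the run above):

```sql
USE master;
GO
-- Detach the database so its files can be moved (MyBigDb is a placeholder name)
EXEC sp_detach_db @dbname = N'MyBigDb';
GO
-- ...copy MyBigDb.mdf and MyBigDb_log.ldf to R:\ outside of SQL Server...
-- Re-attach the database from its new location on the RAM drive
CREATE DATABASE MyBigDb
ON (FILENAME = N'R:\MyBigDb.mdf'),
   (FILENAME = N'R:\MyBigDb_log.ldf')
FOR ATTACH;
GO
```

Remember the SQL Server service account needs read/write permission on the RAM drive folder or the attach will fail.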
After a database shrink and an index rebuild on all tables, I restarted my task.
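For reference, the shrink and rebuild amounts to something like the following. Again this is a sketch using the assumed MyBigDb name; sp_MSforeachtable is an undocumented but commonly used shortcut, and you could equally loop over sys.tables yourself:

```sql
USE MyBigDb;
GO
-- Shrink the data and log files after the move
DBCC SHRINKDATABASE (MyBigDb);
GO
-- Rebuild every index on every user table
EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD';
GO
```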
Insertion continued at 50-100 inserts per second, which was much faster than before, and the slowdown has stopped, or is at least degrading at a much slower rate than before.
You can also move the system tempdb files to the RAM disk, which can speed things up further. However, bear in mind that if you need to do any large deletions or modifications across large numbers of rows, you may find tempdb grows and fills your RAM disk in the process. For this reason, I left this out of my final run.
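If you do want to try it, moving tempdb looks roughly like this; tempdev and templog are the default logical file names, R:\ is the assumed RAM drive, and the change only takes effect after the SQL Server service is restarted:

```sql
-- Point tempdb's data and log files at the RAM drive (applied on next service restart)
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = N'R:\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = N'R:\templog.ldf');
GO
```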
Don't forget that a large database would normally have several maintenance tasks carried out on it over growth like this, so it may be handy to set up an hourly index rebuild task while your insertion run is executing.
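As a rough sketch of what that hourly job could run, the script below rebuilds any index over a 30% fragmentation threshold; the threshold and the overall approach are my own assumptions rather than what I scheduled on the day:

```sql
-- Rebuild indexes in the current database that have become heavily fragmented
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'ALTER INDEX ' + QUOTENAME(i.name) + N' ON '
             + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N' REBUILD;' + NCHAR(13)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.avg_fragmentation_in_percent > 30
  AND i.name IS NOT NULL;  -- skip heaps, which have no index name

EXEC sp_executesql @sql;
```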
Annoyingly, this should all be ready just before our much more efficient mass-insertion feature passes testing.
If you have the resources, I'd try a solid-state drive or a SAN system instead, as you must remember your data will be lost as soon as the power goes off.
Don't forget to back it up once you've finished.
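A plain full backup to a normal, persistent disk is enough to get the data off the RAM drive; the database name and path here are placeholders:

```sql
-- Take a full backup onto durable storage as soon as the load completes
BACKUP DATABASE MyBigDb
TO DISK = N'D:\Backups\MyBigDb_after_load.bak'
WITH INIT, STATS = 10;
```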