Performance of Data Loader
I am working on the migration of a reasonably sized RxWorks system (14500 customer records).
I am very happy with the performance of Kettle (used to extract and clean up the customer data) - it can extract the RxWorks data and generate the XML data for the dataload input in under 5 seconds.
However, the dataload performance is disappointing. It happily processes my setup.xml (containing all the base data: postcodes, species & breeds, etc.), loading 2309 objects in 3.00 seconds (769.67 objects/sec), but on the customer data it runs like a dog:
INFO StaxArchetypeDataLoader,main:255 - party.customerperson 1277
INFO StaxArchetypeDataLoader,main:255 - contact.location 1223
INFO StaxArchetypeDataLoader,main:258 - Processed 2500 objects in 494.00 seconds (5.06 objects/sec)
[Note that this is a subset of the data - processing the whole 14.5K set, the speed drops to under 4 objects/sec. I tried a subset hoping that the speed would climb back towards the 700/sec achieved when processing the setup data.]
I am using a reasonable machine: a Dell XPS laptop (Intel Core i7, 8GB RAM, Win7) with the OpenVPMS software and database on the same machine; the CPU usage is low and the disk is not running hard.
Is there anything I can do to improve the speed of pumping data into the database?
Regards, Tim
Re: Performance of Data Loader
You can increase the batch size to reduce database accesses, which may improve performance. The dataload script currently sets it to 1000 objects.
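For some background on why the batch size matters: each save is a round trip to the database, so saving objects one at a time caps throughput no matter how idle the CPU and disk are. The sketch below is only a generic JDBC illustration of that batching principle, not the OpenVPMS loader code - the connection details, table and column names are invented, and the batch size of 1000 simply mirrors the script's default.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Generic illustration only: rows are accumulated with addBatch() and
    // flushed in groups, so the database sees one round trip per batch
    // instead of one per row. Not the OpenVPMS loader API.
    public class BatchInsertSketch {
        private static final int BATCH_SIZE = 1000; // mirrors the dataload script's setting

        public static void main(String[] args) throws Exception {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/openvpms", "user", "password")) { // hypothetical credentials
                con.setAutoCommit(false); // commit per batch, not per row
                String sql = "INSERT INTO example_parties (name, description) VALUES (?, ?)"; // made-up table
                try (PreparedStatement ps = con.prepareStatement(sql)) {
                    for (int i = 0; i < 14500; i++) {
                        ps.setString(1, "Customer " + i);
                        ps.setString(2, "migrated from RxWorks");
                        ps.addBatch();
                        if ((i + 1) % BATCH_SIZE == 0) {
                            ps.executeBatch(); // one round trip for the whole batch
                            con.commit();
                        }
                    }
                    ps.executeBatch(); // flush any remainder
                    con.commit();
                }
            }
        }
    }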
For large datasets, however, you're probably better off using the OpenVPMS Kettle plugin to perform the loads.
Note that you'll need to use Kettle 3.2, and patch it according to OVPMS-1203.
For some background, check out this thread: http://www.openvpms.org/forum/migrating-vetcare which also contains some sample transforms.
Re: Performance of Data Loader
Tim (?) - thanks for this. You are correct about the performance boost - I am getting around 200/sec. Now all I have to do is fully understand the OpenVPMS Kettle plugin - see new post.