Anonymising a database

(Originally from Tim Gething's notes at http://www.openvpms.org/forum/how-anonymise-database )

The following may be used to anonymise an OpenVPMS 1.7.x database. The steps are as follows:

1. Back up the database

2. Restore the backup to a new server

3. Run the following script against the new server:

UPDATE contact_details
    SET value=concat((floor(rand()*9000)+1000),' ',(floor(rand()*9000)+1000))
    where name='faxNumber';
UPDATE contact_details
    SET value=concat((floor(rand()*9000)+1000),' ',(floor(rand()*9000)+1000))
    where name='telephoneNumber';
UPDATE contact_details
    SET value=concat((floor(rand()*9000)+1000),' ',CONV(10+FLOOR(RAND()*26),10,36),' Street')
    where name='address';
UPDATE contact_details
    SET value=concat(lower(CONV(10+FLOOR(RAND()*26),10,36)),(floor(rand()*9000)+1000),'@gmail.com')
    where name='emailAddress';

UPDATE entity_details
    SET value=concat(left(value,3),(floor(rand()*9000)+1000))
    where name='lastName';

UPDATE entity_details
    SET value=concat(left(value,1),(floor(rand()*9000)+1000))
    where name='firstName';

UPDATE entity_details
    SET value=concat(left(value,3),(floor(rand()*9000)+1000))
    where name='companyName';

UPDATE entity_details
    SET value=concat(upper(CONV(10+FLOOR(RAND()*26),10,36)),(floor(rand()*9000)+1000))
    where name='travel';

UPDATE entity_details
    SET value=concat(left(value,3),(floor(rand()*9000)+1000))
    where name='printedName';

UPDATE entities
    SET name=concat(left(name,3),(floor(rand()*9000)+1000))
    where arch_short_name like 'product.%';

UPDATE entity_identities
    SET identity=concat((floor(rand()*900)+100),'*',(floor(rand()*900)+100),'*',(floor(rand()*900)+100))
    ,name = identity
    where arch_short_name = 'entityIdentity.microchip';

UPDATE entities
    SET name=concat(left(name,3),(floor(rand()*9000)+1000))
    where arch_short_name='security.user' and length(name)>3 and name <>'admin';

UPDATE entities
    SET description=concat('Dr ',left(name,3),(floor(rand()*9000)+1000))
    where arch_short_name='security.user' and length(name)<=3;

UPDATE users
    SET password=concat('p',(floor(rand()*9000)+1000))
    where user_name <> 'admin';

UPDATE documents
    SET contents=0, checksum=0, doc_size=0
    where (mime_type <> 'application/vnd.oasis.opendocument.text' and mime_type<> 'text/xml');

UPDATE acts a
    join document_acts da on (da.document_act_id = a.act_id)
    join documents d on (d.document_id = da.document_id)
    SET d.contents=0, d.checksum=0, d.doc_size=0
where (a.arch_short_name = 'act.patientDocumentLetter'
  or a.arch_short_name = 'act.patientDocumentForm');

OPTIMIZE table documents;

#
# Remove ESCI passwords
#
update entity_relationships r, entity_relationship_details d
    set d.value = ""
where r.arch_short_name = "entityRelationship.supplierStockLocationESCI" 
    and r.entity_relationship_id = d.entity_relationship_id and d.name in ("accountId", "username", "password");

#
# Disable jobs
#
update entities 
set active = 0
where arch_short_name like "entity.job%" and active = 1;

 

Notes:

  1. If you use the above SQL, then all the customer data will be anonymised, however the derived fields will still contain the original address and phone number information. Hence you need to get this rebuilt. The easiest way is to use Administration|Archetypes to edit the party.customerPerson archetype.  Change the derived value for the description node from "concat(party:getBillingAddress(.),' - ',party:getHomeTelephone(.))" to omit the central '-', ie "concat(party:getBillingAddress(.),'  ',party:getHomeTelephone(.))" and then press Apply. When you get the 'Update Changed Nodes' window, press Cancel.  Then change the derived value back again to re-insert the central '-', and this time, press the OK button on to get the changed nodes updated.  This will rebuild the derived fields from the anonymised data.
     
  2. The above SQL leaves the admin user account unaltered.  You should change its login password to say 'admin' so that you do not have to reveal your real admin password.
     
  3. The above SQL does not anonymise the patient names.  If you feel that this is necessary then the following SQL will do the job:
UPDATE entities
    SET name=concat(left(name,2),(floor(rand()*9000)+1000))
    where arch_short_name='party.patientPet' and length(name)>2;

 

Syndicate content