← Back to team overview

dhis2-users team mailing list archive

Serious problem with duplicate records in Individual records/tracker

 

Hi all,

We have an big issue with several of Rwanda's event reporting systems that
have been set up using Tracker/Individual records.  For some reason, users
are able to create duplicate records in the system through normal data
entry.  It seems to happen when they enter a value for an attribute and
then a couple of seconds or milliseconds later they change that value.  The
system then stores both records with last updated several milliseconds
apart.

This is a big issue because analytics fails to run when it finds these
duplicates.

Here is the query that finds the duplicates:

select programstageinstanceid, dataelementname, count(*) as numrecs from
(
SELECT
  program.name AS programname,
  program.description,
  programinstance.dateofincident,
  programinstance.trackedentityinstanceid,
  programinstance.status,
  programstageinstance.programstageinstanceid,
  programstageinstance.programinstanceid,
  programstageinstance.programstageid,
  trackedentitydatavalue.value,
  dataelement.name AS dataelementname
FROM
  public.programstageinstance,
  public.programinstance,
  public.program,
  public.trackedentitydatavalue,
  public.dataelement
WHERE
  programstageinstance.programinstanceid =
programinstance.programinstanceid AND
  program.programid = programinstance.programid AND
  trackedentitydatavalue.programstageinstanceid =
programstageinstance.programstageinstanceid AND
  dataelement.dataelementid = trackedentitydatavalue.dataelementid
  ) as foo
  group by programstageinstanceid, dataelementname
  having count(*) >1
  order by numrecs desc;

Here is the typical output of the query when there are duplicate records:

Programstageinstanceid, dataelementname,numrecs
777921;"TB Sample type_morning Sputum";2
776886;"TB Date of collection_sample";2
773132;"TB History of contact with TPB+ case";2

Here is an example of the duplicates we see when we select the duplicated
records from the trackedentitydatavalue table.

prograstageinstanceid,
dataelementid,value,lastupdated,providedelsewhere,user
773132;35105;"true";"2015-10-02 09:32:48.334";FALSE;"abibarimana"
773132;35105;"true";"2015-10-02 09:32:48.508";FALSE;"abibarimana"
776886;31879;"2015-07-17";"2015-10-07 16:25:26.341";FALSE;"mugeni"
776886;31879;"2015-07-14";"2015-10-07 16:25:26.546";FALSE;"mugeni"
777921;35106;"true";"2015-10-08 09:55:36.133";FALSE;"emufasha"
777921;35106;"true";"2015-10-08 09:55:36.278";FALSE;"emufasha"

In order to clean up the trackedentitydatavalue table, I need to write
these records to a temporary table, delete them
from trackedentitydatavalue, then decide which of the duplicate pairs to
delete before copying the correct ones back into trackedentitydatavalues.

This error seems to happen almost daily when the systems are in full use.

Is there a way we can put a constraint on the trackedentitydatavalue table
so that it will not accept duplicate values for programstageinstanceid and
dataelementid?  I also think there must be an error in the java code that
should be updating the first saved record with the one entered by the user
later rather than inserting a new record.

This is a major bug because it causes analytics to fail and when it fails
we cannot run any event reports to get data out of the system.

-- 
*Randy Wilson*
*Team Leader: **Knowledge Management, Data Use and Research*
Rwanda Health System Strengthening Activity
Management Sciences for Health
Rwanda-Kigali
Direct: +250 788308835
E-mail: rwilson@xxxxxxx
Skype: wilsonrandy_us
<http://www.msh.org/>
Stronger health systems. Greater health impact.
<https://www.facebook.com/ManagementSciencesForHealth>
<https://twitter.com/MSHHealthImpact>
<https://www.youtube.com/user/MSHHealthImpact>
www.msh.org

-- 
*This message and its attachments are confidential and solely for the 
intended recipients. If received in error, please delete them and notify 
the sender via reply e-mail immediately.*

Follow ups