← Back to team overview

dhis2-users team mailing list archive

Re: Serious problem with duplicate records in Individual records/tracker

 

If you experience reasonably high latency (slow response times) it seems
quite likely you might well get duplicates. Probably related to

https://bugs.launchpad.net/dhis2/+bug/1498505

On 9 October 2015 at 15:40, Wilson, Randy <rwilson@xxxxxxx> wrote:

> I will check but when i remove the duplicates analytics runs. Also usually
> we get many more duplicates 100 or more each time I check.
>
> Randy
> On Oct 9, 2015 3:48 PM, "Ant Snyman" <ant@xxxxxxxx> wrote:
>
>> Hi Randy,
>> There is actually a constraint on the trackedentitydatavalue table and
>> you cannot create two entries with the same programstageinstance and
>> dataelementid. I notice that you are grouping on dataelement name, is it
>> possible that you could have two dataelements with the same name in your
>> dataelement table? I.e different dataelementids but the same name?
>> Regards
>> Ant
>>
>> On 9 October 2015 at 10:22, Wilson, Randy <rwilson@xxxxxxx> wrote:
>>
>>> Hi all,
>>>
>>> We have an big issue with several of Rwanda's event reporting systems
>>> that have been set up using Tracker/Individual records.  For some reason,
>>> users are able to create duplicate records in the system through normal
>>> data entry.  It seems to happen when they enter a value for an attribute
>>> and then a couple of seconds or milliseconds later they change that value.
>>> The system then stores both records with last updated several milliseconds
>>> apart.
>>>
>>> This is a big issue because analytics fails to run when it finds these
>>> duplicates.
>>>
>>> Here is the query that finds the duplicates:
>>>
>>> select programstageinstanceid, dataelementname, count(*) as numrecs from
>>> (
>>> SELECT
>>>   program.name AS programname,
>>>   program.description,
>>>   programinstance.dateofincident,
>>>   programinstance.trackedentityinstanceid,
>>>   programinstance.status,
>>>   programstageinstance.programstageinstanceid,
>>>   programstageinstance.programinstanceid,
>>>   programstageinstance.programstageid,
>>>   trackedentitydatavalue.value,
>>>   dataelement.name AS dataelementname
>>> FROM
>>>   public.programstageinstance,
>>>   public.programinstance,
>>>   public.program,
>>>   public.trackedentitydatavalue,
>>>   public.dataelement
>>> WHERE
>>>   programstageinstance.programinstanceid =
>>> programinstance.programinstanceid AND
>>>   program.programid = programinstance.programid AND
>>>   trackedentitydatavalue.programstageinstanceid =
>>> programstageinstance.programstageinstanceid AND
>>>   dataelement.dataelementid = trackedentitydatavalue.dataelementid
>>>   ) as foo
>>>   group by programstageinstanceid, dataelementname
>>>   having count(*) >1
>>>   order by numrecs desc;
>>>
>>> Here is the typical output of the query when there are duplicate records:
>>>
>>> Programstageinstanceid, dataelementname,numrecs
>>> 777921;"TB Sample type_morning Sputum";2
>>> 776886;"TB Date of collection_sample";2
>>> 773132;"TB History of contact with TPB+ case";2
>>>
>>> Here is an example of the duplicates we see when we select the
>>> duplicated records from the trackedentitydatavalue table.
>>>
>>> prograstageinstanceid,
>>> dataelementid,value,lastupdated,providedelsewhere,user
>>> 773132;35105;"true";"2015-10-02 09:32:48.334";FALSE;"abibarimana"
>>> 773132;35105;"true";"2015-10-02 09:32:48.508";FALSE;"abibarimana"
>>> 776886;31879;"2015-07-17";"2015-10-07 16:25:26.341";FALSE;"mugeni"
>>> 776886;31879;"2015-07-14";"2015-10-07 16:25:26.546";FALSE;"mugeni"
>>> 777921;35106;"true";"2015-10-08 09:55:36.133";FALSE;"emufasha"
>>> 777921;35106;"true";"2015-10-08 09:55:36.278";FALSE;"emufasha"
>>>
>>> In order to clean up the trackedentitydatavalue table, I need to write
>>> these records to a temporary table, delete them
>>> from trackedentitydatavalue, then decide which of the duplicate pairs to
>>> delete before copying the correct ones back into trackedentitydatavalues.
>>>
>>> This error seems to happen almost daily when the systems are in full use.
>>>
>>> Is there a way we can put a constraint on the trackedentitydatavalue
>>> table so that it will not accept duplicate values for
>>> programstageinstanceid and dataelementid?  I also think there must be an
>>> error in the java code that should be updating the first saved record with
>>> the one entered by the user later rather than inserting a new record.
>>>
>>> This is a major bug because it causes analytics to fail and when it
>>> fails we cannot run any event reports to get data out of the system.
>>>
>>> --
>>> *Randy Wilson*
>>> *Team Leader: **Knowledge Management, Data Use and Research*
>>> Rwanda Health System Strengthening Activity
>>> Management Sciences for Health
>>> Rwanda-Kigali
>>> Direct: +250 788308835
>>> E-mail: rwilson@xxxxxxx
>>> Skype: wilsonrandy_us
>>> <http://www.msh.org/>
>>> Stronger health systems. Greater health impact.
>>> <https://www.facebook.com/ManagementSciencesForHealth>
>>> <https://twitter.com/MSHHealthImpact>
>>> <https://www.youtube.com/user/MSHHealthImpact>
>>> www.msh.org
>>>
>>> *This message and its attachments are confidential and solely for the
>>> intended recipients. If received in error, please delete them and notify
>>> the sender via reply e-mail immediately.*
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>>
>> *Ant Snyman*
>>
>> *Cell: 0824910449*
>>
>> *Landline: 028 2713242*
>>
>>
>> Health Information Systems Program - SA
>>
>> *This message and any attachments are subject to a disclaimer published
>> at http://www.hisp.org/policies.html#comms_disclaimer
>> <http://www.hisp.org/policies.html#comms_disclaimer>.  Please read the
>> disclaimer before opening any attachment or taking any other action in
>> terms of this electronic transmission.  If you cannot access the
>> disclaimer, kindly send an email to disclaimer@xxxxxxxx
>> <disclaimer@xxxxxxxx> and a copy will be provided to you. By replying to
>> this e-mail or opening any attachment you agree to be bound by the
>> provisions of the disclaimer.*
>>
>
> *This message and its attachments are confidential and solely for the
> intended recipients. If received in error, please delete them and notify
> the sender via reply e-mail immediately.*
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>

PNG image


Follow ups

References