← Back to team overview

dhis2-users team mailing list archive

Re: Serious problem with duplicate records in Individual records/tracker

 

Hi all,

It seems to me from the pgadmin create table script below that the table
trackedentitydatavalue table does not have a unique primary key constraint
(programstageinstanceid+dataelementid).  I theory adding that constraint
would make it impossible to have duplicates.

here is the extra constraint needed:

CONSTRAINT pk_programstageinstanceid_dataelementid PRIMARY KEY
(programstageinstanceid, dataelementid)

I hesitate to do it on our production instance because I wonder if the Java
code should be checked to make sure the system will recover gracefully if a
user tries to save the same record twice within a few seconds due to the
latency issues discussed below.


Is that a change that can be tried by the developers and then moved into
trunk?


current structure of table:
_______________________________

CREATE TABLE trackedentitydatavalue
(
  programstageinstanceid integer NOT NULL,
  dataelementid integer NOT NULL,
  value character varying(255),
  "timestamp" timestamp without time zone,
  providedelsewhere boolean,
  storedby character varying(31),
  CONSTRAINT fk_entityinstancedatavalue_dataelementid FOREIGN KEY
(dataelementid)
      REFERENCES dataelement (dataelementid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk_entityinstancedatavalue_programstageinstanceid FOREIGN KEY
(programstageinstanceid)
      REFERENCES programstageinstance (programstageinstanceid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE trackedentitydatavalue
  OWNER TO individualrecords;

Thanks,

Randy


On Fri, Oct 9, 2015 at 6:35 PM, Knut Staring <knutst@xxxxxxxxx> wrote:

> Seems like what we also experienced a lot for tLiberia, I think.
>
> Knut
>
> On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
>
>> If you experience reasonably high latency (slow response times) it seems
>> quite likely you might well get duplicates. Probably related to
>>
>> https://bugs.launchpad.net/dhis2/+bug/1498505
>>
>> On 9 October 2015 at 15:40, Wilson, Randy <rwilson@xxxxxxx> wrote:
>>
>>> I will check but when i remove the duplicates analytics runs. Also
>>> usually we get many more duplicates 100 or more each time I check.
>>>
>>> Randy
>>> On Oct 9, 2015 3:48 PM, "Ant Snyman" <ant@xxxxxxxx> wrote:
>>>
>>>> Hi Randy,
>>>> There is actually a constraint on the trackedentitydatavalue table and
>>>> you cannot create two entries with the same programstageinstance and
>>>> dataelementid. I notice that you are grouping on dataelement name, is it
>>>> possible that you could have two dataelements with the same name in your
>>>> dataelement table? I.e different dataelementids but the same name?
>>>> Regards
>>>> Ant
>>>>
>>>> On 9 October 2015 at 10:22, Wilson, Randy <rwilson@xxxxxxx> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have an big issue with several of Rwanda's event reporting systems
>>>>> that have been set up using Tracker/Individual records.  For some reason,
>>>>> users are able to create duplicate records in the system through normal
>>>>> data entry.  It seems to happen when they enter a value for an attribute
>>>>> and then a couple of seconds or milliseconds later they change that value.
>>>>> The system then stores both records with last updated several milliseconds
>>>>> apart.
>>>>>
>>>>> This is a big issue because analytics fails to run when it finds these
>>>>> duplicates.
>>>>>
>>>>> Here is the query that finds the duplicates:
>>>>>
>>>>> select programstageinstanceid, dataelementname, count(*) as numrecs
>>>>> from
>>>>> (
>>>>> SELECT
>>>>>   program.name AS programname,
>>>>>   program.description,
>>>>>   programinstance.dateofincident,
>>>>>   programinstance.trackedentityinstanceid,
>>>>>   programinstance.status,
>>>>>   programstageinstance.programstageinstanceid,
>>>>>   programstageinstance.programinstanceid,
>>>>>   programstageinstance.programstageid,
>>>>>   trackedentitydatavalue.value,
>>>>>   dataelement.name AS dataelementname
>>>>> FROM
>>>>>   public.programstageinstance,
>>>>>   public.programinstance,
>>>>>   public.program,
>>>>>   public.trackedentitydatavalue,
>>>>>   public.dataelement
>>>>> WHERE
>>>>>   programstageinstance.programinstanceid =
>>>>> programinstance.programinstanceid AND
>>>>>   program.programid = programinstance.programid AND
>>>>>   trackedentitydatavalue.programstageinstanceid =
>>>>> programstageinstance.programstageinstanceid AND
>>>>>   dataelement.dataelementid = trackedentitydatavalue.dataelementid
>>>>>   ) as foo
>>>>>   group by programstageinstanceid, dataelementname
>>>>>   having count(*) >1
>>>>>   order by numrecs desc;
>>>>>
>>>>> Here is the typical output of the query when there are duplicate
>>>>> records:
>>>>>
>>>>> Programstageinstanceid, dataelementname,numrecs
>>>>> 777921;"TB Sample type_morning Sputum";2
>>>>> 776886;"TB Date of collection_sample";2
>>>>> 773132;"TB History of contact with TPB+ case";2
>>>>>
>>>>> Here is an example of the duplicates we see when we select the
>>>>> duplicated records from the trackedentitydatavalue table.
>>>>>
>>>>> prograstageinstanceid,
>>>>> dataelementid,value,lastupdated,providedelsewhere,user
>>>>> 773132;35105;"true";"2015-10-02 09:32:48.334";FALSE;"abibarimana"
>>>>> 773132;35105;"true";"2015-10-02 09:32:48.508";FALSE;"abibarimana"
>>>>> 776886;31879;"2015-07-17";"2015-10-07 16:25:26.341";FALSE;"mugeni"
>>>>> 776886;31879;"2015-07-14";"2015-10-07 16:25:26.546";FALSE;"mugeni"
>>>>> 777921;35106;"true";"2015-10-08 09:55:36.133";FALSE;"emufasha"
>>>>> 777921;35106;"true";"2015-10-08 09:55:36.278";FALSE;"emufasha"
>>>>>
>>>>> In order to clean up the trackedentitydatavalue table, I need to write
>>>>> these records to a temporary table, delete them
>>>>> from trackedentitydatavalue, then decide which of the duplicate pairs to
>>>>> delete before copying the correct ones back into trackedentitydatavalues.
>>>>>
>>>>> This error seems to happen almost daily when the systems are in full
>>>>> use.
>>>>>
>>>>> Is there a way we can put a constraint on the trackedentitydatavalue
>>>>> table so that it will not accept duplicate values for
>>>>> programstageinstanceid and dataelementid?  I also think there must be an
>>>>> error in the java code that should be updating the first saved record with
>>>>> the one entered by the user later rather than inserting a new record.
>>>>>
>>>>> This is a major bug because it causes analytics to fail and when it
>>>>> fails we cannot run any event reports to get data out of the system.
>>>>>
>>>>> --
>>>>> *Randy Wilson*
>>>>> *Team Leader: **Knowledge Management, Data Use and Research*
>>>>> Rwanda Health System Strengthening Activity
>>>>> Management Sciences for Health
>>>>> Rwanda-Kigali
>>>>> Direct: +250 788308835
>>>>> E-mail: rwilson@xxxxxxx
>>>>> Skype: wilsonrandy_us
>>>>> <http://www.msh.org/>
>>>>> Stronger health systems. Greater health impact.
>>>>> <https://www.facebook.com/ManagementSciencesForHealth>
>>>>> <https://twitter.com/MSHHealthImpact>
>>>>> <https://www.youtube.com/user/MSHHealthImpact>
>>>>> www.msh.org
>>>>>
>>>>> *This message and its attachments are confidential and solely for the
>>>>> intended recipients. If received in error, please delete them and notify
>>>>> the sender via reply e-mail immediately.*
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Ant Snyman*
>>>>
>>>> *Cell: 0824910449*
>>>>
>>>> *Landline: 028 2713242*
>>>>
>>>>
>>>> Health Information Systems Program - SA
>>>>
>>>> *This message and any attachments are subject to a disclaimer published
>>>> at http://www.hisp.org/policies.html#comms_disclaimer
>>>> <http://www.hisp.org/policies.html#comms_disclaimer>.  Please read the
>>>> disclaimer before opening any attachment or taking any other action in
>>>> terms of this electronic transmission.  If you cannot access the
>>>> disclaimer, kindly send an email to disclaimer@xxxxxxxx
>>>> <disclaimer@xxxxxxxx> and a copy will be provided to you. By replying to
>>>> this e-mail or opening any attachment you agree to be bound by the
>>>> provisions of the disclaimer.*
>>>>
>>>
>>> *This message and its attachments are confidential and solely for the
>>> intended recipients. If received in error, please delete them and notify
>>> the sender via reply e-mail immediately.*
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Knut Staring
> Dept. of Informatics, University of Oslo
> Norway: +4791880522
> Skype: knutstar
> http://dhis2.org
>



-- 
*Randy Wilson*
*Team Leader: **Knowledge Management, Data Use and Research*
Rwanda Health System Strengthening Activity
Management Sciences for Health
Rwanda-Kigali
Direct: +250 788308835
E-mail: rwilson@xxxxxxx
Skype: wilsonrandy_us
<http://www.msh.org/>
Stronger health systems. Greater health impact.
<https://www.facebook.com/ManagementSciencesForHealth>
<https://twitter.com/MSHHealthImpact>
<https://www.youtube.com/user/MSHHealthImpact>
www.msh.org

-- 
*This message and its attachments are confidential and solely for the 
intended recipients. If received in error, please delete them and notify 
the sender via reply e-mail immediately.*

PNG image


Follow ups

References