dhis2-users team mailing list archive
-
dhis2-users team
-
Mailing list archive
-
Message #08375
Re: Serious problem with duplicate records in Individual records/tracker
Seems like a good idea to me - this is certainly a serious problem. Hope it
can be part of 2.21.
Knut
On Thu, Oct 15, 2015 at 5:33 AM, Wilson, Randy <rwilson@xxxxxxx> wrote:
> Hi all,
>
> It seems to me from the pgadmin create table script below that the table
> trackedentitydatavalue table does not have a unique primary key constraint
> (programstageinstanceid+dataelementid). I theory adding that constraint
> would make it impossible to have duplicates.
>
> here is the extra constraint needed:
>
> CONSTRAINT pk_programstageinstanceid_dataelementid PRIMARY KEY
> (programstageinstanceid, dataelementid)
>
> I hesitate to do it on our production instance because I wonder if the
> Java code should be checked to make sure the system will recover gracefully
> if a user tries to save the same record twice within a few seconds due to
> the latency issues discussed below.
>
>
> Is that a change that can be tried by the developers and then moved into
> trunk?
>
>
> current structure of table:
> _______________________________
>
> CREATE TABLE trackedentitydatavalue
> (
> programstageinstanceid integer NOT NULL,
> dataelementid integer NOT NULL,
> value character varying(255),
> "timestamp" timestamp without time zone,
> providedelsewhere boolean,
> storedby character varying(31),
> CONSTRAINT fk_entityinstancedatavalue_dataelementid FOREIGN KEY
> (dataelementid)
> REFERENCES dataelement (dataelementid) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT fk_entityinstancedatavalue_programstageinstanceid FOREIGN KEY
> (programstageinstanceid)
> REFERENCES programstageinstance (programstageinstanceid) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
> OIDS=FALSE
> );
> ALTER TABLE trackedentitydatavalue
> OWNER TO individualrecords;
>
> Thanks,
>
> Randy
>
>
> On Fri, Oct 9, 2015 at 6:35 PM, Knut Staring <knutst@xxxxxxxxx> wrote:
>
>> Seems like what we also experienced a lot for tLiberia, I think.
>>
>> Knut
>>
>> On Fri, Oct 9, 2015 at 11:13 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>> wrote:
>>
>>> If you experience reasonably high latency (slow response times) it seems
>>> quite likely you might well get duplicates. Probably related to
>>>
>>> https://bugs.launchpad.net/dhis2/+bug/1498505
>>>
>>> On 9 October 2015 at 15:40, Wilson, Randy <rwilson@xxxxxxx> wrote:
>>>
>>>> I will check but when i remove the duplicates analytics runs. Also
>>>> usually we get many more duplicates 100 or more each time I check.
>>>>
>>>> Randy
>>>> On Oct 9, 2015 3:48 PM, "Ant Snyman" <ant@xxxxxxxx> wrote:
>>>>
>>>>> Hi Randy,
>>>>> There is actually a constraint on the trackedentitydatavalue table and
>>>>> you cannot create two entries with the same programstageinstance and
>>>>> dataelementid. I notice that you are grouping on dataelement name, is it
>>>>> possible that you could have two dataelements with the same name in your
>>>>> dataelement table? I.e different dataelementids but the same name?
>>>>> Regards
>>>>> Ant
>>>>>
>>>>> On 9 October 2015 at 10:22, Wilson, Randy <rwilson@xxxxxxx> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> We have an big issue with several of Rwanda's event reporting systems
>>>>>> that have been set up using Tracker/Individual records. For some reason,
>>>>>> users are able to create duplicate records in the system through normal
>>>>>> data entry. It seems to happen when they enter a value for an attribute
>>>>>> and then a couple of seconds or milliseconds later they change that value.
>>>>>> The system then stores both records with last updated several milliseconds
>>>>>> apart.
>>>>>>
>>>>>> This is a big issue because analytics fails to run when it finds
>>>>>> these duplicates.
>>>>>>
>>>>>> Here is the query that finds the duplicates:
>>>>>>
>>>>>> select programstageinstanceid, dataelementname, count(*) as numrecs
>>>>>> from
>>>>>> (
>>>>>> SELECT
>>>>>> program.name AS programname,
>>>>>> program.description,
>>>>>> programinstance.dateofincident,
>>>>>> programinstance.trackedentityinstanceid,
>>>>>> programinstance.status,
>>>>>> programstageinstance.programstageinstanceid,
>>>>>> programstageinstance.programinstanceid,
>>>>>> programstageinstance.programstageid,
>>>>>> trackedentitydatavalue.value,
>>>>>> dataelement.name AS dataelementname
>>>>>> FROM
>>>>>> public.programstageinstance,
>>>>>> public.programinstance,
>>>>>> public.program,
>>>>>> public.trackedentitydatavalue,
>>>>>> public.dataelement
>>>>>> WHERE
>>>>>> programstageinstance.programinstanceid =
>>>>>> programinstance.programinstanceid AND
>>>>>> program.programid = programinstance.programid AND
>>>>>> trackedentitydatavalue.programstageinstanceid =
>>>>>> programstageinstance.programstageinstanceid AND
>>>>>> dataelement.dataelementid = trackedentitydatavalue.dataelementid
>>>>>> ) as foo
>>>>>> group by programstageinstanceid, dataelementname
>>>>>> having count(*) >1
>>>>>> order by numrecs desc;
>>>>>>
>>>>>> Here is the typical output of the query when there are duplicate
>>>>>> records:
>>>>>>
>>>>>> Programstageinstanceid, dataelementname,numrecs
>>>>>> 777921;"TB Sample type_morning Sputum";2
>>>>>> 776886;"TB Date of collection_sample";2
>>>>>> 773132;"TB History of contact with TPB+ case";2
>>>>>>
>>>>>> Here is an example of the duplicates we see when we select the
>>>>>> duplicated records from the trackedentitydatavalue table.
>>>>>>
>>>>>> prograstageinstanceid,
>>>>>> dataelementid,value,lastupdated,providedelsewhere,user
>>>>>> 773132;35105;"true";"2015-10-02 09:32:48.334";FALSE;"abibarimana"
>>>>>> 773132;35105;"true";"2015-10-02 09:32:48.508";FALSE;"abibarimana"
>>>>>> 776886;31879;"2015-07-17";"2015-10-07 16:25:26.341";FALSE;"mugeni"
>>>>>> 776886;31879;"2015-07-14";"2015-10-07 16:25:26.546";FALSE;"mugeni"
>>>>>> 777921;35106;"true";"2015-10-08 09:55:36.133";FALSE;"emufasha"
>>>>>> 777921;35106;"true";"2015-10-08 09:55:36.278";FALSE;"emufasha"
>>>>>>
>>>>>> In order to clean up the trackedentitydatavalue table, I need to
>>>>>> write these records to a temporary table, delete them
>>>>>> from trackedentitydatavalue, then decide which of the duplicate pairs to
>>>>>> delete before copying the correct ones back into trackedentitydatavalues.
>>>>>>
>>>>>> This error seems to happen almost daily when the systems are in full
>>>>>> use.
>>>>>>
>>>>>> Is there a way we can put a constraint on the trackedentitydatavalue
>>>>>> table so that it will not accept duplicate values for
>>>>>> programstageinstanceid and dataelementid? I also think there must be an
>>>>>> error in the java code that should be updating the first saved record with
>>>>>> the one entered by the user later rather than inserting a new record.
>>>>>>
>>>>>> This is a major bug because it causes analytics to fail and when it
>>>>>> fails we cannot run any event reports to get data out of the system.
>>>>>>
>>>>>> --
>>>>>> *Randy Wilson*
>>>>>> *Team Leader: **Knowledge Management, Data Use and Research*
>>>>>> Rwanda Health System Strengthening Activity
>>>>>> Management Sciences for Health
>>>>>> Rwanda-Kigali
>>>>>> Direct: +250 788308835
>>>>>> E-mail: rwilson@xxxxxxx
>>>>>> Skype: wilsonrandy_us
>>>>>> <http://www.msh.org/>
>>>>>> Stronger health systems. Greater health impact.
>>>>>> <https://www.facebook.com/ManagementSciencesForHealth>
>>>>>> <https://twitter.com/MSHHealthImpact>
>>>>>> <https://www.youtube.com/user/MSHHealthImpact>
>>>>>> www.msh.org
>>>>>>
>>>>>> *This message and its attachments are confidential and solely for the
>>>>>> intended recipients. If received in error, please delete them and notify
>>>>>> the sender via reply e-mail immediately.*
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>>> Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Ant Snyman*
>>>>>
>>>>> *Cell: 0824910449*
>>>>>
>>>>> *Landline: 028 2713242*
>>>>>
>>>>>
>>>>> Health Information Systems Program - SA
>>>>>
>>>>> *This message and any attachments are subject to a disclaimer
>>>>> published at http://www.hisp.org/policies.html#comms_disclaimer
>>>>> <http://www.hisp.org/policies.html#comms_disclaimer>. Please read the
>>>>> disclaimer before opening any attachment or taking any other action in
>>>>> terms of this electronic transmission. If you cannot access the
>>>>> disclaimer, kindly send an email to disclaimer@xxxxxxxx
>>>>> <disclaimer@xxxxxxxx> and a copy will be provided to you. By replying to
>>>>> this e-mail or opening any attachment you agree to be bound by the
>>>>> provisions of the disclaimer.*
>>>>>
>>>>
>>>> *This message and its attachments are confidential and solely for the
>>>> intended recipients. If received in error, please delete them and notify
>>>> the sender via reply e-mail immediately.*
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>> Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>> More help : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>> Knut Staring
>> Dept. of Informatics, University of Oslo
>> Norway: +4791880522
>> Skype: knutstar
>> http://dhis2.org
>>
>
>
>
> --
> *Randy Wilson*
> *Team Leader: **Knowledge Management, Data Use and Research*
> Rwanda Health System Strengthening Activity
> Management Sciences for Health
> Rwanda-Kigali
> Direct: +250 788308835
> E-mail: rwilson@xxxxxxx
> Skype: wilsonrandy_us
> <http://www.msh.org/>
> Stronger health systems. Greater health impact.
> <https://www.facebook.com/ManagementSciencesForHealth>
> <https://twitter.com/MSHHealthImpact>
> <https://www.youtube.com/user/MSHHealthImpact>
> www.msh.org
>
> *This message and its attachments are confidential and solely for the
> intended recipients. If received in error, please delete them and notify
> the sender via reply e-mail immediately.*
>
--
Knut Staring
Dept. of Informatics, University of Oslo
Norway: +4791880522
Skype: knutstar
http://dhis2.org
References