← Back to team overview

dhis2-users team mailing list archive

Re: Cross database querying for updating data

 

Couldn't two "left joins with null" give you the diff?
On Apr 21, 2015 12:36 AM, "Jason Pickering" <jason.p.pickering@xxxxxxxxx>
wrote:

> Hi there. I think the big challenge here is what is not there, as opposed
> to what needs to be added.   DHIS2 has no functionality for "soft deletes",
> meaning when you mark a record with a column like "deletedon" with a
> timestamp for deletion, but you actually do not remove it from the
> database. This is a common problem with DXF syncronizations, as the
> upstream system does not transmit DELETES. It only transmits INSERTS and
> UPDATES.
>
> At first glance, in your case, it would seem to be pretty easy, since you
> can use the "lastupdated" of the data value table to parse out values which
> have been added since your last synchronization. The problem however is to
> determine what has been DELETED from the data value table.Well, that is
> more tricky, and would require you diff both data values tables to
> determine what changes you need to transmit to your downstream system.
>
> There are tools out there, to help determine the "diff" between two
> databases, which I think in your case would require you to have copy of
> your remote DB and then diff it against your local DB. You would then
> prepare a diff, make the update to your local copy and then you remote
> copy.
>
> I think that would really be the only real way of doing this, using a tool
> like perhaps these (
> http://stackoverflow.com/questions/4804779/how-to-check-difference-between-two-databases-in-p)
> other other commercial alternatives to diff the two databases, and prepare
> a series of INSERT and DELETE statements. Never attempted it myself, but it
> might work.
>
> Does not seem really that trivial. :(
>
> Regards,
> Jason
>
>
> On Mon, Apr 20, 2015 at 6:59 PM Knut Staring <knutst@xxxxxxxxx> wrote:
>
>> I think I would trust UIDs or Codes more than database IDs
>>
>> On Mon, Apr 20, 2015 at 10:48 PM, Muyepa A. <muyepaa@xxxxxxxxx> wrote:
>>
>>> Indeed Database A has extra data not available in B while  B has data
>>> missing in A.  No changes have been made to orgunits, dataelements,
>>> periods or users. The delta is in data-values. I would really want to get
>>> views on whether the operation described (all other parameters being equal)
>>> will bring the missing data into A, and whether it may affect report
>>> summary rates.
>>> I cannot easily download the big databases. Something
>>> on command-line would be very suitable for this circumstance  and
>>> transparent to users.
>>>
>>> On Mon, Apr 20, 2015 at 5:08 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>> wrote:
>>>
>>>> If database B is restored off a backup of database A then it will
>>>> probably work with some caveats.  Clearly they are not exactly the
>>>> same because you've got extra data in A which you are trying to push
>>>> into B.  So it depends really on what else might have changed in A.
>>>>
>>>> If you have made any changes to orgunits, datelements or periods then
>>>> that will cause at best a database integrity violation when you try to
>>>> import that way.  At worst cause a very horrible mess.
>>>>
>>>> Even though the process is fragile, there is a related use case where
>>>> one might need to do such a thing or something similar.  If a country
>>>> system is hosted somewhere in the cloud (because they have poor or
>>>> expensive local bandwidth) then there is a real problem to solve about
>>>> getting data backups from the server back into the country.
>>>>
>>>> In particular we'd like to be able to pull not all the datavalues but
>>>> only those which have been changed ie. do an incremental backup of the
>>>> datavalues table.  The same could apply to events.
>>>>
>>>> One possible approach might be to first do a metadata backup and
>>>> restore.  And then pull the changed datavalues - though you would have
>>>> to do this through the api with uids or codes as the metadata backup
>>>> and restore would have ignored the database identifiers.
>>>>
>>>> Yet another approach would be to take a reduced postgres backup
>>>> excluding the datavalues table, restore that and then try and pull in
>>>> the delta of datavalues table something like you describe.
>>>>
>>>> Its a tough problem.  Jason, what do you suggest?  Downloading the
>>>> full postgres backup (with ephemeral tables like analytics excluded)
>>>> can be a serious challenge for many countries to do with regularity
>>>> and reliability.  So how best to go about incremental backup and
>>>> restore of data?
>>>>
>>>> On 20 April 2015 at 15:36, Muyepa A. <muyepaa@xxxxxxxxx> wrote:
>>>> > I had thought of  data import/export functionality, however this will
>>>> > require two instances of tomcat to be running. (or one instance with
>>>> one
>>>> > database at a time, however there will be an associated down time).
>>>> > Since upgrading to 2.18, when exported, the resultant download zip
>>>> file says
>>>> > invalid for all formats (xml, json, excel) and is only 2kb in size
>>>> > regardless of the hierarchy I choose.
>>>> >
>>>> > Since one database is copy of the other, i thought I could easily
>>>> update
>>>> > from the other as described.
>>>> >
>>>> > On Mon, Apr 20, 2015 at 3:48 PM, Jason Pickering
>>>> > <jason.p.pickering@xxxxxxxxx> wrote:
>>>> >>
>>>> >> It may work, but in general, it will not because the internal IDs of
>>>> >> different DHIS2 databases are not the same. So, it might work, but it
>>>> >> requires both databases to be in essentially the same state. Better
>>>> to use
>>>> >> the data import/export functionality if you can.
>>>> >>
>>>> >> Regards,
>>>> >> Jason
>>>> >>
>>>> >>
>>>> >> On Mon, Apr 20, 2015 at 3:28 PM Muyepa A. <muyepaa@xxxxxxxxx> wrote:
>>>> >>>
>>>> >>> Can I safely update DHIS2 database A with data from DHIS database B
>>>> both
>>>> >>> on the same PostgreSQL server by cross query. B is backup of A
>>>> >>>
>>>> >>> From database B, I can retrieve and save the data that is required
>>>> using:
>>>> >>>
>>>> >>> psql B -c "\copy (select * from datavalue where  periodid='xxxx') TO
>>>> >>> STDOUT" > /tmp/data.tsv;
>>>> >>>
>>>> >>>
>>>> >>> And to restore:
>>>> >>> psql A -c "\copy datavaue (x, y, z, ...) FROM /tmp/data.tsv"
>>>> >>>
>>>> >>>
>>>> >>> Is this recommended, and what else should be cross  transferred.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> _______________________________________________
>>>> >>> Mailing list: https://launchpad.net/~dhis2-users
>>>> >>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> >>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>> >>> More help   : https://help.launchpad.net/ListHelp
>>>> >
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > Mailing list: https://launchpad.net/~dhis2-users
>>>> > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> > Unsubscribe : https://launchpad.net/~dhis2-users
>>>> > More help   : https://help.launchpad.net/ListHelp
>>>> >
>>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>> Knut Staring
>> Dept. of Informatics, University of Oslo
>> Norway: +4791880522
>> Skype: knutstar
>> http://dhis2.org
>>  _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help   : https://help.launchpad.net/ListHelp
>>
>

Follow ups

References