
dhis2-users team mailing list archive

Re: Importing DHS survey data in DHIS

 

Hi Randy,

Currently I am just loading the bare dataSet. But you are right: a normal nightly load run should first update the meta-data and then update the dataValues; otherwise values would be rejected if they were coded to new OrgUnits or category options that are not yet in DHIS2. We are not there yet, but that would be one of the next activities.
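
To illustrate why the order matters, here is a minimal sketch of a pre-check that separates values that reference known metadata from those that would be rejected. The function name, the ID sets and the sample IDs are hypothetical and not part of the kettle jobs; only the dataValue field names follow the DHIS2 data value format.

```python
def partition_by_metadata(data_values, known_org_units, known_cocs):
    """Split data values into those DHIS2 could accept and those it would reject.

    known_org_units / known_cocs are sets of IDs assumed to have been fetched
    from DHIS2 beforehand (hypothetical inputs for this sketch).
    """
    accepted, rejected = [], []
    for dv in data_values:
        if dv["orgUnit"] in known_org_units and dv["categoryOptionCombo"] in known_cocs:
            accepted.append(dv)
        else:
            # Coded to metadata that has not been loaded into DHIS2 yet
            rejected.append(dv)
    return accepted, rejected


values = [
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou1",
     "categoryOptionCombo": "cocA", "value": "5"},
    {"dataElement": "de1", "period": "201601", "orgUnit": "ouNew",
     "categoryOptionCombo": "cocA", "value": "3"},
]
ok, bad = partition_by_metadata(values, {"ou1"}, {"cocA"})
```

Running the metadata update first should leave the rejected list empty in the normal case.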

Also, as you have stated, the current version expects the categoryOptions to be compliant with those in DHIS2. Mappings have to be done in the custom extractor. As you say, it is easier if no mapping is needed, but from my previous DWH experience I know that mapping is normally desired, since analysis data can usually be grouped into broader categories than those of the operational systems, thus reducing the number of combinations in the cubes.
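
As a rough sketch of what such a mapping in an extractor could look like: the age codes, the broader groups and the row layout below are invented for illustration and do not come from our source system.

```python
from collections import defaultdict

# Hypothetical mapping from operational age codes to broader analysis categories
AGE_GROUP_MAP = {
    "0-4": "Under 15",
    "5-14": "Under 15",
    "15-24": "15 and over",
    "25+": "15 and over",
}


def regroup(rows, mapping):
    """Sum values after replacing the fine-grained category by its broader group."""
    totals = defaultdict(float)
    for row in rows:
        broad = mapping[row["age"]]  # a KeyError here exposes an unmapped source code
        totals[(row["orgUnit"], row["period"], broad)] += row["value"]
    return dict(totals)


rows = [
    {"orgUnit": "ou1", "period": "201601", "age": "0-4", "value": 3},
    {"orgUnit": "ou1", "period": "201601", "age": "5-14", "value": 4},
    {"orgUnit": "ou1", "period": "201601", "age": "15-24", "value": 10},
    {"orgUnit": "ou1", "period": "201601", "age": "25+", "value": 20},
]
grouped = regroup(rows, AGE_GROUP_MAP)
```

Four source combinations collapse into two here, which is the reduction in cube combinations mentioned above.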

Our main benefit for the moment is that the ETL process compares the dataValues to what is already present in DHIS2 and then decides whether to update an existing value, create a new value, or delete a value that no longer arrives (data are extracted in full, but uploaded in pseudo-delta mode). The transformation from tabular data to the DHIS2 API format is also done, including the mapping to DHIS2 IDs for category option combinations.
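
The comparison is implemented in kettle, but the idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual transformation: the function names, the lookup table and all IDs are hypothetical, and values are compared as strings for simplicity.

```python
# Hypothetical lookup: set of category option names -> categoryOptionCombo ID
COC_LOOKUP = {
    frozenset({"Female", "Under 15"}): "cocA",
    frozenset({"Male", "Under 15"}): "cocB",
}


def row_to_data_value(row, coc_lookup):
    """Turn one tabular row into a dict shaped like a DHIS2 data value."""
    return {
        "dataElement": row["dataElement"],
        "period": row["period"],
        "orgUnit": row["orgUnit"],
        "categoryOptionCombo": coc_lookup[frozenset(row["options"])],
        "value": str(row["value"]),
    }


def key_of(dv):
    """A data value is identified by everything except its value."""
    return (dv["dataElement"], dv["period"], dv["orgUnit"], dv["categoryOptionCombo"])


def pseudo_delta(extracted, existing):
    """Compare a full extract with what is already stored; return only the changes."""
    src = {key_of(dv): dv for dv in extracted}
    tgt = {key_of(dv): dv for dv in existing}
    creates = [dv for k, dv in src.items() if k not in tgt]
    updates = [dv for k, dv in src.items()
               if k in tgt and str(dv["value"]) != str(tgt[k]["value"])]
    deletes = [dv for k, dv in tgt.items() if k not in src]
    return creates, updates, deletes


source_rows = [
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou1",
     "options": ["Female", "Under 15"], "value": 5},
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou1",
     "options": ["Male", "Under 15"], "value": 7},
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou2",
     "options": ["Female", "Under 15"], "value": 1},
]
extracted = [row_to_data_value(r, COC_LOOKUP) for r in source_rows]
existing = [
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou1",
     "categoryOptionCombo": "cocA", "value": "5"},   # unchanged
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou1",
     "categoryOptionCombo": "cocB", "value": "6"},   # value changed at source
    {"dataElement": "de1", "period": "201601", "orgUnit": "ou3",
     "categoryOptionCombo": "cocA", "value": "9"},   # no longer at source
]
creates, updates, deletes = pseudo_delta(extracted, existing)
```

Only the three resulting lists would need to be pushed to the api, which is why the nightly upload is much smaller than the full extract.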

Regards,

Uwe

---

On 02.02.2016 at 18:04, Wilson, Randy wrote:
One of the interesting ideas from Uwe's approach is that DHS has apparently standardized definitions for all indicators - presumably there is a code that we can use in DHIS-2 so that interoperability will be simplified. Uwe might want to extend the data element attributes to capture more of the metadata that is available in DHS to define the indicators. Also, I wonder if you plan to bring in the raw data (numerators & denominators) as data elements and build the calculations into DHIS-2, or bring in the calculated indicator values as data elements.

One of the challenges that we face in our Data Warehouse is that it contains indicators calculated based on both routine and population survey data. We have to be very careful of the indicator names so that people know which come from which source. For example: from DHS we have "Contraceptive prevalence rate - modern methods" while we estimate that from the routine HMIS data but call it "Contraceptive utilisation rate from health facilities - modern methods".

Randy



On Tue, Feb 2, 2016 at 4:13 PM, Olav Poppe <olav.poppe@xxxxxx <mailto:olav.poppe@xxxxxx>> wrote:

    Hi Randy and Uwe,
    thanks, interesting to hear your experiences. Uwe, what you are
    working on sounds quite a bit more complicated, not least because
    it involves far more data. I imagine that with household surveys,
    it would be a matter of < 100 indicators for < 200 orgunits for
    2-3 periods, i.e. a fraction of what you are dealing with!

    Olav






    On 31 Jan 2016 at 09:29, uwe wahser <uwe@xxxxxxxxx
    <mailto:uwe@xxxxxxxxx>> wrote:

    Hi Olav & Randy,

    I am currently banging on kettle (aka Pentaho DI) to extract data
    from a source system (an SQL ERP in our case) into DHIS2 dataSets
    in json format. In our current test scenario (2 dataElements in a
    dataSet with a categoryCombination of 5 categories) we are
    updating ca. 4 million dataValues every night in a pseudo-delta
    mode: reading all data from the source, comparing it to what is
    already in DHIS2, then pushing only the records for creating,
    updating or deleting dataValues into the api (ca. 150k per night
    in 1 hour; the initial load took 7 hrs). We still have to prove
    that this is feasible when setting up the first real-life
    dataSet, where there will be more categories and more
    dataElements, thus exploding the number of dataValues.

    Getting there was a bit painful, but now it seems to work. I
    chose kettle instead of Talend ETL (both open source) as it
    seemed easier to get used to. However, from a data warehouse
    perspective I'd prefer DHIS2 to offer some sort of integrated
    ETL landscape in the long run, which would also allow
    aggregating data from tracker into dataSets, tracker to
    tracker, dataSets to dataSets etc.

    Our current version of the kettle transformations and jobs was
    designed to be generic (not for a specific dataSet, but you have
    to design your own extractor, which could be a simple csv reader
    or maybe a DHS api call). If you are interested, I will share
    them. Just be aware that they are currently in a very early and
    rough state and not documented. You'd have to bring along the
    willingness to dig into kettle and be pain resistant to
    a certain degree :-)

    I'd be interested to hear about others' experiences ...

    Have a nice Sunday,

    Uwe

    ---

    On 29.01.2016 at 17:31, Wilson, Randy wrote:

    Not here unfortunately...just doing csv imports from DHS Excel
    files. Would be useful for our data warehouse.
    Randy

    On Jan 29, 2016 2:59 PM, "Olav Poppe" <olav.poppe@xxxxxx
    <mailto:olav.poppe@xxxxxx>> wrote:

        Hi all,
        I wanted to hear if anyone has any experience with the DHS
        API (http://api.dhsprogram.com/#/index.html), and using it
        to import survey results into DHIS?

        Olav

        _______________________________________________
        Mailing list: https://launchpad.net/~dhis2-users
        <https://launchpad.net/%7Edhis2-users>
        Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
        <mailto:dhis2-users@xxxxxxxxxxxxxxxxxxx>
        Unsubscribe : https://launchpad.net/~dhis2-users
        <https://launchpad.net/%7Edhis2-users>
        More help   : https://help.launchpad.net/ListHelp


    /This message and its attachments are confidential and solely
    for the intended recipients. If received in error, please delete
    them and notify the sender via reply e-mail immediately./







--
*Randy Wilson*
/Team Leader: //Knowledge Management, Data Use and Research/
Rwanda Health System Strengthening Activity
Management Sciences for Health
Rwanda-Kigali
Direct: +250 788308835
E-mail: rwilson@xxxxxxx <mailto:rwilson@xxxxxxx>
Skype: wilsonrandy_us
Stronger health systems. Greater health impact.
www.msh.org <http://www.msh.org/>


