
dhis2-users team mailing list archive

Re: Importing DHS survey data in DHIS

 

Hi Alex,

thanks for the suggestions. That is actually the api I am using: per dataSet, I
post one request for deletion, one for creation and one for update, in parallel.
Kettle has one transformation for converting tabular data into a single json
record and another for POSTing that json chunk to the api in one request. I also
saw the curl behaviour you describe when I was sending single values in the
beginning: back then there was no DELETE option for the batch, so I had to
delete record by record.
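
For readers following along, here is a minimal Python sketch of the idea: split the pseudo-delta described further down the thread into create, update and delete sets, then wrap each set into one json document. It is not the kettle transformation itself. The function names and the key tuple are assumptions made for illustration, and the payload layout (a `dataValues` list with `dataElement`, `period`, `orgUnit`, `categoryOptionCombo`, `value`) should be checked against the dataValueSets documentation for your DHIS2 version.

```python
import json

# A data value is identified here by the tuple
# (dataElement, period, orgUnit, categoryOptionCombo).
# This tuple layout is an assumption made for the sketch.

def split_delta(source, target):
    """Compare source values with those already in DHIS2.

    Both arguments map a key tuple to a value string.
    Returns three dicts: values to create, to update and to delete.
    """
    create = {k: v for k, v in source.items() if k not in target}
    update = {k: v for k, v in source.items() if k in target and target[k] != v}
    delete = {k: v for k, v in target.items() if k not in source}
    return create, update, delete


def to_payload(values):
    """Wrap a dict of values into one json document for a single request."""
    rows = [
        {"dataElement": de, "period": pe, "orgUnit": ou,
         "categoryOptionCombo": coc, "value": value}
        for (de, pe, ou, coc), value in sorted(values.items())
    ]
    return json.dumps({"dataValues": rows})


# Made-up example data: what the source system holds vs. what DHIS2 holds.
source = {("DE1", "201601", "OU1", "COC1"): "10",
          ("DE1", "201601", "OU2", "COC1"): "7"}
target = {("DE1", "201601", "OU1", "COC1"): "9",
          ("DE1", "201601", "OU3", "COC1"): "4"}

create, update, delete = split_delta(source, target)
create_json, update_json, delete_json = map(to_payload, (create, update, delete))
```

Each of the three json strings would then go out as one request to api/dataValueSets. How the operation is selected per request (for example via an import strategy parameter) is left out here and should be taken from the api documentation.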

Actually I was surprised that the performance of the api is rather acceptable:
on our server it's roughly 375k records per hour for creating/updating/deleting
(no network delays since kettle is running on the same server as DHIS2 thus
POSTing to localhost). But I am thinking of breaking the load into parallel
packages as you suggested e.g. per dataElement, mainly in order to avoid memory
dumps from kettle - the json-converter is quite hungry. Is DHIS2 able to detect
memory shortages from parallel api-imports without dumping?
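
A rough Python sketch of what breaking the load into packages per dataElement could look like, with a cap on how many packages are handled at the same time. The `send` callable is a placeholder for whatever performs the actual POST, and all names here are hypothetical. It says nothing about how DHIS2 itself behaves under parallel imports.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_data_element(records):
    """Split a list of dataValue dicts into one package per dataElement."""
    packages = defaultdict(list)
    for record in records:
        packages[record["dataElement"]].append(record)
    return dict(packages)


def send_in_parallel(packages, send, max_workers=2):
    """Hand each package to the caller-supplied `send` function.

    max_workers caps how many packages are being processed at once;
    the remaining packages wait in the executor's queue.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {de: pool.submit(send, de, rows) for de, rows in packages.items()}
        # result() blocks until the package is done and re-raises any error.
        return {de: future.result() for de, future in futures.items()}


records = [
    {"dataElement": "DE1", "orgUnit": "OU1", "period": "201601", "value": "10"},
    {"dataElement": "DE2", "orgUnit": "OU1", "period": "201601", "value": "3"},
    {"dataElement": "DE1", "orgUnit": "OU2", "period": "201601", "value": "7"},
]


def fake_send(data_element, rows):
    # Stand-in for the real POST; it only reports how many rows it was given.
    return len(rows)


results = send_in_parallel(group_by_data_element(records), fake_send)
```

If the json conversion happens inside `send`, lowering `max_workers` is one way to limit how much converted data is held in memory at any moment, at the cost of throughput.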

Does anyone have experience with more permanent options, like posting CSV to
dataValueSets or using the new ADX api? Actually I'd prefer DHIS2 to offer an
api where I can POST a CSV-like structure per dataSet, like
[ou,pe,Category1,Category2,etc,DataElement1,DataElement2,etc]. I suppose that
this would reduce the volume of data to be transferred significantly, though I
am not sure about the performance.
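
To make the proposed layout concrete, here is a small Python sketch that unfolds such a wide, per-dataSet row into individual data values. This is only an illustration of the structure suggested above, not an existing DHIS2 endpoint. The column names come from the example in the paragraph, while the sample values, the `categories` tuple and the function name are assumptions. Mapping the category values to a categoryOptionCombo is deliberately left out.

```python
import csv
import io


def unfold(wide_csv, n_categories):
    """Turn rows of ou,pe,<categories...>,<dataElements...> into long records."""
    reader = csv.reader(io.StringIO(wide_csv))
    header = next(reader)
    first_de = 2 + n_categories          # columns after ou, pe and the categories
    data_elements = header[first_de:]
    records = []
    for row in reader:
        ou, pe = row[0], row[1]
        categories = tuple(row[2:first_de])
        for de, value in zip(data_elements, row[first_de:]):
            if value != "":              # an empty cell means nothing to import
                records.append({"orgUnit": ou, "period": pe,
                                "categories": categories,
                                "dataElement": de, "value": value})
    return records


wide = (
    "ou,pe,Category1,Category2,DataElement1,DataElement2\n"
    "OU1,201601,male,under5,10,3\n"
    "OU1,201601,female,under5,12,\n"
)
records = unfold(wide, n_categories=2)
```

In this example two wide rows carry three data values, and ou, pe and the categories are written once per row instead of once per value, which is where the reduction in transfer volume would come from.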

Regards,

Uwe

---

> Alex Tumwesigye <atumwesigye@xxxxxxxxx> wrote on 2 February 2016 at 17:31:
> 
> 
> Dear Uwe,
> 
> Have you tried sending the data via the endpoint api/dataValueSets? It may be
> faster. Just stage your data and push it once.
> http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372
> 
> Also worth noting is how you send it: I have seen curl take ages to submit
> individual values via the api. You need to send it as one file via one
> request, or implement concurrency.
> 
> Alex
> 
> On Tue, Feb 2, 2016 at 5:13 PM, Olav Poppe <olav.poppe@xxxxxx> wrote:
> 
> > Hi Randy and Uwe,
> > thanks, interesting to hear your experiences. Uwe, what you are working on
> > sounds quite a bit more complicated, and not least with far more data. I
> > imagine that with household surveys, it would be a matter of < 100 indicators
> > for < 200 orgunits for 2-3 periods, i.e. a fraction of what you are dealing
> > with!
> >
> > Olav
> >
> > On 31 Jan 2016 at 09:29, uwe wahser <uwe@xxxxxxxxx> wrote:
> >
> > Hi Olav & Randy,
> >
> > I am currently banging on kettle (aka Pentaho DI) to extract data from a
> > source-system (SQL-ERP in our case) into DHIS2 dataSets in json format. In
> > our current test-scenario (2 dataElements in a dataSet with a
> > categoryCombination of 5 categories) we are currently updating ca. 4 million
> > dataValues every night in a pseudo-delta mode: reading all data from the
> > source, comparing it to what is already in DHIS2, then pushing only the
> > records for creating, updating or deleting dataValues into the api (ca.
> > 150k per night in 1 hour; the initial load took 7 hrs). We still have to
> > prove that this is feasible when setting up the first real-life dataSet,
> > where there will be more categories and more dataElements, thus exploding
> > the number of dataValues.
> >
> > Getting there was a bit painful, but now it seems to work. I chose kettle
> > instead of Talend ETL (both open source) as it seemed to be easier to get
> > used to. However, from a data warehouse perspective I'd prefer DHIS2 to
> > offer some sort of integrated ETL landscape in the long run, which would
> > also allow aggregating data from tracker into dataSets, tracker to tracker,
> > dataSets to dataSets etc.
> >
> > The current versions of our kettle transformations and jobs were designed
> > to be generic (not for a specific dataSet, but you have to design your own
> > extractor, which could be a simple csv-reader or maybe a DHS api-call). If
> > you are interested, I will share them. Just be aware that they are
> > currently in a very early and rough state and not documented. You'd have to
> > bring along the willingness to dig into kettle yourself and be pain
> > resistant to a certain degree :-)
> >
> > I'd be interested to hear about others' experiences ...
> >
> > Have a nice Sunday,
> >
> > Uwe
> >
> > ---
> >
> > On 29.01.2016 at 17:31, Wilson, Randy wrote:
> >
> > Not here unfortunately...just doing csv imports from DHS Excel files.
> > Would be useful for our data warehouse.
> > Randy
> > On Jan 29, 2016 2:59 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:
> >
> >> Hi all,
> >> I wanted to hear if anyone has any experience with the DHS API (
> >> http://api.dhsprogram.com/#/index.html), and using it to import survey
> >> results into DHIS?
> >>
> >> Olav
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~dhis2-users
> >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~dhis2-users
> >> More help   : https://help.launchpad.net/ListHelp
> >>
> >>
> > *This message and its attachments are confidential and solely for the
> > intended recipients. If received in error, please delete them and notify
> > the sender via reply e-mail immediately.*
> >
> 
> 
> -- 
> Alex Tumwesigye
> 
> Technical Advisor - DHIS2 (Consultant),
> Ministry of Health/AFENET
> Kampala
> Uganda
> 
> IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
> 
> IT Specialist (Servers, Networks and Security, Health Information Systems -
> DHIS2 ) & Solar Consultant
> 
> +256 774149 775, + 256 759 800161
> 
> "I don't want to be anything other than what I have been - one tree hill "

