
dhis2-users team mailing list archive

Re: Importing DHS survey data in DHIS

Hi Lars,

thanks for the hint. Currently I am just running on the default settings. I'll
do a bit of monitoring first before adjusting those values, and will post the
difference.

Regards,

Uwe

---
> Lars Helge Øverland <larshelge@xxxxxxxxx> wrote on 2 February 2016 at 23:43:
> 
> 
> Hi Uwe,
> 
> make sure that you have tuned Postgres properly through postgresql.conf;
> especially the last 5 settings are crucial for getting good write
> performance.
> 
> http://dhis2.github.io/dhis2-docs/master/en/implementer/html/ch08s03.html#d5e464
> 
> checkpoint_segments = 32
> 
> PostgreSQL writes new transactions to log files called WAL segments, which
> are 16MB in size each. When this number of segments has been written, a
> checkpoint occurs. Setting this to a larger value will thus improve
> performance for write-heavy systems such as DHIS 2.
> 
> checkpoint_completion_target = 0.8
> 
> Determines the percentage of segment completion before a checkpoint occurs.
> Setting this to a high value will thus spread the writes out and lower the
> average write overhead.
> 
> wal_buffers = 16MB
> 
> Sets the memory used for buffering during the WAL write process. Increasing
> this value might improve throughput in write-heavy systems.
> 
> synchronous_commit = off
> 
> Specifies whether transaction commits will wait for WAL records to be
> written to disk before returning to the client or not. Setting this to
> off will improve performance considerably. It also implies that there is a
> slight delay between the transaction being reported as successful to the
> client and it actually being safe, but the database state cannot be
> corrupted and this is a good trade-off for performance-intensive and
> write-heavy systems like DHIS 2.
> 
> wal_writer_delay = 10000ms
> 
> Specifies the delay between WAL write operations. Setting this to a high
> value will improve performance on write-heavy systems since potentially
> many write operations can be executed within a single flush to disk.
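> 
> For reference, here are the suggested values collected as a single
> postgresql.conf excerpt (starting points to monitor and adjust for your
> hardware, not definitive tuning):
> 
>   checkpoint_segments = 32
>   checkpoint_completion_target = 0.8
>   wal_buffers = 16MB
>   synchronous_commit = off
>   wal_writer_delay = 10000ms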
> 
> On Tue, Feb 2, 2016 at 9:18 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> 
> > Lars is right that ADX won't be faster than DXF, both because it
> > internally converts to DXF on import and because it abstracts away the
> > categoryoptioncombo. The first isn't really very costly, but the other
> > is.
> >
> > This means that the two systems only have to match categories and
> > categoryoptions, which is a much easier mapping to maintain.
> >
> > But if you need raw speed it is going to be faster to produce DXF-style
> > categoryoptioncombos, as that is closest to the way the data gets
> > stored. I am going to speed up the adx import code, but it will still
> > always be slower :-)
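> >
> > To make that concrete, here is a minimal sketch of the two
> > representations (identifiers are made-up placeholders, not real UIDs or
> > codes). An ADX value carries each category dimension by code as an
> > attribute:
> >
> >   <dataValue dataElement="DE_CODE" GENDER="FEMALE" AGE_GROUP="UNDER5"
> >              value="32"/>
> >
> > while the equivalent DXF-style JSON value points straight at the
> > precomputed categoryoptioncombo, which is close to how the datavalue
> > table stores it:
> >
> >   {"dataElement": "deUid000001", "period": "201601",
> >    "orgUnit": "ouUid000001", "categoryOptionCombo": "cocUid00001",
> >    "value": "32"}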
> >
> > On 2 February 2016 at 20:07, uwe wahser <uwe@xxxxxxxxx> wrote:
> > > Ok, that sounds like ADX might even be a bit slower eventually, if
> > > the transformation process outweighs a potentially reduced data
> > > volume. I might just stick with the JSON.
> > >
> > > @Jason: I also thought briefly about SQL injection, but I fear
> > > internal changes of the data model, which I'd have to understand
> > > fully in the first place. Of course the APIs also change more than I
> > > expected, but at least that is announced :-D
> > >
> > > Uwe
> > >
> > > ---
> > >
> > >
> > >
> > > On 02.02.2016 at 19:49, Lars Helge Øverland wrote:
> > >
> > > Hi Uwe,
> > >
> > > ADX will not be faster than DXF, as for ADX the stream is first
> > > converted into DXF and then passed on to the regular importer.
> > >
> > > Lars
> > >
> > > On Tue, Feb 2, 2016 at 5:33 PM, Jason Pickering
> > > <jason.p.pickering@xxxxxxxxx> wrote:
> > >>
> > >> This was a very trivial lab test, so not really conclusive at all. I
> > >> would just give it a try and see. If you see differences, please let
> > >> the devs know.
> > >>
> > >> Given the scale of what you are attempting, have you considered using
> > >> direct SQL injection? Not that I am recommending that route, as there
> > >> are many pitfalls, but it might be an option if implemented properly,
> > >> especially considering your reported architecture.
> > >>
> > >> Regards
> > >> Jason
> > >>
> > >>
> > >> On Tue, Feb 2, 2016, 17:04 Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > >>>
> > >>> Hi Jason,
> > >>>
> > >>> thanks for sharing the links. As I can see at a quick glance, you
> > >>> are also experimenting with the ADX API - did you observe any
> > >>> significant performance differences between the ADX and
> > >>> dataValueSets APIs?
> > >>>
> > >>> Regards,
> > >>>
> > >>> Uwe
> > >>>
> > >>> > Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote on 2 February
> > >>> > 2016 at 18:21:
> > >>> >
> > >>> >
> > >>> > Hi Olav,
> > >>> > I have not worked with the DHS API per se, but have imported lots
> > >>> > of data using the same approach which they outline here
> > >>> > (http://api.dhsprogram.com/#/samples-r.cfm).
> > >>> >
> > >>> > I have written up a walkthrough of getting data out of one DHIS
> > >>> > instance and into another one, and I think the basic principles
> > >>> > would be the same
> > >>> > (http://rpubs.com/jason_p_pickering/139589).
> > >>> >
> > >>> > Metadata needs to be mapped (or created), and the data needs to
> > >>> > be reshaped and correctly formatted.
> > >>> >
> > >>> > It should not be too difficult. I used R, but there are other
> > >>> > examples with Python and JavaScript on their examples page.
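> > >>> >
> > >>> > As a rough sketch of that flow in Python (not my actual R code;
> > >>> > the DHS endpoint parameters, the UIDs and the DHIS2 server URL
> > >>> > below are illustrative assumptions):
> > >>> >
> > >>> >   # Hypothetical sketch: pull DHS indicator data and reshape it
> > >>> >   # into a DHIS2 dataValueSets payload.
> > >>> >   import requests
> > >>> >
> > >>> >   # 1. Fetch survey results from the DHS API in JSON format.
> > >>> >   dhs = requests.get(
> > >>> >       "https://api.dhsprogram.com/rest/dhs/data",
> > >>> >       params={"indicatorIds": "FE_FRTR_W_TFR",
> > >>> >               "countryIds": "UG", "f": "json"}).json()
> > >>> >
> > >>> >   # 2. Map DHS identifiers to DHIS2 UIDs (a mapping you maintain
> > >>> >   #    by hand or in a lookup table).
> > >>> >   indicator_to_de = {"FE_FRTR_W_TFR": "deUid000001"}
> > >>> >   region_to_ou = {"Uganda": "ouUid000001"}
> > >>> >
> > >>> >   # 3. Reshape into the dataValueSets format and post it in one
> > >>> >   #    request.
> > >>> >   payload = {"dataValues": [
> > >>> >       {"dataElement": indicator_to_de[r["IndicatorId"]],
> > >>> >        "orgUnit": region_to_ou[r["CharacteristicLabel"]],
> > >>> >        "period": str(r["SurveyYear"]),
> > >>> >        "value": str(r["Value"])}
> > >>> >       for r in dhs["Data"]]}
> > >>> >   requests.post("https://dhis.example.org/api/dataValueSets",
> > >>> >                 json=payload, auth=("user", "password"))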
> > >>> >
> > >>> > Regards,
> > >>> > Jason
> > >>> >
> > >>> >
> > >>> > On Tue, Feb 2, 2016 at 3:31 PM, Alex Tumwesigye
> > >>> > <atumwesigye@xxxxxxxxx> wrote:
> > >>> >
> > >>> > > Dear Uwe,
> > >>> > >
> > >>> > > Have you tried sending data via the api/dataValueSets endpoint?
> > >>> > > It may be faster. Just stage your data and push it in one go.
> > >>> > >
> > >>> > > http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372
> > >>> > >
> > >>> > > Also note how you send it: I have seen curl take ages to submit
> > >>> > > individual values via the api. You need to send it as one file
> > >>> > > in one request, or implement concurrency.
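> > >>> > >
> > >>> > > For example, a single POST along these lines (a sketch; server
> > >>> > > URL, credentials and file name are placeholders):
> > >>> > >
> > >>> > >   curl -u user:password -X POST \
> > >>> > >     -H "Content-Type: application/json" \
> > >>> > >     -d @dataValues.json \
> > >>> > >     "https://dhis.example.org/api/dataValueSets"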
> > >>> > >
> > >>> > > Alex
> > >>> > >
> > >>> > > On Tue, Feb 2, 2016 at 5:13 PM, Olav Poppe <olav.poppe@xxxxxx>
> > >>> > > wrote:
> > >>> > >
> > >>> > >> Hi Randy and Uwe,
> > >>> > >> thanks, interesting to hear your experiences. Uwe, what you
> > >>> > >> are working on sounds quite a bit more complicated, and not
> > >>> > >> least with far more data. I imagine that with household
> > >>> > >> surveys, it would be a matter of < 100 indicators for < 200
> > >>> > >> orgunits for 2-3 periods, i.e. a fraction of what you are
> > >>> > >> dealing with!
> > >>> > >>
> > >>> > >> Olav
> > >>> > >>
> > >>> > >> On 31 Jan 2016 at 09:29, uwe wahser <uwe@xxxxxxxxx> wrote:
> > >>> > >>
> > >>> > >> Hi Olav & Randy,
> > >>> > >>
> > >>> > >> I am currently banging on kettle (aka Pentaho DI) to extract
> > >>> > >> data from a source system (an SQL-based ERP in our case) into
> > >>> > >> DHIS2 dataSets in JSON format. In our current test scenario (2
> > >>> > >> dataElements in a dataSet with a categoryCombination of 5
> > >>> > >> categories) we are updating ca. 4 million dataValues every
> > >>> > >> night in a pseudo-delta mode: reading all data from the source,
> > >>> > >> comparing it to what is already in DHIS2, then pushing only the
> > >>> > >> records that create, update or delete dataValues into the api -
> > >>> > >> ca. 150k per night in 1 hour; the initial load was 7 hrs. We
> > >>> > >> still have to prove that this is feasible when setting up the
> > >>> > >> first real-life dataSet, where there will be more categories
> > >>> > >> and more dataElements, thus exploding the number of dataValues.
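> > >>> > >>
> > >>> > >> In rough Python pseudocode, the delta step looks like this (a
> > >>> > >> simplified sketch, not our actual kettle transformation; server
> > >>> > >> URL, credentials and UIDs are placeholders):
> > >>> > >>
> > >>> > >>   import requests
> > >>> > >>
> > >>> > >>   BASE = "https://dhis.example.org/api"
> > >>> > >>   AUTH = ("user", "password")
> > >>> > >>
> > >>> > >>   def read_source_values():
> > >>> > >>       # Placeholder for your extractor (CSV, SQL query, ...).
> > >>> > >>       return []
> > >>> > >>
> > >>> > >>   def key(v):
> > >>> > >>       # A dataValue is identified by these dimensions.
> > >>> > >>       return (v["dataElement"], v["period"], v["orgUnit"],
> > >>> > >>               v.get("categoryOptionCombo"))
> > >>> > >>
> > >>> > >>   source = {key(v): v for v in read_source_values()}
> > >>> > >>   resp = requests.get(
> > >>> > >>       BASE + "/dataValueSets", auth=AUTH,
> > >>> > >>       params={"dataSet": "dsUid000001", "period": "201601",
> > >>> > >>               "orgUnit": "ouUid000001", "children": "true"},
> > >>> > >>       headers={"Accept": "application/json"})
> > >>> > >>   current = {key(v): v
> > >>> > >>              for v in resp.json().get("dataValues", [])}
> > >>> > >>
> > >>> > >>   changed = [v for k, v in source.items()
> > >>> > >>              if k not in current
> > >>> > >>              or current[k]["value"] != v["value"]]
> > >>> > >>   removed = [v for k, v in current.items() if k not in source]
> > >>> > >>
> > >>> > >>   # Create/update the changed values in one request, and
> > >>> > >>   # delete the removed ones in a second request.
> > >>> > >>   requests.post(BASE + "/dataValueSets", auth=AUTH,
> > >>> > >>                 params={"importStrategy": "CREATE_AND_UPDATE"},
> > >>> > >>                 json={"dataValues": changed})
> > >>> > >>   requests.post(BASE + "/dataValueSets", auth=AUTH,
> > >>> > >>                 params={"importStrategy": "DELETE"},
> > >>> > >>                 json={"dataValues": removed})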
> > >>> > >>
> > >>> > >> Getting there was a bit painful, but now it seems to work. I
> > >>> > >> chose kettle instead of Talend ETL (both open source) as it
> > >>> > >> seemed easier to get used to. However, from a data warehouse
> > >>> > >> perspective I'd prefer DHIS2 to offer some sort of integrated
> > >>> > >> ETL landscape in the long run, which would also allow
> > >>> > >> aggregating data from tracker into dataSets, tracker to
> > >>> > >> tracker, dataSets to dataSets etc.
> > >>> > >>
> > >>> > >> The current version of our kettle transformations and jobs was
> > >>> > >> designed to be generic (not for a specific dataSet, but you
> > >>> > >> have to design your own extractor, which could be a simple
> > >>> > >> csv-reader or maybe a DHS API call). If you are interested, I
> > >>> > >> will share them. Just be aware that they are currently in a
> > >>> > >> very early and rough state and not documented. You'd have to
> > >>> > >> bring along the willingness to dig into kettle and be pain
> > >>> > >> resistant to a certain degree :-)
> > >>> > >>
> > >>> > >> I'd be interested to hear about others' experiences ...
> > >>> > >>
> > >>> > >> Have a nice Sunday,
> > >>> > >>
> > >>> > >> Uwe
> > >>> > >>
> > >>> > >> ---
> > >>> > >>
> > >>> > >> On 29.01.2016 at 17:31, Wilson, Randy wrote:
> > >>> > >>
> > >>> > >> Not here unfortunately...just doing csv imports from DHS Excel
> > >>> > >> files.
> > >>> > >> Would be useful for our data warehouse.
> > >>> > >> Randy
> > >>> > >> On Jan 29, 2016 2:59 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:
> > >>> > >>
> > >>> > >>> Hi all,
> > >>> > >>> I wanted to hear if anyone has any experience with the DHS
> > >>> > >>> API (http://api.dhsprogram.com/#/index.html), and using it to
> > >>> > >>> import survey results into DHIS?
> > >>> > >>>
> > >>> > >>> Olav
> > >>> > >
> > >>> > > --
> > >>> > > Alex Tumwesigye
> > >>> > >
> > >>> > > Technical Advisor - DHIS2 (Consultant),
> > >>> > > Ministry of Health/AFENET
> > >>> > > Kampala
> > >>> > > Uganda
> > >>> > >
> > >>> > > IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
> > >>> > >
> > >>> > > IT Specialist (Servers, Networks and Security, Health Information
> > >>> > > Systems
> > >>> > > - DHIS2 ) & Solar Consultant
> > >>> > >
> > >>> > > +256 774149 775, + 256 759 800161
> > >>> > >
> > >>> > > "I don't want to be anything other than what I have been - one tree
> > >>> > > hill "
> > >>> > >
> > >>> > --
> > >>> > Jason P. Pickering
> > >>> > email: jason.p.pickering@xxxxxxxxx
> > >>> > tel:+46764147049
> > >>
> > >
> > > --
> > > Lars Helge Øverland
> > > Lead developer, DHIS 2
> > > University of Oslo
> > > Skype: larshelgeoverland
> > > http://www.dhis2.org
> > >
>
> -- 
> Lars Helge Øverland
> Lead developer, DHIS 2
> University of Oslo
> Skype: larshelgeoverland
> http://www.dhis2.org

