← Back to team overview

dhis2-users team mailing list archive

Re: Importing DHS survey data in DHIS

 

Hi Uwe,

make sure that you have tuned Postgres properly through postgresql.conf,
especially the last 5 settings are crucial for getting good write
performance.

http://dhis2.github.io/dhis2-docs/master/en/implementer/html/ch08s03.html#d5e464

checkpoint_segments = 32

PostgreSQL writes new transactions to a log file called WAL segments which
are 16MB in size. When a number of segments have been written a checkpoint
occurs. Setting this number to a larger value will thus improve performance
for write-heavy systems such as DHIS 2.

checkpoint_completion_target = 0.8

Determines the percentage of segment completion before a checkpoint occurs.
Setting this to a high value will thus spread the writes out and lower the
average write overhead.

wal_buffers = 16MB

Sets the memory used for buffering during the WAL write process. Increasing
this value might improve throughput in write-heavy systems.

synchronous_commit = off

Specifies whether transaction commits will wait for WAL records to be
written to the disk before returning to the client or not. Setting this to
off will improve performance considerably. It also implies that there is a
slight delay between the transaction is reported successful to the client
and it actually being safe, but the database state cannot be corrupted and
this is a good alternative for performance-intensive and write-heavy
systems like DHIS 2.

wal_writer_delay = 10000ms

Specifies the delay between WAL write operations. Setting this to a high
value will improve performance on write-heavy systems since potentially
many write operations can be executed within a single flush to disk.







On Tue, Feb 2, 2016 at 9:18 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:

> Lars is right that ADX won't be faster than dxf.  Both because it
> internally converts to dxf on import and because it abstracts away the
> categoryoptioncombo.  The first isn't really very costly but the other
> is.
>
> This means that that the two systems only have to match categories and
> categoryoptions which is a much easier mapping to maintain.
>
> But if you need raw speed it is going to be faster to produce dxf
> style categoryoptioncombos as that is closest to the way the data gets
> stored.  I am going to speed up the adx import code, but will still
> always be slower :-)
>
> On 2 February 2016 at 20:07, uwe wahser <uwe@xxxxxxxxx> wrote:
> > Ok, that sounds like ADX might even be a bit slower eventually, if the
> > transformation process outweighs a potentially reduced datavolume. I
> might
> > just stick with the json.
> >
> > @Jason: I also thought about SQL-Injection shortly, but I am fearing
> > internal changes of the data-model, which I'd have to understand fully in
> > the first place. Of course the api's also change more than I expected,
> but
> > at least that is announced :-D
> >
> > Uwe
> >
> > ---
> >
> >
> >
> > Am 02.02.2016 um 19:49 schrieb Lars Helge Øverland:
> >
> > Hi Uwe,
> >
> > ADX will not be faster than DXF, as for ADX, the stream is first
> converted
> > into DXF and then passed on to the regular importer.
> >
> > Lars
> >
> > On Tue, Feb 2, 2016 at 5:33 PM, Jason Pickering
> > <jason.p.pickering@xxxxxxxxx> wrote:
> >>
> >> This was a very trivial lab test,so not really conclusive at all.  I
> would
> >> just give it a try and see. If you see differences, please let the devs
> >> know.
> >>
> >> Given the scale of what you are attempting, have you considered using
> >> direct SQL injection? Not that I am recommending that route as there are
> >> many pitfalls, but it might be an option if implemented properly,
> especially
> >> considering your reported architecture.
> >>
> >> Regards
> >> Jason
> >>
> >>
> >> On Tue, Feb 2, 2016, 17:04 Uwe Wahser <uwe@xxxxxxxxx> wrote:
> >>>
> >>> Hi Jason,
> >>>
> >>> thanks for sharing the links. As I can see on a quick glance, you are
> >>> also
> >>> experimenting with the ADX-api - did you observe any significant
> >>> performance
> >>> differences between ADX and dataValueSets apis?
> >>>
> >>> Regards,
> >>>
> >>> Uwe
> >>>
> >>> > Jason Pickering <jason.p.pickering@xxxxxxxxx> hat am 2. Februar
> 2016 um
> >>> > 18:21
> >>> > geschrieben:
> >>> >
> >>> >
> >>> > Hi Olav,
> >>> > I have not worked with the DHS API per se, but have imported lots of
> >>> > data
> >>> > using the same approach which they outline here (
> >>> > http://api.dhsprogram.com/#/samples-r.cfm)
> >>> >
> >>> > I have written up a walkthrough of getting data out of one DHIS
> >>> > instance
> >>> > and into another one, and I think the basic principles would be the
> >>> > same (
> >>> > http://rpubs.com/jason_p_pickering/139589)
> >>> >
> >>> > Metadata needs to be mapped (or created), the data needs to be
> >>> > reshaped,
> >>> > and correctly formatted.
> >>> >
> >>> > It should not be too difficult. I used R, but there are other
> examples
> >>> > with
> >>> > Python and JavaScript on their examples page.
> >>> >
> >>> > Regards,
> >>> > Jason
> >>> >
> >>> >
> >>> > On Tue, Feb 2, 2016 at 3:31 PM, Alex Tumwesigye <
> atumwesigye@xxxxxxxxx>
> >>> > wrote:
> >>> >
> >>> > > Dear Uwe,
> >>> > >
> >>> > > Have you tried to send data via the endpoint api/dataValueSets, it
> >>> > > may be
> >>> > > faster. Just stage your data and push it once.
> >>> > >
> >>> > >
> >>> > >
> http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372
> >>> > >
> >>> > > Also to note, is how you send it, I have seen curl taking ages to
> >>> > > submit
> >>> > > individual values via the api. You need to send it as once file via
> >>> > > once
> >>> > > request or implement concurrency.
> >>> > >
> >>> > > Alex
> >>> > >
> >>> > > On Tue, Feb 2, 2016 at 5:13 PM, Olav Poppe <olav.poppe@xxxxxx>
> wrote:
> >>> > >
> >>> > >> Hi Randy and Uwe,
> >>> > >> thanks, interesting to hear you experiences. Uwe, what you are
> >>> > >> working on
> >>> > >> sounds quite a bit more complicated, and not least with far more
> >>> > >> data. I
> >>> > >> image that with household surveys, it would be a matter of < 100
> >>> > >> indicators
> >>> > >> for < 200 orgunits for 2-3 periods, i.e. a fraction of what you
> are
> >>> > >> dealing
> >>> > >> with!
> >>> > >>
> >>> > >> Olav
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> 31. jan. 2016 kl. 09.29 skrev uwe wahser <uwe@xxxxxxxxx>:
> >>> > >>
> >>> > >> Hi Olav & Randy,
> >>> > >>
> >>> > >> I am currently banging on kettle (aka Pentaho DI) to extract data
> >>> > >> from a
> >>> > >> source-system (SQL-ERP in our case) into DHIS2 dataSets in json
> >>> > >> format. In
> >>> > >> our current test-scenario (2 dataElements in a dataSet with a
> >>> > >> categoryCombination of 5 categories) we are currently updating
> ca. 4
> >>> > >> mio
> >>> > >> dataValues every night in a pseudo-delta mode (reading all data
> from
> >>> > >> source, comparing to what is there in DHIS2 already, then only
> >>> > >> pushing
> >>> > >> records for creating, updating or deleting dataValues into the
> api:
> >>> > >> ca.
> >>> > >> 150k per night in 1 hour, initial load was 7hrs). We still have to
> >>> > >> prove,
> >>> > >> that this is feasible when setting up the first real life dataSet
> >>> > >> where
> >>> > >> there will be more categories and more dataElements, thus
> exploding
> >>> > >> the
> >>> > >> number of dataValues.
> >>> > >>
> >>> > >> Getting there was a bit painful, but now it seems to work. I chose
> >>> > >> kettle
> >>> > >> instead of Talend ETL (both open source) as it seemed to be easier
> >>> > >> to get
> >>> > >> used to. However, from a data warehouse perspective I'd prefer to
> >>> > >> have
> >>> > >> DHIS2 offering some sort of an integrated ETL landscape on the
> long
> >>> > >> run,
> >>> > >> which would also allow to aggregate data from tracker into
> dataSets,
> >>> > >> tracker to tracker, dataSets to dataSets etc.
> >>> > >>
> >>> > >> Our current version of the kettle transformations and jobs were
> >>> > >> designed
> >>> > >> to be generic (not for a specific dataSet, but you have to design
> >>> > >> your own
> >>> > >> extractor which could be a simple csv-reader or maybe a DHS
> >>> > >> api-call). If
> >>> > >> you are interested, I will share them. Just be aware that they are
> >>> > >> currently in a very early and rough state and not documented.
> You'd
> >>> > >> have to
> >>> > >> bring along the willingness to dig yourself into kettle and be
> pain
> >>> > >> resistant to a certain degree :-)
> >>> > >>
> >>> > >> I'd be interested to hear from other experiences ...
> >>> > >>
> >>> > >> Have a nice sunday,
> >>> > >>
> >>> > >> Uwe
> >>> > >>
> >>> > >> ---
> >>> > >>
> >>> > >> Am 29.01.2016 um 17:31 schrieb Wilson, Randy:
> >>> > >>
> >>> > >> Not here unfortunately...just doing csv imports from DHS Excel
> >>> > >> files.
> >>> > >> Would be useful for our data warehouse.
> >>> > >> Randy
> >>> > >> On Jan 29, 2016 2:59 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:
> >>> > >>
> >>> > >>> Hi all,
> >>> > >>> I wanted to hear if anyone has any experience with the DHS API (
> >>> > >>> http://api.dhsprogram.com/#/index.html), and using it to import
> >>> > >>> survey
> >>> > >>> results into DHIS?
> >>> > >>>
> >>> > >>> Olav
> >>> > >>>
> >>> > >>> _______________________________________________
> >>> > >>> Mailing list: https://launchpad.net/~dhis2-users
> >>> > >>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >>> > >>> Unsubscribe : https://launchpad.net/~dhis2-users
> >>> > >>> More help   : https://help.launchpad.net/ListHelp
> >>> > >>>
> >>> > >>>
> >>> > >> *This message and its attachments are confidential and solely for
> >>> > >> the
> >>> > >> intended recipients. If received in error, please delete them and
> >>> > >> notify
> >>> > >> the sender via reply e-mail immediately.*
> >>> > >>
> >>> > >> _______________________________________________
> >>> > >> Mailing list: https://launchpad.net/~dhis2-users
> >>> > >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >>> > >> Unsubscribe : https://launchpad.net/~dhis2-users
> >>> > >> More help   : https://help.launchpad.net/ListHelp
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> _______________________________________________
> >>> > >> Mailing list: https://launchpad.net/~dhis2-users
> >>> > >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >>> > >> Unsubscribe : https://launchpad.net/~dhis2-users
> >>> > >> More help   : https://help.launchpad.net/ListHelp
> >>> > >>
> >>> > >>
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Alex Tumwesigye
> >>> > >
> >>> > > Technical Advisor - DHIS2 (Consultant),
> >>> > > Ministry of Health/AFENET
> >>> > > Kampala
> >>> > > Uganda
> >>> > >
> >>> > > IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
> >>> > >
> >>> > > IT Specialist (Servers, Networks and Security, Health Information
> >>> > > Systems
> >>> > > - DHIS2 ) & Solar Consultant
> >>> > >
> >>> > > +256 774149 775, + 256 759 800161
> >>> > >
> >>> > > "I don't want to be anything other than what I have been - one tree
> >>> > > hill "
> >>> > >
> >>> > > _______________________________________________
> >>> > > Mailing list: https://launchpad.net/~dhis2-users
> >>> > > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >>> > > Unsubscribe : https://launchpad.net/~dhis2-users
> >>> > > More help   : https://help.launchpad.net/ListHelp
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>> > --
> >>> > Jason P. Pickering
> >>> > email: jason.p.pickering@xxxxxxxxx
> >>> > tel:+46764147049
> >>> > _______________________________________________
> >>> > Mailing list: https://launchpad.net/~dhis2-users
> >>> > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >>> > Unsubscribe : https://launchpad.net/~dhis2-users
> >>> > More help   : https://help.launchpad.net/ListHelp
> >>
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~dhis2-users
> >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~dhis2-users
> >> More help   : https://help.launchpad.net/ListHelp
> >>
> >
> >
> >
> > --
> > Lars Helge Øverland
> > Lead developer, DHIS 2
> > University of Oslo
> > Skype: larshelgeoverland
> > http://www.dhis2.org
> >
> >
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~dhis2-users
> > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~dhis2-users
> > More help   : https://help.launchpad.net/ListHelp
> >
>



-- 
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org <https://www.dhis2.org>

Follow ups

References