← Back to team overview

dhis2-users team mailing list archive

Re: Importing DHS survey data in DHIS

 

Sure. I usually get 2-5000 values per second on import so there should be
room for some improvement.

regards,

Lars


On Wed, Feb 3, 2016 at 8:52 AM, Uwe Wahser <uwe@xxxxxxxxx> wrote:

> Hi Lars,
>
> thanks for the hint. Currently I am just running on standards. I'll do a
> bit of
> monitoring first before adjusting those values. I'll post the difference.
>
> Regards,
>
> Uwe
>
> ---
> > Lars Helge Øverland <larshelge@xxxxxxxxx> hat am 2. Februar 2016 um
> 23:43
> > geschrieben:
> >
> >
> > Hi Uwe,
> >
> > make sure that you have tuned Postgres properly through postgresql.conf,
> > especially the last 5 settings are crucial for getting good write
> > performance.
> >
> >
> http://dhis2.github.io/dhis2-docs/master/en/implementer/html/ch08s03.html#d5e464
> >
> > checkpoint_segments = 32
> >
> > PostgreSQL writes new transactions to a log file called WAL segments
> which
> > are 16MB in size. When a number of segments have been written a
> checkpoint
> > occurs. Setting this number to a larger value will thus improve
> performance
> > for write-heavy systems such as DHIS 2.
> >
> > checkpoint_completion_target = 0.8
> >
> > Determines the percentage of segment completion before a checkpoint
> occurs.
> > Setting this to a high value will thus spread the writes out and lower
> the
> > average write overhead.
> >
> > wal_buffers = 16MB
> >
> > Sets the memory used for buffering during the WAL write process.
> Increasing
> > this value might improve throughput in write-heavy systems.
> >
> > synchronous_commit = off
> >
> > Specifies whether transaction commits will wait for WAL records to be
> > written to the disk before returning to the client or not. Setting this
> to
> > off will improve performance considerably. It also implies that there is
> a
> > slight delay between the transaction is reported successful to the client
> > and it actually being safe, but the database state cannot be corrupted
> and
> > this is a good alternative for performance-intensive and write-heavy
> > systems like DHIS 2.
> >
> > wal_writer_delay = 10000ms
> >
> > Specifies the delay between WAL write operations. Setting this to a high
> > value will improve performance on write-heavy systems since potentially
> > many write operations can be executed within a single flush to disk.
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Feb 2, 2016 at 9:18 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
> >
> > > Lars is right that ADX won't be faster than dxf.  Both because it
> > > internally converts to dxf on import and because it abstracts away the
> > > categoryoptioncombo.  The first isn't really very costly but the other
> > > is.
> > >
> > > This means that that the two systems only have to match categories and
> > > categoryoptions which is a much easier mapping to maintain.
> > >
> > > But if you need raw speed it is going to be faster to produce dxf
> > > style categoryoptioncombos as that is closest to the way the data gets
> > > stored.  I am going to speed up the adx import code, but will still
> > > always be slower :-)
> > >
> > > On 2 February 2016 at 20:07, uwe wahser <uwe@xxxxxxxxx> wrote:
> > > > Ok, that sounds like ADX might even be a bit slower eventually, if
> the
> > > > transformation process outweighs a potentially reduced datavolume. I
> > > might
> > > > just stick with the json.
> > > >
> > > > @Jason: I also thought about SQL-Injection shortly, but I am fearing
> > > > internal changes of the data-model, which I'd have to understand
> fully in
> > > > the first place. Of course the api's also change more than I
> expected,
> > > but
> > > > at least that is announced :-D
> > > >
> > > > Uwe
> > > >
> > > > ---
> > > >
> > > >
> > > >
> > > > Am 02.02.2016 um 19:49 schrieb Lars Helge Øverland:
> > > >
> > > > Hi Uwe,
> > > >
> > > > ADX will not be faster than DXF, as for ADX, the stream is first
> > > converted
> > > > into DXF and then passed on to the regular importer.
> > > >
> > > > Lars
> > > >
> > > > On Tue, Feb 2, 2016 at 5:33 PM, Jason Pickering
> > > > <jason.p.pickering@xxxxxxxxx> wrote:
> > > >>
> > > >> This was a very trivial lab test,so not really conclusive at all.  I
> > > would
> > > >> just give it a try and see. If you see differences, please let the
> devs
> > > >> know.
> > > >>
> > > >> Given the scale of what you are attempting, have you considered
> using
> > > >> direct SQL injection? Not that I am recommending that route as
> there are
> > > >> many pitfalls, but it might be an option if implemented properly,
> > > especially
> > > >> considering your reported architecture.
> > > >>
> > > >> Regards
> > > >> Jason
> > > >>
> > > >>
> > > >> On Tue, Feb 2, 2016, 17:04 Uwe Wahser <uwe@xxxxxxxxx> wrote:
> > > >>>
> > > >>> Hi Jason,
> > > >>>
> > > >>> thanks for sharing the links. As I can see on a quick glance, you
> are
> > > >>> also
> > > >>> experimenting with the ADX-api - did you observe any significant
> > > >>> performance
> > > >>> differences between ADX and dataValueSets apis?
> > > >>>
> > > >>> Regards,
> > > >>>
> > > >>> Uwe
> > > >>>
> > > >>> > Jason Pickering <jason.p.pickering@xxxxxxxxx> hat am 2. Februar
> > > 2016 um
> > > >>> > 18:21
> > > >>> > geschrieben:
> > > >>> >
> > > >>> >
> > > >>> > Hi Olav,
> > > >>> > I have not worked with the DHS API per se, but have imported
> lots of
> > > >>> > data
> > > >>> > using the same approach which they outline here (
> > > >>> > http://api.dhsprogram.com/#/samples-r.cfm)
> > > >>> >
> > > >>> > I have written up a walkthrough of getting data out of one DHIS
> > > >>> > instance
> > > >>> > and into another one, and I think the basic principles would be
> the
> > > >>> > same (
> > > >>> > http://rpubs.com/jason_p_pickering/139589)
> > > >>> >
> > > >>> > Metadata needs to be mapped (or created), the data needs to be
> > > >>> > reshaped,
> > > >>> > and correctly formatted.
> > > >>> >
> > > >>> > It should not be too difficult. I used R, but there are other
> > > examples
> > > >>> > with
> > > >>> > Python and JavaScript on their examples page.
> > > >>> >
> > > >>> > Regards,
> > > >>> > Jason
> > > >>> >
> > > >>> >
> > > >>> > On Tue, Feb 2, 2016 at 3:31 PM, Alex Tumwesigye <
> > > atumwesigye@xxxxxxxxx>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Dear Uwe,
> > > >>> > >
> > > >>> > > Have you tried to send data via the endpoint
> api/dataValueSets, it
> > > >>> > > may be
> > > >>> > > faster. Just stage your data and push it once.
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > >
> http://dhis2.github.io/dhis2-docs/master/en/developer/html/ch01s13.html#d5e1372
> > > >>> > >
> > > >>> > > Also to note, is how you send it, I have seen curl taking ages
> to
> > > >>> > > submit
> > > >>> > > individual values via the api. You need to send it as once
> file via
> > > >>> > > once
> > > >>> > > request or implement concurrency.
> > > >>> > >
> > > >>> > > Alex
> > > >>> > >
> > > >>> > > On Tue, Feb 2, 2016 at 5:13 PM, Olav Poppe <olav.poppe@xxxxxx>
> > > wrote:
> > > >>> > >
> > > >>> > >> Hi Randy and Uwe,
> > > >>> > >> thanks, interesting to hear you experiences. Uwe, what you are
> > > >>> > >> working on
> > > >>> > >> sounds quite a bit more complicated, and not least with far
> more
> > > >>> > >> data. I
> > > >>> > >> image that with household surveys, it would be a matter of <
> 100
> > > >>> > >> indicators
> > > >>> > >> for < 200 orgunits for 2-3 periods, i.e. a fraction of what
> you
> > > are
> > > >>> > >> dealing
> > > >>> > >> with!
> > > >>> > >>
> > > >>> > >> Olav
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> 31. jan. 2016 kl. 09.29 skrev uwe wahser <uwe@xxxxxxxxx>:
> > > >>> > >>
> > > >>> > >> Hi Olav & Randy,
> > > >>> > >>
> > > >>> > >> I am currently banging on kettle (aka Pentaho DI) to extract
> data
> > > >>> > >> from a
> > > >>> > >> source-system (SQL-ERP in our case) into DHIS2 dataSets in
> json
> > > >>> > >> format. In
> > > >>> > >> our current test-scenario (2 dataElements in a dataSet with a
> > > >>> > >> categoryCombination of 5 categories) we are currently updating
> > > ca. 4
> > > >>> > >> mio
> > > >>> > >> dataValues every night in a pseudo-delta mode (reading all
> data
> > > from
> > > >>> > >> source, comparing to what is there in DHIS2 already, then only
> > > >>> > >> pushing
> > > >>> > >> records for creating, updating or deleting dataValues into the
> > > api:
> > > >>> > >> ca.
> > > >>> > >> 150k per night in 1 hour, initial load was 7hrs). We still
> have to
> > > >>> > >> prove,
> > > >>> > >> that this is feasible when setting up the first real life
> dataSet
> > > >>> > >> where
> > > >>> > >> there will be more categories and more dataElements, thus
> > > exploding
> > > >>> > >> the
> > > >>> > >> number of dataValues.
> > > >>> > >>
> > > >>> > >> Getting there was a bit painful, but now it seems to work. I
> chose
> > > >>> > >> kettle
> > > >>> > >> instead of Talend ETL (both open source) as it seemed to be
> easier
> > > >>> > >> to get
> > > >>> > >> used to. However, from a data warehouse perspective I'd
> prefer to
> > > >>> > >> have
> > > >>> > >> DHIS2 offering some sort of an integrated ETL landscape on the
> > > long
> > > >>> > >> run,
> > > >>> > >> which would also allow to aggregate data from tracker into
> > > dataSets,
> > > >>> > >> tracker to tracker, dataSets to dataSets etc.
> > > >>> > >>
> > > >>> > >> Our current version of the kettle transformations and jobs
> were
> > > >>> > >> designed
> > > >>> > >> to be generic (not for a specific dataSet, but you have to
> design
> > > >>> > >> your own
> > > >>> > >> extractor which could be a simple csv-reader or maybe a DHS
> > > >>> > >> api-call). If
> > > >>> > >> you are interested, I will share them. Just be aware that
> they are
> > > >>> > >> currently in a very early and rough state and not documented.
> > > You'd
> > > >>> > >> have to
> > > >>> > >> bring along the willingness to dig yourself into kettle and be
> > > pain
> > > >>> > >> resistant to a certain degree :-)
> > > >>> > >>
> > > >>> > >> I'd be interested to hear from other experiences ...
> > > >>> > >>
> > > >>> > >> Have a nice sunday,
> > > >>> > >>
> > > >>> > >> Uwe
> > > >>> > >>
> > > >>> > >> ---
> > > >>> > >>
> > > >>> > >> Am 29.01.2016 um 17:31 schrieb Wilson, Randy:
> > > >>> > >>
> > > >>> > >> Not here unfortunately...just doing csv imports from DHS Excel
> > > >>> > >> files.
> > > >>> > >> Would be useful for our data warehouse.
> > > >>> > >> Randy
> > > >>> > >> On Jan 29, 2016 2:59 PM, "Olav Poppe" <olav.poppe@xxxxxx>
> wrote:
> > > >>> > >>
> > > >>> > >>> Hi all,
> > > >>> > >>> I wanted to hear if anyone has any experience with the DHS
> API (
> > > >>> > >>> http://api.dhsprogram.com/#/index.html), and using it to
> import
> > > >>> > >>> survey
> > > >>> > >>> results into DHIS?
> > > >>> > >>>
> > > >>> > >>> Olav
> > > >>> > >>>
> > > >>> > >>> _______________________________________________
> > > >>> > >>> Mailing list: https://launchpad.net/~dhis2-users
> > > >>> > >>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >>> > >>> Unsubscribe : https://launchpad.net/~dhis2-users
> > > >>> > >>> More help   : https://help.launchpad.net/ListHelp
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >> *This message and its attachments are confidential and solely
> for
> > > >>> > >> the
> > > >>> > >> intended recipients. If received in error, please delete them
> and
> > > >>> > >> notify
> > > >>> > >> the sender via reply e-mail immediately.*
> > > >>> > >>
> > > >>> > >> _______________________________________________
> > > >>> > >> Mailing list: https://launchpad.net/~dhis2-users
> > > >>> > >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >>> > >> Unsubscribe : https://launchpad.net/~dhis2-users
> > > >>> > >> More help   : https://help.launchpad.net/ListHelp
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> _______________________________________________
> > > >>> > >> Mailing list: https://launchpad.net/~dhis2-users
> > > >>> > >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >>> > >> Unsubscribe : https://launchpad.net/~dhis2-users
> > > >>> > >> More help   : https://help.launchpad.net/ListHelp
> > > >>> > >>
> > > >>> > >>
> > > >>> > >
> > > >>> > >
> > > >>> > > --
> > > >>> > > Alex Tumwesigye
> > > >>> > >
> > > >>> > > Technical Advisor - DHIS2 (Consultant),
> > > >>> > > Ministry of Health/AFENET
> > > >>> > > Kampala
> > > >>> > > Uganda
> > > >>> > >
> > > >>> > > IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
> > > >>> > >
> > > >>> > > IT Specialist (Servers, Networks and Security, Health
> Information
> > > >>> > > Systems
> > > >>> > > - DHIS2 ) & Solar Consultant
> > > >>> > >
> > > >>> > > +256 774149 775, + 256 759 800161
> > > >>> > >
> > > >>> > > "I don't want to be anything other than what I have been - one
> tree
> > > >>> > > hill "
> > > >>> > >
> > > >>> > > _______________________________________________
> > > >>> > > Mailing list: https://launchpad.net/~dhis2-users
> > > >>> > > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >>> > > Unsubscribe : https://launchpad.net/~dhis2-users
> > > >>> > > More help   : https://help.launchpad.net/ListHelp
> > > >>> > >
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> > Jason P. Pickering
> > > >>> > email: jason.p.pickering@xxxxxxxxx
> > > >>> > tel:+46764147049
> > > >>> > _______________________________________________
> > > >>> > Mailing list: https://launchpad.net/~dhis2-users
> > > >>> > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >>> > Unsubscribe : https://launchpad.net/~dhis2-users
> > > >>> > More help   : https://help.launchpad.net/ListHelp
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> Mailing list: https://launchpad.net/~dhis2-users
> > > >> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > >> Unsubscribe : https://launchpad.net/~dhis2-users
> > > >> More help   : https://help.launchpad.net/ListHelp
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Lars Helge Øverland
> > > > Lead developer, DHIS 2
> > > > University of Oslo
> > > > Skype: larshelgeoverland
> > > > http://www.dhis2.org
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Mailing list: https://launchpad.net/~dhis2-users
> > > > Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> > > > Unsubscribe : https://launchpad.net/~dhis2-users
> > > > More help   : https://help.launchpad.net/ListHelp
> > > >
> > >
> >
> >
> >
> > --
> > Lars Helge Øverland
> > Lead developer, DHIS 2
> > University of Oslo
> > Skype: larshelgeoverland
> > http://www.dhis2.org <https://www.dhis2.org>
>



-- 
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
http://www.dhis2.org <https://www.dhis2.org>

References