← Back to team overview

dhis2-users team mailing list archive

Re: [Dhis2-devs] 25 hours in completing Analytic

 

Much appreciated Bob and Neeraj, much appreciated ...

On Mon, Oct 24, 2016 at 3:27 PM, Knut Staring <knutst@xxxxxxxxx> wrote:

> It would be very interesting to know of other similarly large
> installations. DATIM, Bangladesh, PSI perhaps - others?
>
> On Mon, Oct 24, 2016 at 10:39 AM, Neeraj Gupta <neeraj.hisp@xxxxxxxxx>
> wrote:
>
>> Calle,
>>
>> We have around 500 million record in database with 3666 data elements
>> having 26 category combinations and 201 indicators and there are 14398
>> organisation units.
>>
>> Thanks,
>> Neeraj
>>
>> On Mon, Oct 24, 2016 at 1:50 PM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>> wrote:
>>
>>> Neeraj,
>>>
>>> You never stated the number of records you have in the datavalue table -
>>> what is it?
>>>
>>> In the same context: anybody have a rough idea of how many datavalue
>>> records there are in the global DATIM database - which I think currently
>>> might be the largest DHIS2 instance around?
>>>
>>> Given our own recent work on performance + what Neeraj has reported,
>>> I've been thinking of creating one test instance with let us say 500 mill
>>> datavalue records and another with let us say 1 billion, then use them to
>>> identify key bottlenecks in various processes AND use them to ensure that
>>> DHIS2 analytics performance is as linear as possible in terms of database
>>> size. Postgresql has introduced a number of new indexing algorithms in
>>> recent versions, and I'm not sure if DHIS2 is taking full advantage of them.
>>>
>>> Best regards
>>> Calle
>>>
>>> On 24 October 2016 at 07:53, Brajesh Murari <brajesh.murari@xxxxxxxxx>
>>> wrote:
>>>
>>>> Congratulation Neeraj and team ...it much appreciated
>>>>
>>>> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.hisp@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Dear Team, Thanks for all your suggestions.
>>>>>
>>>>> Now the time of analytic is reduced to 10 hours 41 minutes.
>>>>>
>>>>> We tried to VACUUM as Sam suggested but it didn't help then we
>>>>> upgraded postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made
>>>>> some changes in configuration file of postgres and it reduced the time. But
>>>>> the database size is still same.
>>>>>
>>>>> Thanks for all your help!
>>>>>
>>>>> Thanks,
>>>>> Neeraj
>>>>>
>>>>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <
>>>>> calle.hedberg@xxxxxxxxx> wrote:
>>>>>
>>>>>> Neeraj,
>>>>>>
>>>>>> It's always an element of uncertainty linked to database sizes - ref
>>>>>> Sam's post over. So indicating the number of records you have in the
>>>>>> datavalue table & key meta-data tables would be useful + indicating whether
>>>>>> you are running other instances on the same server. Some comments - I've
>>>>>> been doing a lot of similar optimising work recently:
>>>>>>
>>>>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6
>>>>>> before the worst bugs are fixed and it has stabilised).
>>>>>>
>>>>>> 2. Carefully check your postgres.conf against the recommended
>>>>>> settings. The guide is a bit superficial in the sense that it has
>>>>>> recommended "fixed" values only and no explanations around ranges below or
>>>>>> above those, but you can experiment a bit yourself (e.g. the recommended
>>>>>> "max_connections = 200" might not be sufficient for a really large system
>>>>>> like what you have.
>>>>>>
>>>>>> 3. If your server is running that single instance only, then 48GB or
>>>>>> RAM should be sufficient. Our servers are all having 128GB RAM so we
>>>>>> experimented quite a bit earlier this year with giving a DHIS2 instance
>>>>>> large amounts or RAM (up to 60-70gb), with negligible impact on
>>>>>> performance. According to Lars, the DHIS2 cannot really utilize more than
>>>>>> around 16gb RAM (at least that is how I understood his communication at the
>>>>>> time). So 48GB should be sufficient for a single instance.
>>>>>>
>>>>>> 4. I've been doing performance optimizing recently on an instance
>>>>>> with
>>>>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2
>>>>>> - 31,000 Orgunits
>>>>>> - 420 data elements
>>>>>> - 250 indicators
>>>>>> - around 100 mill datavalue records
>>>>>> - total size around 140gb with analytics tables.
>>>>>>
>>>>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT
>>>>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the
>>>>>> analytics engine is designed well, I would expect a nearly linear
>>>>>> relationship between database size and the time analytics takes to run. So
>>>>>> running analytics on your database on our server should in theory take 4-5
>>>>>> hours.
>>>>>>
>>>>>> We are obviously comparing oranges and nectarines here, in the sense
>>>>>> that there might be other aspects of our server and database that is
>>>>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, whether
>>>>>> your instance have lots of tracker data, etc etc). I have not seen any
>>>>>> scientific/quantified comparative performance values related to specific
>>>>>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD
>>>>>> improve analytics performance quite a bit - assuming around 30% then it
>>>>>> means running analytics on your database/server should take around 3
>>>>>> hours......
>>>>>>
>>>>>> I tried getting comparative, quantitative data on various
>>>>>> configurations of hardware and software (e.g. some users prefer CentOS,
>>>>>> others Ubuntu) during the academy in August, but did not get much - it
>>>>>> seems most users/providers have found a setup that works for them for now
>>>>>> and nobody is doing any systematic performance testing (some of the
>>>>>> international NGOs/companies using DHIS2 might have, but as with internally
>>>>>> developed apps they are not that keen on sharing). So it would be highly
>>>>>> appreciated if you would post the results on analytics time with every
>>>>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4
>>>>>>
>>>>>> Best regards
>>>>>> Calle
>>>>>>
>>>>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.johnson@xxxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Neeraj,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Using VACUUM and ANALYZE*
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Like Brajesh, my background is MySQL, and one database admin task
>>>>>>> that is often overlooked in MySQL is OPTIMIZE TABLEs.  This reclaims unused
>>>>>>> space (we’ve had 100Gb databases files drop to half their size) and
>>>>>>> refreshes index statistics (if the shape of your data has changed over
>>>>>>> time, this can make indices run faster).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I’m new to PostgreSQL, but the core principles are the same, and a
>>>>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the
>>>>>>> VACUUM and ANALYZE commands.  If your database isn’t set to automatically
>>>>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you
>>>>>>> might want to try VACUUM *FULL*, which will literally rewrite all
>>>>>>> of your database tables and indices into smaller, more efficient files
>>>>>>> (note, however, that on a 500Gb database this could take a *looong*
>>>>>>> time – perhaps test on a backup first?).  The following forum post is a
>>>>>>> really nice, plain-English explanation of what VACUUM does:
>>>>>>>
>>>>>>> http://dba.stackexchange.com/questions/126258/what-is-table-
>>>>>>> bloating-in-databases
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As I mentioned, my background is MySQL rather than Postgres, so
>>>>>>> someone with more specific Postgres experience might like to also chime in
>>>>>>> here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers, Sam.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson=
>>>>>>> qebo.co.uk@xxxxxxxxxxxxxxxxxxx> on behalf of Brajesh Murari <
>>>>>>> brajesh.murari@xxxxxxxxx>
>>>>>>> *Date: *Wednesday, 19 October 2016 at 08:28
>>>>>>> *To: *Knut Staring <knutst@xxxxxxxxx>
>>>>>>> *Cc: *DHIS 2 Users list <dhis2-users@xxxxxxxxxxxxxxxxxxx>, DHIS2
>>>>>>> Developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
>>>>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing
>>>>>>> Analytic
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dear Neeraj,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The physical database size doesn't matter much, even the number of
>>>>>>> records don't matter. In my experience the biggest problem that one can
>>>>>>> going to run in to is not size, but the number of queries you can handle at
>>>>>>> a time instance specially during analytic functionality execution. Most
>>>>>>> probably you should going to have to move to a master/slave configuration
>>>>>>> of your database, so that the read queries can run against the slaves and
>>>>>>> the write queries run against the master. However, if you and your database
>>>>>>> management team are not ready for this than, you can tweak your indexes for
>>>>>>> the queries you are running to speed up the response times. Also there is a
>>>>>>> lot of tweaking you can do to the network stack and kernel in Linux where
>>>>>>> MySQL Server has been installed that will help.Perhaps, I would focus first
>>>>>>> on your indexes, then have a server admin look at your OS, and if all that
>>>>>>> doesn't help it might be time to implement a master/slave configuration.
>>>>>>> The most important scalability factor is RAM. If the indexes of your tables
>>>>>>> fit into memory and your queries are highly optimized in analytic
>>>>>>> functionality, you can serve a reasonable amount of requests with a average
>>>>>>> machine. The number of records do matter, depending of how your tables look
>>>>>>> like. It's a difference to have a lot of varchar fields or only a couple of
>>>>>>> ints or longs. The physical size of the database matters as well,  think of
>>>>>>> backups, for instance. Depending on your engine, your physical db files on
>>>>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of
>>>>>>> rows, doesn't help to shrink your physical files. Thus the database size
>>>>>>> does matter. If you have more than one table with more than a million
>>>>>>> records, then performance starts indeed to degrade. Indexig is one of the
>>>>>>> important stand need to take care, If you hit one million records you will
>>>>>>> get performance problems, if the indices are not set right (for example no
>>>>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If
>>>>>>> you hit 10 million records, you will start to get performance problems even
>>>>>>> if you have all your indices right. Hardware upgrades - adding more memory
>>>>>>> and more processor power, especially memory - often help to reduce the most
>>>>>>> severe problems by increasing the performance again, at least to a certain
>>>>>>> degree.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knutst@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Just a heads-up that there seems to be a JDBC issue with Postgres
>>>>>>> 9.6, so perhaps you should try upgrading to 9.5 first.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <lars@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Neeraj,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> what usually helps to improve runtime is to improve/increase:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - ssd (read and write speed)
>>>>>>>
>>>>>>> - number of CPUs
>>>>>>>
>>>>>>> - using latest postgresql (9.6 claims to have even better indexing
>>>>>>> performance
>>>>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than
>>>>>>> 9.5)
>>>>>>>
>>>>>>> - tuning
>>>>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning>
>>>>>>> of postgresql
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> regards,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Lars
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Lars Helge Øverland
>>>>>>>
>>>>>>> Lead developer, DHIS 2
>>>>>>>
>>>>>>> University of Oslo
>>>>>>>
>>>>>>> Skype: larshelgeoverland
>>>>>>>
>>>>>>> lars@xxxxxxxxx
>>>>>>>
>>>>>>> http://www.dhis2.org <https://www.dhis2.org/>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Knut Staring
>>>>>>>
>>>>>>> Dept. of Informatics, University of Oslo
>>>>>>>
>>>>>>> Norway: +4791880522
>>>>>>>
>>>>>>> Skype: knutstar
>>>>>>>
>>>>>>> http://dhis2.org
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Brajesh Murari,
>>>>>>>
>>>>>>> Postgraduate, Department of Computer Science and Engineering,
>>>>>>>
>>>>>>> Chaudhary Devi Lal University, Sirsa,
>>>>>>>
>>>>>>> India.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The three basic dimensions of human development: a long and healthy
>>>>>>> life, access to knowledge, and a decent standard of living.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *******************************************
>>>>>>
>>>>>> Calle Hedberg
>>>>>>
>>>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>>>
>>>>>> Tel/fax (home): +27-21-685-6472
>>>>>>
>>>>>> Cell: +27-82-853-5352
>>>>>>
>>>>>> Iridium SatPhone: +8816-315-19119
>>>>>>
>>>>>> Email: calle.hedberg@xxxxxxxxx
>>>>>>
>>>>>> Skype: calle_hedberg
>>>>>>
>>>>>> *******************************************
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Neeraj Gupta
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Brajesh Murari,
>>>> Postgraduate, Dept of CSE,
>>>> Chaudhary Devi Lal University, Sirsa,
>>>> India.
>>>>
>>>> The three basic dimensions of human development: a long and healthy
>>>> life, access to knowledge, and a decent standard of living.
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> *******************************************
>>>
>>> Calle Hedberg
>>>
>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>
>>> Tel/fax (home): +27-21-685-6472
>>>
>>> Cell: +27-82-853-5352
>>>
>>> Iridium SatPhone: +8816-315-19119
>>>
>>> Email: calle.hedberg@xxxxxxxxx
>>>
>>> Skype: calle_hedberg
>>>
>>> *******************************************
>>>
>>>
>>
>>
>> --
>> Thanks,
>> Neeraj Gupta
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Knut Staring
> Dept. of Informatics, University of Oslo
> Norway: +4791880522
> Skype: knutstar
> http://dhis2.org
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Best Regards,

Brajesh Murari,
Postgraduate, Dept of CSE,
Chaudhary Devi Lal University, Sirsa,
India.

The three basic dimensions of human development: a long and healthy life,
access to knowledge, and a decent standard of living.

References