← Back to team overview

dhis2-devs team mailing list archive

Re: [Dhis2-users] 25 hours in completing Analytic

 

Neeraj,

You never stated the number of records you have in the datavalue table -
what is it?

In the same context: anybody have a rough idea of how many datavalue
records there are in the global DATIM database - which I think currently
might be the largest DHIS2 instance around?

Given our own recent work on performance + what Neeraj has reported, I've
been thinking of creating one test instance with let us say 500 mill
datavalue records and another with let us say 1 billion, then use them to
identify key bottlenecks in various processes AND use them to ensure that
DHIS2 analytics performance is as linear as possible in terms of database
size. Postgresql has introduced a number of new indexing algorithms in
recent versions, and I'm not sure if DHIS2 is taking full advantage of them.

Best regards
Calle

On 24 October 2016 at 07:53, Brajesh Murari <brajesh.murari@xxxxxxxxx>
wrote:

> Congratulation Neeraj and team ...it much appreciated
>
> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.hisp@xxxxxxxxx>
> wrote:
>
>> Dear Team, Thanks for all your suggestions.
>>
>> Now the time of analytic is reduced to 10 hours 41 minutes.
>>
>> We tried to VACUUM as Sam suggested but it didn't help then we upgraded
>> postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made some
>> changes in configuration file of postgres and it reduced the time. But the
>> database size is still same.
>>
>> Thanks for all your help!
>>
>> Thanks,
>> Neeraj
>>
>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>> wrote:
>>
>>> Neeraj,
>>>
>>> It's always an element of uncertainty linked to database sizes - ref
>>> Sam's post over. So indicating the number of records you have in the
>>> datavalue table & key meta-data tables would be useful + indicating whether
>>> you are running other instances on the same server. Some comments - I've
>>> been doing a lot of similar optimising work recently:
>>>
>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 before
>>> the worst bugs are fixed and it has stabilised).
>>>
>>> 2. Carefully check your postgres.conf against the recommended settings.
>>> The guide is a bit superficial in the sense that it has recommended "fixed"
>>> values only and no explanations around ranges below or above those, but you
>>> can experiment a bit yourself (e.g. the recommended "max_connections = 200"
>>> might not be sufficient for a really large system like what you have.
>>>
>>> 3. If your server is running that single instance only, then 48GB or RAM
>>> should be sufficient. Our servers are all having 128GB RAM so we
>>> experimented quite a bit earlier this year with giving a DHIS2 instance
>>> large amounts or RAM (up to 60-70gb), with negligible impact on
>>> performance. According to Lars, the DHIS2 cannot really utilize more than
>>> around 16gb RAM (at least that is how I understood his communication at the
>>> time). So 48GB should be sufficient for a single instance.
>>>
>>> 4. I've been doing performance optimizing recently on an instance with
>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2
>>> - 31,000 Orgunits
>>> - 420 data elements
>>> - 250 indicators
>>> - around 100 mill datavalue records
>>> - total size around 140gb with analytics tables.
>>>
>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT
>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the
>>> analytics engine is designed well, I would expect a nearly linear
>>> relationship between database size and the time analytics takes to run. So
>>> running analytics on your database on our server should in theory take 4-5
>>> hours.
>>>
>>> We are obviously comparing oranges and nectarines here, in the sense
>>> that there might be other aspects of our server and database that is
>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, whether
>>> your instance have lots of tracker data, etc etc). I have not seen any
>>> scientific/quantified comparative performance values related to specific
>>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD
>>> improve analytics performance quite a bit - assuming around 30% then it
>>> means running analytics on your database/server should take around 3
>>> hours......
>>>
>>> I tried getting comparative, quantitative data on various configurations
>>> of hardware and software (e.g. some users prefer CentOS, others Ubuntu)
>>> during the academy in August, but did not get much - it seems most
>>> users/providers have found a setup that works for them for now and nobody
>>> is doing any systematic performance testing (some of the international
>>> NGOs/companies using DHIS2 might have, but as with internally developed
>>> apps they are not that keen on sharing). So it would be highly appreciated
>>> if you would post the results on analytics time with every upgrade / tweak
>>> you do - starting with the upgrade to Pg 9.5.4
>>>
>>> Best regards
>>> Calle
>>>
>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.johnson@xxxxxxxxxx>
>>> wrote:
>>>
>>>> Hi Neeraj,
>>>>
>>>>
>>>>
>>>> *Using VACUUM and ANALYZE*
>>>>
>>>>
>>>>
>>>> Like Brajesh, my background is MySQL, and one database admin task that
>>>> is often overlooked in MySQL is OPTIMIZE TABLEs.  This reclaims unused
>>>> space (we’ve had 100Gb databases files drop to half their size) and
>>>> refreshes index statistics (if the shape of your data has changed over
>>>> time, this can make indices run faster).
>>>>
>>>>
>>>>
>>>> I’m new to PostgreSQL, but the core principles are the same, and a
>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the
>>>> VACUUM and ANALYZE commands.  If your database isn’t set to automatically
>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you
>>>> might want to try VACUUM *FULL*, which will literally rewrite all of
>>>> your database tables and indices into smaller, more efficient files (note,
>>>> however, that on a 500Gb database this could take a *looong* time –
>>>> perhaps test on a backup first?).  The following forum post is a really
>>>> nice, plain-English explanation of what VACUUM does:
>>>>
>>>> http://dba.stackexchange.com/questions/126258/what-is-table-
>>>> bloating-in-databases
>>>>
>>>>
>>>>
>>>> As I mentioned, my background is MySQL rather than Postgres, so someone
>>>> with more specific Postgres experience might like to also chime in here.
>>>>
>>>>
>>>>
>>>> Cheers, Sam.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson=
>>>> qebo.co.uk@xxxxxxxxxxxxxxxxxxx> on behalf of Brajesh Murari <
>>>> brajesh.murari@xxxxxxxxx>
>>>> *Date: *Wednesday, 19 October 2016 at 08:28
>>>> *To: *Knut Staring <knutst@xxxxxxxxx>
>>>> *Cc: *DHIS 2 Users list <dhis2-users@xxxxxxxxxxxxxxxxxxx>, DHIS2
>>>> Developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing
>>>> Analytic
>>>>
>>>>
>>>>
>>>> Dear Neeraj,
>>>>
>>>>
>>>>
>>>> The physical database size doesn't matter much, even the number of
>>>> records don't matter. In my experience the biggest problem that one can
>>>> going to run in to is not size, but the number of queries you can handle at
>>>> a time instance specially during analytic functionality execution. Most
>>>> probably you should going to have to move to a master/slave configuration
>>>> of your database, so that the read queries can run against the slaves and
>>>> the write queries run against the master. However, if you and your database
>>>> management team are not ready for this than, you can tweak your indexes for
>>>> the queries you are running to speed up the response times. Also there is a
>>>> lot of tweaking you can do to the network stack and kernel in Linux where
>>>> MySQL Server has been installed that will help.Perhaps, I would focus first
>>>> on your indexes, then have a server admin look at your OS, and if all that
>>>> doesn't help it might be time to implement a master/slave configuration.
>>>> The most important scalability factor is RAM. If the indexes of your tables
>>>> fit into memory and your queries are highly optimized in analytic
>>>> functionality, you can serve a reasonable amount of requests with a average
>>>> machine. The number of records do matter, depending of how your tables look
>>>> like. It's a difference to have a lot of varchar fields or only a couple of
>>>> ints or longs. The physical size of the database matters as well,  think of
>>>> backups, for instance. Depending on your engine, your physical db files on
>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of
>>>> rows, doesn't help to shrink your physical files. Thus the database size
>>>> does matter. If you have more than one table with more than a million
>>>> records, then performance starts indeed to degrade. Indexig is one of the
>>>> important stand need to take care, If you hit one million records you will
>>>> get performance problems, if the indices are not set right (for example no
>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If
>>>> you hit 10 million records, you will start to get performance problems even
>>>> if you have all your indices right. Hardware upgrades - adding more memory
>>>> and more processor power, especially memory - often help to reduce the most
>>>> severe problems by increasing the performance again, at least to a certain
>>>> degree.
>>>>
>>>>
>>>>
>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knutst@xxxxxxxxx>
>>>> wrote:
>>>>
>>>> Just a heads-up that there seems to be a JDBC issue with Postgres 9.6,
>>>> so perhaps you should try upgrading to 9.5 first.
>>>>
>>>>
>>>>
>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <lars@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> Hi Neeraj,
>>>>
>>>>
>>>>
>>>> what usually helps to improve runtime is to improve/increase:
>>>>
>>>>
>>>>
>>>> - ssd (read and write speed)
>>>>
>>>> - number of CPUs
>>>>
>>>> - using latest postgresql (9.6 claims to have even better indexing
>>>> performance
>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than 9.5)
>>>>
>>>> - tuning
>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning>
>>>> of postgresql
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> regards,
>>>>
>>>>
>>>>
>>>> Lars
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Lars Helge Øverland
>>>>
>>>> Lead developer, DHIS 2
>>>>
>>>> University of Oslo
>>>>
>>>> Skype: larshelgeoverland
>>>>
>>>> lars@xxxxxxxxx
>>>>
>>>> http://www.dhis2.org <https://www.dhis2.org/>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Knut Staring
>>>>
>>>> Dept. of Informatics, University of Oslo
>>>>
>>>> Norway: +4791880522
>>>>
>>>> Skype: knutstar
>>>>
>>>> http://dhis2.org
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best Regards,
>>>>
>>>>
>>>>
>>>> Brajesh Murari,
>>>>
>>>> Postgraduate, Department of Computer Science and Engineering,
>>>>
>>>> Chaudhary Devi Lal University, Sirsa,
>>>>
>>>> India.
>>>>
>>>>
>>>>
>>>> The three basic dimensions of human development: a long and healthy
>>>> life, access to knowledge, and a decent standard of living.
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *******************************************
>>>
>>> Calle Hedberg
>>>
>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>
>>> Tel/fax (home): +27-21-685-6472
>>>
>>> Cell: +27-82-853-5352
>>>
>>> Iridium SatPhone: +8816-315-19119
>>>
>>> Email: calle.hedberg@xxxxxxxxx
>>>
>>> Skype: calle_hedberg
>>>
>>> *******************************************
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>> Thanks,
>> Neeraj Gupta
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Best Regards,
>
> Brajesh Murari,
> Postgraduate, Dept of CSE,
> Chaudhary Devi Lal University, Sirsa,
> India.
>
> The three basic dimensions of human development: a long and healthy life,
> access to knowledge, and a decent standard of living.
>



-- 

*******************************************

Calle Hedberg

46D Alma Road, 7700 Rosebank, SOUTH AFRICA

Tel/fax (home): +27-21-685-6472

Cell: +27-82-853-5352

Iridium SatPhone: +8816-315-19119

Email: calle.hedberg@xxxxxxxxx

Skype: calle_hedberg

*******************************************

Follow ups

References