← Back to team overview

dhis2-devs team mailing list archive

Re: [Dhis2-users] 25 hours in completing Analytic

 

Congratulation Neeraj and team ...it much appreciated

On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.hisp@xxxxxxxxx>
wrote:

> Dear Team, Thanks for all your suggestions.
>
> Now the time of analytic is reduced to 10 hours 41 minutes.
>
> We tried to VACUUM as Sam suggested but it didn't help then we upgraded
> postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made some
> changes in configuration file of postgres and it reduced the time. But the
> database size is still same.
>
> Thanks for all your help!
>
> Thanks,
> Neeraj
>
> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
> wrote:
>
>> Neeraj,
>>
>> It's always an element of uncertainty linked to database sizes - ref
>> Sam's post over. So indicating the number of records you have in the
>> datavalue table & key meta-data tables would be useful + indicating whether
>> you are running other instances on the same server. Some comments - I've
>> been doing a lot of similar optimising work recently:
>>
>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 before
>> the worst bugs are fixed and it has stabilised).
>>
>> 2. Carefully check your postgres.conf against the recommended settings.
>> The guide is a bit superficial in the sense that it has recommended "fixed"
>> values only and no explanations around ranges below or above those, but you
>> can experiment a bit yourself (e.g. the recommended "max_connections = 200"
>> might not be sufficient for a really large system like what you have.
>>
>> 3. If your server is running that single instance only, then 48GB or RAM
>> should be sufficient. Our servers are all having 128GB RAM so we
>> experimented quite a bit earlier this year with giving a DHIS2 instance
>> large amounts or RAM (up to 60-70gb), with negligible impact on
>> performance. According to Lars, the DHIS2 cannot really utilize more than
>> around 16gb RAM (at least that is how I understood his communication at the
>> time). So 48GB should be sufficient for a single instance.
>>
>> 4. I've been doing performance optimizing recently on an instance with
>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2
>> - 31,000 Orgunits
>> - 420 data elements
>> - 250 indicators
>> - around 100 mill datavalue records
>> - total size around 140gb with analytics tables.
>>
>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT
>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the
>> analytics engine is designed well, I would expect a nearly linear
>> relationship between database size and the time analytics takes to run. So
>> running analytics on your database on our server should in theory take 4-5
>> hours.
>>
>> We are obviously comparing oranges and nectarines here, in the sense that
>> there might be other aspects of our server and database that is different
>> from yours (type of CPU, no of OUs, no of DEs/Indicators, whether your
>> instance have lots of tracker data, etc etc). I have not seen any
>> scientific/quantified comparative performance values related to specific
>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD
>> improve analytics performance quite a bit - assuming around 30% then it
>> means running analytics on your database/server should take around 3
>> hours......
>>
>> I tried getting comparative, quantitative data on various configurations
>> of hardware and software (e.g. some users prefer CentOS, others Ubuntu)
>> during the academy in August, but did not get much - it seems most
>> users/providers have found a setup that works for them for now and nobody
>> is doing any systematic performance testing (some of the international
>> NGOs/companies using DHIS2 might have, but as with internally developed
>> apps they are not that keen on sharing). So it would be highly appreciated
>> if you would post the results on analytics time with every upgrade / tweak
>> you do - starting with the upgrade to Pg 9.5.4
>>
>> Best regards
>> Calle
>>
>> On 19 October 2016 at 13:28, Sam Johnson <samuel.johnson@xxxxxxxxxx>
>> wrote:
>>
>>> Hi Neeraj,
>>>
>>>
>>>
>>> *Using VACUUM and ANALYZE*
>>>
>>>
>>>
>>> Like Brajesh, my background is MySQL, and one database admin task that
>>> is often overlooked in MySQL is OPTIMIZE TABLEs.  This reclaims unused
>>> space (we’ve had 100Gb databases files drop to half their size) and
>>> refreshes index statistics (if the shape of your data has changed over
>>> time, this can make indices run faster).
>>>
>>>
>>>
>>> I’m new to PostgreSQL, but the core principles are the same, and a quick
>>> bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and
>>> ANALYZE commands.  If your database isn’t set to automatically do VACUUMs
>>> (the default DHIS2 postgres config doesn’t seem to be), you might want to
>>> try VACUUM *FULL*, which will literally rewrite all of your database
>>> tables and indices into smaller, more efficient files (note, however, that
>>> on a 500Gb database this could take a *looong* time – perhaps test on a
>>> backup first?).  The following forum post is a really nice, plain-English
>>> explanation of what VACUUM does:
>>>
>>> http://dba.stackexchange.com/questions/126258/what-is-table-
>>> bloating-in-databases
>>>
>>>
>>>
>>> As I mentioned, my background is MySQL rather than Postgres, so someone
>>> with more specific Postgres experience might like to also chime in here.
>>>
>>>
>>>
>>> Cheers, Sam.
>>>
>>>
>>>
>>>
>>>
>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson=
>>> qebo.co.uk@xxxxxxxxxxxxxxxxxxx> on behalf of Brajesh Murari <
>>> brajesh.murari@xxxxxxxxx>
>>> *Date: *Wednesday, 19 October 2016 at 08:28
>>> *To: *Knut Staring <knutst@xxxxxxxxx>
>>> *Cc: *DHIS 2 Users list <dhis2-users@xxxxxxxxxxxxxxxxxxx>, DHIS2
>>> Developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing
>>> Analytic
>>>
>>>
>>>
>>> Dear Neeraj,
>>>
>>>
>>>
>>> The physical database size doesn't matter much, even the number of
>>> records don't matter. In my experience the biggest problem that one can
>>> going to run in to is not size, but the number of queries you can handle at
>>> a time instance specially during analytic functionality execution. Most
>>> probably you should going to have to move to a master/slave configuration
>>> of your database, so that the read queries can run against the slaves and
>>> the write queries run against the master. However, if you and your database
>>> management team are not ready for this than, you can tweak your indexes for
>>> the queries you are running to speed up the response times. Also there is a
>>> lot of tweaking you can do to the network stack and kernel in Linux where
>>> MySQL Server has been installed that will help.Perhaps, I would focus first
>>> on your indexes, then have a server admin look at your OS, and if all that
>>> doesn't help it might be time to implement a master/slave configuration.
>>> The most important scalability factor is RAM. If the indexes of your tables
>>> fit into memory and your queries are highly optimized in analytic
>>> functionality, you can serve a reasonable amount of requests with a average
>>> machine. The number of records do matter, depending of how your tables look
>>> like. It's a difference to have a lot of varchar fields or only a couple of
>>> ints or longs. The physical size of the database matters as well,  think of
>>> backups, for instance. Depending on your engine, your physical db files on
>>> grow, but don't shrink, for instance with innodb. So deleting a lot of
>>> rows, doesn't help to shrink your physical files. Thus the database size
>>> does matter. If you have more than one table with more than a million
>>> records, then performance starts indeed to degrade. Indexig is one of the
>>> important stand need to take care, If you hit one million records you will
>>> get performance problems, if the indices are not set right (for example no
>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If
>>> you hit 10 million records, you will start to get performance problems even
>>> if you have all your indices right. Hardware upgrades - adding more memory
>>> and more processor power, especially memory - often help to reduce the most
>>> severe problems by increasing the performance again, at least to a certain
>>> degree.
>>>
>>>
>>>
>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knutst@xxxxxxxxx> wrote:
>>>
>>> Just a heads-up that there seems to be a JDBC issue with Postgres 9.6,
>>> so perhaps you should try upgrading to 9.5 first.
>>>
>>>
>>>
>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <lars@xxxxxxxxx>
>>> wrote:
>>>
>>>
>>>
>>> Hi Neeraj,
>>>
>>>
>>>
>>> what usually helps to improve runtime is to improve/increase:
>>>
>>>
>>>
>>> - ssd (read and write speed)
>>>
>>> - number of CPUs
>>>
>>> - using latest postgresql (9.6 claims to have even better indexing
>>> performance
>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than 9.5)
>>>
>>> - tuning
>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning>
>>> of postgresql
>>>
>>>
>>>
>>>
>>>
>>> regards,
>>>
>>>
>>>
>>> Lars
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Lars Helge Øverland
>>>
>>> Lead developer, DHIS 2
>>>
>>> University of Oslo
>>>
>>> Skype: larshelgeoverland
>>>
>>> lars@xxxxxxxxx
>>>
>>> http://www.dhis2.org <https://www.dhis2.org/>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Knut Staring
>>>
>>> Dept. of Informatics, University of Oslo
>>>
>>> Norway: +4791880522
>>>
>>> Skype: knutstar
>>>
>>> http://dhis2.org
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Best Regards,
>>>
>>>
>>>
>>> Brajesh Murari,
>>>
>>> Postgraduate, Department of Computer Science and Engineering,
>>>
>>> Chaudhary Devi Lal University, Sirsa,
>>>
>>> India.
>>>
>>>
>>>
>>> The three basic dimensions of human development: a long and healthy
>>> life, access to knowledge, and a decent standard of living.
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>>
>> *******************************************
>>
>> Calle Hedberg
>>
>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>
>> Tel/fax (home): +27-21-685-6472
>>
>> Cell: +27-82-853-5352
>>
>> Iridium SatPhone: +8816-315-19119
>>
>> Email: calle.hedberg@xxxxxxxxx
>>
>> Skype: calle_hedberg
>>
>> *******************************************
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Thanks,
> Neeraj Gupta
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Best Regards,

Brajesh Murari,
Postgraduate, Dept of CSE,
Chaudhary Devi Lal University, Sirsa,
India.

The three basic dimensions of human development: a long and healthy life,
access to knowledge, and a decent standard of living.

Follow ups

References