← Back to team overview

dhis2-devs team mailing list archive

Re: [Dhis2-users] 25 hours in completing Analytic

 

Thank you everyone. I will give try to each options you gave and post the
result on the mail.

Thanks,
Neeraj

On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
wrote:

> Neeraj,
>
> It's always an element of uncertainty linked to database sizes - ref Sam's
> post over. So indicating the number of records you have in the datavalue
> table & key meta-data tables would be useful + indicating whether you are
> running other instances on the same server. Some comments - I've been doing
> a lot of similar optimising work recently:
>
> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 before
> the worst bugs are fixed and it has stabilised).
>
> 2. Carefully check your postgres.conf against the recommended settings.
> The guide is a bit superficial in the sense that it has recommended "fixed"
> values only and no explanations around ranges below or above those, but you
> can experiment a bit yourself (e.g. the recommended "max_connections = 200"
> might not be sufficient for a really large system like what you have.
>
> 3. If your server is running that single instance only, then 48GB or RAM
> should be sufficient. Our servers are all having 128GB RAM so we
> experimented quite a bit earlier this year with giving a DHIS2 instance
> large amounts or RAM (up to 60-70gb), with negligible impact on
> performance. According to Lars, the DHIS2 cannot really utilize more than
> around 16gb RAM (at least that is how I understood his communication at the
> time). So 48GB should be sufficient for a single instance.
>
> 4. I've been doing performance optimizing recently on an instance with
> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2
> - 31,000 Orgunits
> - 420 data elements
> - 250 indicators
> - around 100 mill datavalue records
> - total size around 140gb with analytics tables.
>
> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT
> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the
> analytics engine is designed well, I would expect a nearly linear
> relationship between database size and the time analytics takes to run. So
> running analytics on your database on our server should in theory take 4-5
> hours.
>
> We are obviously comparing oranges and nectarines here, in the sense that
> there might be other aspects of our server and database that is different
> from yours (type of CPU, no of OUs, no of DEs/Indicators, whether your
> instance have lots of tracker data, etc etc). I have not seen any
> scientific/quantified comparative performance values related to specific
> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD
> improve analytics performance quite a bit - assuming around 30% then it
> means running analytics on your database/server should take around 3
> hours......
>
> I tried getting comparative, quantitative data on various configurations
> of hardware and software (e.g. some users prefer CentOS, others Ubuntu)
> during the academy in August, but did not get much - it seems most
> users/providers have found a setup that works for them for now and nobody
> is doing any systematic performance testing (some of the international
> NGOs/companies using DHIS2 might have, but as with internally developed
> apps they are not that keen on sharing). So it would be highly appreciated
> if you would post the results on analytics time with every upgrade / tweak
> you do - starting with the upgrade to Pg 9.5.4
>
> Best regards
> Calle
>
> On 19 October 2016 at 13:28, Sam Johnson <samuel.johnson@xxxxxxxxxx>
> wrote:
>
>> Hi Neeraj,
>>
>>
>>
>> *Using VACUUM and ANALYZE*
>>
>>
>>
>> Like Brajesh, my background is MySQL, and one database admin task that is
>> often overlooked in MySQL is OPTIMIZE TABLEs.  This reclaims unused space
>> (we’ve had 100Gb databases files drop to half their size) and refreshes
>> index statistics (if the shape of your data has changed over time, this can
>> make indices run faster).
>>
>>
>>
>> I’m new to PostgreSQL, but the core principles are the same, and a quick
>> bit of Googling shows that the equivalents in PostgreSQL are the VACUUM and
>> ANALYZE commands.  If your database isn’t set to automatically do VACUUMs
>> (the default DHIS2 postgres config doesn’t seem to be), you might want to
>> try VACUUM *FULL*, which will literally rewrite all of your database
>> tables and indices into smaller, more efficient files (note, however, that
>> on a 500Gb database this could take a *looong* time – perhaps test on a
>> backup first?).  The following forum post is a really nice, plain-English
>> explanation of what VACUUM does:
>>
>> http://dba.stackexchange.com/questions/126258/what-is-table-
>> bloating-in-databases
>>
>>
>>
>> As I mentioned, my background is MySQL rather than Postgres, so someone
>> with more specific Postgres experience might like to also chime in here.
>>
>>
>>
>> Cheers, Sam.
>>
>>
>>
>>
>>
>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson=
>> qebo.co.uk@xxxxxxxxxxxxxxxxxxx> on behalf of Brajesh Murari <
>> brajesh.murari@xxxxxxxxx>
>> *Date: *Wednesday, 19 October 2016 at 08:28
>> *To: *Knut Staring <knutst@xxxxxxxxx>
>> *Cc: *DHIS 2 Users list <dhis2-users@xxxxxxxxxxxxxxxxxxx>, DHIS2
>> Developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing Analytic
>>
>>
>>
>> Dear Neeraj,
>>
>>
>>
>> The physical database size doesn't matter much, even the number of
>> records don't matter. In my experience the biggest problem that one can
>> going to run in to is not size, but the number of queries you can handle at
>> a time instance specially during analytic functionality execution. Most
>> probably you should going to have to move to a master/slave configuration
>> of your database, so that the read queries can run against the slaves and
>> the write queries run against the master. However, if you and your database
>> management team are not ready for this than, you can tweak your indexes for
>> the queries you are running to speed up the response times. Also there is a
>> lot of tweaking you can do to the network stack and kernel in Linux where
>> MySQL Server has been installed that will help.Perhaps, I would focus first
>> on your indexes, then have a server admin look at your OS, and if all that
>> doesn't help it might be time to implement a master/slave configuration.
>> The most important scalability factor is RAM. If the indexes of your tables
>> fit into memory and your queries are highly optimized in analytic
>> functionality, you can serve a reasonable amount of requests with a average
>> machine. The number of records do matter, depending of how your tables look
>> like. It's a difference to have a lot of varchar fields or only a couple of
>> ints or longs. The physical size of the database matters as well,  think of
>> backups, for instance. Depending on your engine, your physical db files on
>> grow, but don't shrink, for instance with innodb. So deleting a lot of
>> rows, doesn't help to shrink your physical files. Thus the database size
>> does matter. If you have more than one table with more than a million
>> records, then performance starts indeed to degrade. Indexig is one of the
>> important stand need to take care, If you hit one million records you will
>> get performance problems, if the indices are not set right (for example no
>> indices for fields in "WHERE statements" or "ON conditions" in joins). If
>> you hit 10 million records, you will start to get performance problems even
>> if you have all your indices right. Hardware upgrades - adding more memory
>> and more processor power, especially memory - often help to reduce the most
>> severe problems by increasing the performance again, at least to a certain
>> degree.
>>
>>
>>
>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knutst@xxxxxxxxx> wrote:
>>
>> Just a heads-up that there seems to be a JDBC issue with Postgres 9.6, so
>> perhaps you should try upgrading to 9.5 first.
>>
>>
>>
>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <lars@xxxxxxxxx>
>> wrote:
>>
>>
>>
>> Hi Neeraj,
>>
>>
>>
>> what usually helps to improve runtime is to improve/increase:
>>
>>
>>
>> - ssd (read and write speed)
>>
>> - number of CPUs
>>
>> - using latest postgresql (9.6 claims to have even better indexing
>> performance <https://www.postgresql.org/docs/9.6/static/release-9-6.html>
>> than 9.5)
>>
>> - tuning
>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning>
>> of postgresql
>>
>>
>>
>>
>>
>> regards,
>>
>>
>>
>> Lars
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Lars Helge Øverland
>>
>> Lead developer, DHIS 2
>>
>> University of Oslo
>>
>> Skype: larshelgeoverland
>>
>> lars@xxxxxxxxx
>>
>> http://www.dhis2.org <https://www.dhis2.org/>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-users
>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-users
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>>
>>
>>
>> --
>>
>> Knut Staring
>>
>> Dept. of Informatics, University of Oslo
>>
>> Norway: +4791880522
>>
>> Skype: knutstar
>>
>> http://dhis2.org
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>>
>>
>>
>> --
>>
>> Best Regards,
>>
>>
>>
>> Brajesh Murari,
>>
>> Postgraduate, Department of Computer Science and Engineering,
>>
>> Chaudhary Devi Lal University, Sirsa,
>>
>> India.
>>
>>
>>
>> The three basic dimensions of human development: a long and healthy life,
>> access to knowledge, and a decent standard of living.
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
>
> *******************************************
>
> Calle Hedberg
>
> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>
> Tel/fax (home): +27-21-685-6472
>
> Cell: +27-82-853-5352
>
> Iridium SatPhone: +8816-315-19119
>
> Email: calle.hedberg@xxxxxxxxx
>
> Skype: calle_hedberg
>
> *******************************************
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Thanks,
Neeraj Gupta

References