← Back to team overview

dhis2-devs team mailing list archive

Re: [Dhis2-users] 25 hours in completing Analytic

 

Hi Calle

I think Lars would probably know better, but given the kinds of cross
tabulation that is happening with analytics, I doubt that it will scale
linearly.  Would be good to get some empirical data but I think you are
probably going to have something more approaching n^2 time complexity.

On 24 October 2016 at 09:20, Calle Hedberg <calle.hedberg@xxxxxxxxx> wrote:

> Neeraj,
>
> You never stated the number of records you have in the datavalue table -
> what is it?
>
> In the same context: anybody have a rough idea of how many datavalue
> records there are in the global DATIM database - which I think currently
> might be the largest DHIS2 instance around?
>
> Given our own recent work on performance + what Neeraj has reported, I've
> been thinking of creating one test instance with let us say 500 mill
> datavalue records and another with let us say 1 billion, then use them to
> identify key bottlenecks in various processes AND use them to ensure that
> DHIS2 analytics performance is as linear as possible in terms of database
> size. Postgresql has introduced a number of new indexing algorithms in
> recent versions, and I'm not sure if DHIS2 is taking full advantage of them.
>
> Best regards
> Calle
>
> On 24 October 2016 at 07:53, Brajesh Murari <brajesh.murari@xxxxxxxxx>
> wrote:
>
>> Congratulation Neeraj and team ...it much appreciated
>>
>> On Mon, Oct 24, 2016 at 11:08 AM, Neeraj Gupta <neeraj.hisp@xxxxxxxxx>
>> wrote:
>>
>>> Dear Team, Thanks for all your suggestions.
>>>
>>> Now the time of analytic is reduced to 10 hours 41 minutes.
>>>
>>> We tried to VACUUM as Sam suggested but it didn't help then we upgraded
>>> postgres from 9.4 to 9.5.4 and as Calle and Bob suggested we made some
>>> changes in configuration file of postgres and it reduced the time. But the
>>> database size is still same.
>>>
>>> Thanks for all your help!
>>>
>>> Thanks,
>>> Neeraj
>>>
>>> On Wed, Oct 19, 2016 at 6:46 PM, Calle Hedberg <calle.hedberg@xxxxxxxxx>
>>> wrote:
>>>
>>>> Neeraj,
>>>>
>>>> It's always an element of uncertainty linked to database sizes - ref
>>>> Sam's post over. So indicating the number of records you have in the
>>>> datavalue table & key meta-data tables would be useful + indicating whether
>>>> you are running other instances on the same server. Some comments - I've
>>>> been doing a lot of similar optimising work recently:
>>>>
>>>> 1. Upgrading to 9.5.4 is strongly recommended (and don't use 9.6 before
>>>> the worst bugs are fixed and it has stabilised).
>>>>
>>>> 2. Carefully check your postgres.conf against the recommended settings.
>>>> The guide is a bit superficial in the sense that it has recommended "fixed"
>>>> values only and no explanations around ranges below or above those, but you
>>>> can experiment a bit yourself (e.g. the recommended "max_connections = 200"
>>>> might not be sufficient for a really large system like what you have.
>>>>
>>>> 3. If your server is running that single instance only, then 48GB or
>>>> RAM should be sufficient. Our servers are all having 128GB RAM so we
>>>> experimented quite a bit earlier this year with giving a DHIS2 instance
>>>> large amounts or RAM (up to 60-70gb), with negligible impact on
>>>> performance. According to Lars, the DHIS2 cannot really utilize more than
>>>> around 16gb RAM (at least that is how I understood his communication at the
>>>> time). So 48GB should be sufficient for a single instance.
>>>>
>>>> 4. I've been doing performance optimizing recently on an instance with
>>>> - 4-core server with 2x 512gb ssd, 12gb allocated to DHIS2
>>>> - 31,000 Orgunits
>>>> - 420 data elements
>>>> - 250 indicators
>>>> - around 100 mill datavalue records
>>>> - total size around 140gb with analytics tables.
>>>>
>>>> So the size is only 25% of your 500GB, but RUNNING ANALYTICS ON THAT
>>>> DATABASE INSTANCE IS TAKING JUST OVER 1 HOUR. Fundamentally, if the
>>>> analytics engine is designed well, I would expect a nearly linear
>>>> relationship between database size and the time analytics takes to run. So
>>>> running analytics on your database on our server should in theory take 4-5
>>>> hours.
>>>>
>>>> We are obviously comparing oranges and nectarines here, in the sense
>>>> that there might be other aspects of our server and database that is
>>>> different from yours (type of CPU, no of OUs, no of DEs/Indicators, whether
>>>> your instance have lots of tracker data, etc etc). I have not seen any
>>>> scientific/quantified comparative performance values related to specific
>>>> parameters like number of CPUs and/or number of cores, but 12 cores SHOULD
>>>> improve analytics performance quite a bit - assuming around 30% then it
>>>> means running analytics on your database/server should take around 3
>>>> hours......
>>>>
>>>> I tried getting comparative, quantitative data on various
>>>> configurations of hardware and software (e.g. some users prefer CentOS,
>>>> others Ubuntu) during the academy in August, but did not get much - it
>>>> seems most users/providers have found a setup that works for them for now
>>>> and nobody is doing any systematic performance testing (some of the
>>>> international NGOs/companies using DHIS2 might have, but as with internally
>>>> developed apps they are not that keen on sharing). So it would be highly
>>>> appreciated if you would post the results on analytics time with every
>>>> upgrade / tweak you do - starting with the upgrade to Pg 9.5.4
>>>>
>>>> Best regards
>>>> Calle
>>>>
>>>> On 19 October 2016 at 13:28, Sam Johnson <samuel.johnson@xxxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi Neeraj,
>>>>>
>>>>>
>>>>>
>>>>> *Using VACUUM and ANALYZE*
>>>>>
>>>>>
>>>>>
>>>>> Like Brajesh, my background is MySQL, and one database admin task that
>>>>> is often overlooked in MySQL is OPTIMIZE TABLEs.  This reclaims unused
>>>>> space (we’ve had 100Gb databases files drop to half their size) and
>>>>> refreshes index statistics (if the shape of your data has changed over
>>>>> time, this can make indices run faster).
>>>>>
>>>>>
>>>>>
>>>>> I’m new to PostgreSQL, but the core principles are the same, and a
>>>>> quick bit of Googling shows that the equivalents in PostgreSQL are the
>>>>> VACUUM and ANALYZE commands.  If your database isn’t set to automatically
>>>>> do VACUUMs (the default DHIS2 postgres config doesn’t seem to be), you
>>>>> might want to try VACUUM *FULL*, which will literally rewrite all of
>>>>> your database tables and indices into smaller, more efficient files (note,
>>>>> however, that on a 500Gb database this could take a *looong* time –
>>>>> perhaps test on a backup first?).  The following forum post is a really
>>>>> nice, plain-English explanation of what VACUUM does:
>>>>>
>>>>> http://dba.stackexchange.com/questions/126258/what-is-table-
>>>>> bloating-in-databases
>>>>>
>>>>>
>>>>>
>>>>> As I mentioned, my background is MySQL rather than Postgres, so
>>>>> someone with more specific Postgres experience might like to also chime in
>>>>> here.
>>>>>
>>>>>
>>>>>
>>>>> Cheers, Sam.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From: *Dhis2-users <dhis2-users-bounces+samuel.johnson=
>>>>> qebo.co.uk@xxxxxxxxxxxxxxxxxxx> on behalf of Brajesh Murari <
>>>>> brajesh.murari@xxxxxxxxx>
>>>>> *Date: *Wednesday, 19 October 2016 at 08:28
>>>>> *To: *Knut Staring <knutst@xxxxxxxxx>
>>>>> *Cc: *DHIS 2 Users list <dhis2-users@xxxxxxxxxxxxxxxxxxx>, DHIS2
>>>>> Developers <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
>>>>> *Subject: *Re: [Dhis2-users] [Dhis2-devs] 25 hours in completing
>>>>> Analytic
>>>>>
>>>>>
>>>>>
>>>>> Dear Neeraj,
>>>>>
>>>>>
>>>>>
>>>>> The physical database size doesn't matter much, even the number of
>>>>> records don't matter. In my experience the biggest problem that one can
>>>>> going to run in to is not size, but the number of queries you can handle at
>>>>> a time instance specially during analytic functionality execution. Most
>>>>> probably you should going to have to move to a master/slave configuration
>>>>> of your database, so that the read queries can run against the slaves and
>>>>> the write queries run against the master. However, if you and your database
>>>>> management team are not ready for this than, you can tweak your indexes for
>>>>> the queries you are running to speed up the response times. Also there is a
>>>>> lot of tweaking you can do to the network stack and kernel in Linux where
>>>>> MySQL Server has been installed that will help.Perhaps, I would focus first
>>>>> on your indexes, then have a server admin look at your OS, and if all that
>>>>> doesn't help it might be time to implement a master/slave configuration.
>>>>> The most important scalability factor is RAM. If the indexes of your tables
>>>>> fit into memory and your queries are highly optimized in analytic
>>>>> functionality, you can serve a reasonable amount of requests with a average
>>>>> machine. The number of records do matter, depending of how your tables look
>>>>> like. It's a difference to have a lot of varchar fields or only a couple of
>>>>> ints or longs. The physical size of the database matters as well,  think of
>>>>> backups, for instance. Depending on your engine, your physical db files on
>>>>> grow, but don't shrink, for instance with innodb. So deleting a lot of
>>>>> rows, doesn't help to shrink your physical files. Thus the database size
>>>>> does matter. If you have more than one table with more than a million
>>>>> records, then performance starts indeed to degrade. Indexig is one of the
>>>>> important stand need to take care, If you hit one million records you will
>>>>> get performance problems, if the indices are not set right (for example no
>>>>> indices for fields in "WHERE statements" or "ON conditions" in joins). If
>>>>> you hit 10 million records, you will start to get performance problems even
>>>>> if you have all your indices right. Hardware upgrades - adding more memory
>>>>> and more processor power, especially memory - often help to reduce the most
>>>>> severe problems by increasing the performance again, at least to a certain
>>>>> degree.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 19, 2016 at 12:35 PM, Knut Staring <knutst@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>> Just a heads-up that there seems to be a JDBC issue with Postgres 9.6,
>>>>> so perhaps you should try upgrading to 9.5 first.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 19, 2016 at 8:58 AM, Lars Helge Øverland <lars@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hi Neeraj,
>>>>>
>>>>>
>>>>>
>>>>> what usually helps to improve runtime is to improve/increase:
>>>>>
>>>>>
>>>>>
>>>>> - ssd (read and write speed)
>>>>>
>>>>> - number of CPUs
>>>>>
>>>>> - using latest postgresql (9.6 claims to have even better indexing
>>>>> performance
>>>>> <https://www.postgresql.org/docs/9.6/static/release-9-6.html> than
>>>>> 9.5)
>>>>>
>>>>> - tuning
>>>>> <https://dhis2.github.io/dhis2-docs/master/en/implementer/html/install_server_setup.html#install_postgresql_performance_tuning>
>>>>> of postgresql
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>>
>>>>> Lars
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Lars Helge Øverland
>>>>>
>>>>> Lead developer, DHIS 2
>>>>>
>>>>> University of Oslo
>>>>>
>>>>> Skype: larshelgeoverland
>>>>>
>>>>> lars@xxxxxxxxx
>>>>>
>>>>> http://www.dhis2.org <https://www.dhis2.org/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Knut Staring
>>>>>
>>>>> Dept. of Informatics, University of Oslo
>>>>>
>>>>> Norway: +4791880522
>>>>>
>>>>> Skype: knutstar
>>>>>
>>>>> http://dhis2.org
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Best Regards,
>>>>>
>>>>>
>>>>>
>>>>> Brajesh Murari,
>>>>>
>>>>> Postgraduate, Department of Computer Science and Engineering,
>>>>>
>>>>> Chaudhary Devi Lal University, Sirsa,
>>>>>
>>>>> India.
>>>>>
>>>>>
>>>>>
>>>>> The three basic dimensions of human development: a long and healthy
>>>>> life, access to knowledge, and a decent standard of living.
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *******************************************
>>>>
>>>> Calle Hedberg
>>>>
>>>> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>>>>
>>>> Tel/fax (home): +27-21-685-6472
>>>>
>>>> Cell: +27-82-853-5352
>>>>
>>>> Iridium SatPhone: +8816-315-19119
>>>>
>>>> Email: calle.hedberg@xxxxxxxxx
>>>>
>>>> Skype: calle_hedberg
>>>>
>>>> *******************************************
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Neeraj Gupta
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>> Best Regards,
>>
>> Brajesh Murari,
>> Postgraduate, Dept of CSE,
>> Chaudhary Devi Lal University, Sirsa,
>> India.
>>
>> The three basic dimensions of human development: a long and healthy life,
>> access to knowledge, and a decent standard of living.
>>
>
>
>
> --
>
> *******************************************
>
> Calle Hedberg
>
> 46D Alma Road, 7700 Rosebank, SOUTH AFRICA
>
> Tel/fax (home): +27-21-685-6472
>
> Cell: +27-82-853-5352
>
> Iridium SatPhone: +8816-315-19119
>
> Email: calle.hedberg@xxxxxxxxx
>
> Skype: calle_hedberg
>
> *******************************************
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>

References