← Back to team overview

dhis2-devs team mailing list archive

Re: Migrating from MySQL to PostgreSQL

 

Thank you again!

Yes mysql might be better or not in another environment. Specially if we
consider the number and sizes of files generated by your InnoDB settings.
There is some discussion about how mysql could be affected by filesystem at
http://serverfault.com/questions/29193/what-is-the-best-linux-filesystem-for-mysql-innodb.
There are references for another big discussion on facebook.

This is not a debate of whether Postgres or MySQL is better than the other
for the kind of tasks DHIS requires. No doubts postgres has advanced
features which MySQL doesn't at the moment. I am just adding more things to
consider when doing these comparisons.

Orvalho



On Tue, May 7, 2013 at 9:58 PM, Saptarshi Purkayastha <sunbiz@xxxxxxxxx>wrote:

> *PS: DO NOT COPY THESE AS IS IN YOUR PRODUCTION SYSTEM. THESE ARE
> SPECIFIC TO MY DEVELOPER INSTALLATION*
>
> Yes, these are on Windows and the results may or may not be different on
> Linux.
> The DHIS2 behavior would remain the same in its interaction with the
> database, since we are using the Type4 JDBC drivers
> The tuning is very specific and I've tried these on a few applications and
> reached these numbers for my MySQL and PostgreSQL
>
> *For MySQL:*
> There are many tuning options in my.cnf that I've done from here -
> https://tools.percona.com/wizard
> The following are the useful ones to look from my setup
> # CACHES AND LIMITS #
> tmp_table_size                 = 32M
> max_heap_table_size            = 32M
> query_cache_type               = 0
> query_cache_size               = 0
> max_connections                = 500
> thread_cache_size              = 50
> open_files_limit               = 65535
> table_definition_cache         = 4096
> table_open_cache               = 4096
>
> # INNODB #
> innodb_log_files_in_group      = 2
> innodb_log_file_size           = 256M
> innodb_flush_log_at_trx_commit = 1
> innodb_file_per_table          = 1
> innodb_buffer_pool_size        = 4G
>
> *For PostgreSQL:*
> Used pgtune for this... - http://pgfoundry.org/projects/pgtune
> But also some other changes as well... the important ones are below:
>
> shared_buffers = 2048MB
> effective_cache_size = 3096MB
> default_statistics_target = 800
> synchronous_commit = local
> wal_writer_delay = 600
> full_page_writes = false
> autovacuum = true
> constraint_exclusion = partition
>
>
>
> ---
> Regards,
> Saptarshi PURKAYASTHA
>
> My Tech Blog:  http://sunnytalkstech.blogspot.com
> You Live by CHOICE, Not by CHANCE
>
>
> On 7 May 2013 20:36, Orvalho Augusto <orvaquim@xxxxxxxxx> wrote:
>
>> Wow! Weldone!
>>
>> I do not have a huge database as yours. And yes I have migrated from
>> MySQL to postgreSQL because I felt that the DHIS community does not support
>> enough MySQL. Anyway getting back to your journey I must ask what means
>> tunned MySQL or tunned PostgreSQL? What you tune might dictate the results.
>>
>> Are the MySQL/PostgreSQL run in Windows? The multiple choices of
>> filesystem and more possibilities to tune in Linux can give another picture
>> to this. You have just migrated everything you will need to do it again but
>> it would be nice to test the same in Linux too.
>>
>> Anyway, thank you! This is immensely great.
>>
>> Caveman
>>
>>
>>
>> On Tue, May 7, 2013 at 2:53 PM, Saptarshi Purkayastha <sunbiz@xxxxxxxxx>wrote:
>>
>>> Hello DHIS users and devs,
>>>
>>> Apologies for the long email... Should have probably been multiple
>>> blogpost to reduce its length...
>>>
>>> I recently encountered a situation with a very large implementation of
>>> DHIS2 having problems generating data mart.
>>> Thus no reports were generated and only data entry was being done. I
>>> thought I'd share some of the experiences to solve these issues, so that it
>>> might be useful to other implementers.
>>> Some changes will be needed in the DHIS2 source, so sending this to the
>>> dev list also, where dev-related discussions can follow-up.
>>>
>>> While PostgreSQL is our recommended database, many implementations have
>>> also used MySQL.
>>> My findings clearly highlight that DHIS2 performs much better on
>>> PostgreSQL and there are also some bugs related to MySQL dialect.
>>> Total org units - 26303
>>> Total Monthly datasets - 9
>>> Total Daily dataset - 1
>>> Total Yearly dataset - 3
>>>
>>> The implementation has about 34 million datavalues (non-zero)... but I
>>> pruned it for my benchmarking. I added 1-million datavalues and ran the
>>> data mart.
>>> The results are from my fairly good laptop (quad-core i7; 8GB RAM; tuned
>>> JVM; tuned MySQL 5.5 (4GB RAM); tuned PostgreSQL 9.2 (4GB RAM); 240GB SSD)
>>> Using DHIS2 2.11. When doing MySQL benchmark turned off all services
>>> including postgres and vice versa.
>>> Java Opts:  -Xmx3G -Xms768m -XX:MaxPermSize=512m
>>> Java version: 1.7.0_21 x64
>>> Java vendor: Oracle Corporation OS name: WindowsMySQL = datamart
>>> completed in 3hrs 46min 12sec
>>> PostgreSQL = datamart completed in 2hrs 5min 16sec
>>>
>>> So, it is obvious that PostgreSQL is doing datamart much faster. The
>>> advantages might scale better if larger number of datavalues
>>> One could argue MySQL 5.6 has many performance improvements, I didn't
>>> have time to explore that.
>>> The migration to PostgreSQL has some challenges. Following are steps I
>>> followed:
>>>
>>>  - Take the mysqldump
>>>  - replace bit(1) to tinyint(1) in the SQL file
>>>  - You'll see that column names are camelCase. This is an issue because
>>> postgres will added a double quotes around to get case-sensitivity, which
>>> MySQL by default nicely excludes.
>>> So you'll have to make all column names to lowercase and remove the
>>> quote characters. I did this with a simple java program. There are 150-odd
>>> column names that need changes.
>>>  - Used Navcat premium (trial version or SQLSquirrel also has this
>>> feature). "Data transfer" is the name of the feature that will move data
>>> from MySQL to Postgres
>>>  - In MySQL non-standard use of boolean (which came only a few yrs
>>> back), its converted to smallint in Postgres. I wrote a JDBC program to
>>> change column type from smallint to boolean. A single table example is as
>>> follows that can be made into a looping procedure as well in pure PSQL.
>>> ALTER TABLE indicator ALTER COLUMN annualized TYPE boolean
>>>     USING CASE WHEN annualized = 0 THEN FALSE
>>>            WHEN annualized = 1 THEN TRUE
>>>            ELSE NULL
>>>     END;
>>>  - remove NULL values from minimumvalue column of minmaxdatalement table
>>> DELETE from minmaxdatalement WHERE minimumvalue=NULL
>>>  - remove NULL values from maximumvalue column of minmaxdatalement table
>>> DELETE from minmaxdatalement WHERE maximumvalue=NULL
>>>  - remove NULL values from name column of relationshiptype table
>>> DELETE from relationshiptype WHERE name=NULL
>>>  - blobs to bytea conversion is a mess and I had to truncate. Probably a
>>> JDBC based connector program will do better conversion, but I just
>>> truncated it and accepted the data loss to systemsetting and usersetting :-)
>>> *
>>> Devs*:
>>> We need to make all column names lowercase in hbm.xml files in code.
>>> This will ensure portability and is generally a good practice.
>>> We should also have a convention of using last_updated instead of
>>> lastUpdated in column names, as is the common practice.
>>>
>>> ---
>>> Regards,
>>> Saptarshi PURKAYASTHA
>>>
>>> My Tech Blog:  http://sunnytalkstech.blogspot.com
>>> You Live by CHOICE, Not by CHANCE
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>

References