maria-developers team mailing list archive

Re: Fix for TokuDB and parallel replication

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>, MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
From: jocelyn fournier <jocelyn.fournier@xxxxxxxxxxx>
Date: Fri, 9 Dec 2016 12:01:51 +0100
Cc: Rich Prohaska <prohaska7@xxxxxxxxx>
In-reply-to: <87shqc0y2u.fsf@urd.knielsen-hq.org>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Hi Kristian,

I've just tried your tokudb_optimistic_parallel_replication branch, and it behaves very strangely: the SQL thread stops by itself without any replication error when the parallel_mode is set to optimistic.


In the error.log :

2016-12-09 11:48:05 140259564768000 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482499150'
2016-12-09 11:48:05 140259546905344 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 973807119, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482499150'
2016-12-09 11:50:29 140259546905344 [Note] Error reading relay log event: slave SQL thread was killed
2016-12-09 11:50:29 140259564768000 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001443', position 1041861449; GTID position 0-1-482509090
2016-12-09 11:50:43 140259546905344 [Note] Slave I/O thread: connected to master 'repl@172.16.4.1:3306',replication starts at GTID position '0-1-482509090'
2016-12-09 11:50:43 140259548117760 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1041861449, relay log './sql-slave-relay-bin.000001' position: 4; GTID position '0-1-482509090'
2016-12-09 11:50:45 140259548117760 [Note] Error reading relay log event: slave SQL thread was killed
2016-12-09 11:52:03 140259547511552 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.001443' at position 1046751490, relay log './sql-slave-relay-bin.000002' position: 4890750; GTID position '0-1-482509269'
2016-12-09 11:52:05 140259547511552 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.001443' at position 1047219058; GTID position '0-1-482509775'

After switching back to conservative mode, everything works properly. Any idea what could be wrong?
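
For anyone who wants to line up the SQL thread starts and stops against GTID positions, here is a rough Python sketch that pulls them out of error-log lines like the ones above. It assumes one log entry per line in the "date time thread-id [level] message" layout; the function names (parse_line, sql_thread_events) are made up for this example and are not part of any MariaDB tool.

```python
import re

# Assumed layout: "<date> <time> <thread id> [<level>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\d+) \[(\w+)\] (.*)$"
)
# The GTID position appears both quoted and unquoted in the log.
GTID_RE = re.compile(r"GTID position '?(\d+-\d+-\d+)'?")


def parse_line(line):
    """Return (timestamp, thread_id, level, message, gtid_or_None), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, tid, level, msg = m.groups()
    g = GTID_RE.search(msg)
    return ts, int(tid), level, msg, (g.group(1) if g else None)


def sql_thread_events(lines):
    """Yield (timestamp, kind, gtid) for SQL thread start/stop/kill messages."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        ts, _tid, _level, msg, gtid = parsed
        if msg.startswith("Slave SQL thread initialized"):
            yield ts, "start", gtid
        elif msg.startswith("Slave SQL thread exiting"):
            yield ts, "stop", gtid
        elif "slave SQL thread was killed" in msg:
            yield ts, "killed", gtid
```

Feeding it the lines of the error log (for example via open("error.log")) gives one tuple per SQL thread event, which makes it easier to see how far the applier got between a start and the following stop.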



Thanks!


Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications

On 28/11/2016 at 10:10, Kristian Nielsen wrote:

Parallel replication so far did not work well with TokuDB, as some people
who tried it found out. I have now pushed to 10.1 some patches to solve the
problems. There are two main fixes:

1. Fix some races where a waiting transaction would miss its wakeup and get
a lock timeout on a waiting row lock, even though the lock was released by
the holding transaction. This fix is due to great work by Rich Prohaska.
This problem is actually not specific to replication; normal transactions on
a master will experience it too. But it hurts replication a lot, since
replication must commit transactions in order, so one stalled transaction
stalls all following transactions as well.

2. Implement the conflict detection and handling necessary for optimistic
parallel replication to work. This basically implements
thd_rpl_deadlock_check() and hton->kill_query methods. This should solve the
problems where optimistic parallel replication with TokuDB breaks with lock
wait timeouts.
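
To illustrate the idea behind fix 2, here is a toy Python model of conflict handling under in-order commit. It is only a sketch and not the TokuDB or MariaDB code: the LockTable class and its methods are hypothetical, transactions are plain integers giving their commit order, and the real thd_rpl_deadlock_check() and hton->kill_query interfaces are not reproduced here. The assumption modelled is that when a later transaction holds a row lock that an earlier transaction needs, the later one is rolled back so that the earlier one can proceed.

```python
class LockTable:
    """Toy row-lock table: maps row key -> commit-order number of the holder."""

    def __init__(self):
        self.holder = {}   # row -> transaction currently holding the lock
        self.killed = []   # transactions rolled back to resolve conflicts

    def rollback(self, txn):
        """Release every row lock held by `txn`."""
        for row in [r for r, owner in self.holder.items() if owner == txn]:
            del self.holder[row]

    def acquire(self, txn, row):
        """Try to lock `row` for `txn`; return True if the lock is granted."""
        owner = self.holder.get(row)
        if owner is None or owner == txn:
            self.holder[row] = txn
            return True
        if owner > txn:
            # A later transaction blocks an earlier one. Commits happen in
            # order, so the later one cannot commit first and waiting would
            # never end. Roll back the later one and grant the lock.
            self.rollback(owner)
            self.killed.append(owner)
            self.holder[row] = txn
            return True
        # An earlier transaction holds the lock: an ordinary wait.
        return False


locks = LockTable()
locks.acquire(2, "row_a")          # transaction 2 runs ahead optimistically
locks.acquire(2, "row_b")
locks.acquire(1, "row_a")          # transaction 1 conflicts; 2 is rolled back
print(locks.killed)                # [2]
print(locks.acquire(2, "row_a"))   # False: the retried 2 now waits for 1
```

In this model a rolled-back transaction simply retries later and waits behind the earlier one, which is the behaviour one would expect instead of a lock wait timeout.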

If someone wants to test it, I have made a tree available with just these
fixes on top of MariaDB 10.1.19:

   https://github.com/knielsen/server/tree/tokudb_optimistic_parallel_replication

The fix should appear in 10.1.20 eventually.

The first part of the patch has also been submitted by Rich to
upstream. When this is (hopefully) merged upstream, and upstream merged into
MariaDB, the MariaDB version of the fix should be replaced with the Percona
one. I tried making the MariaDB version of the fix identical to Rich's pull
request and keeping it in a separate commit, so this should hopefully be
simple to do when the time comes.

  - Kristian.

_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp
