maria-developers team mailing list archive
Message #10169
Re: Fix for TokuDB and parallel replication
OK, to make it clearer, here are the actions that correspond to the log entries:
*** Server startup: ***
2016-12-09 11:48:05 140259564768000 [Note] Slave I/O thread: connected
to master 'repl@172.16.4.1:3306',replication starts at GTID position
'0-1-482499150'
2016-12-09 11:48:05 140259546905344 [Note] Slave SQL thread initialized,
starting replication in log 'mysql-bin.001443' at position 973807119,
relay log './sql-slave-relay-bin.000001' position: 4; GTID position
'0-1-482499150'
*** stop slave; set global slave_parallel_mode=optimistic; ***
2016-12-09 11:50:29 140259546905344 [Note] Error reading relay log
event: slave SQL thread was killed
2016-12-09 11:50:29 140259564768000 [Note] Slave I/O thread exiting,
read up to log 'mysql-bin.001443', position 1041861449; GTID position
0-1-482509090
*** start slave; ***
2016-12-09 11:50:43 140259546905344 [Note] Slave I/O thread: connected
to master 'repl@172.16.4.1:3306',replication starts at GTID position
'0-1-482509090'
2016-12-09 11:50:43 140259548117760 [Note] Slave SQL thread initialized,
starting replication in log 'mysql-bin.001443' at position 1041861449,
relay log './sql-slave-relay-bin.000001' position: 4; GTID position
'0-1-482509090'
*** SQL thread stopping by itself: ***
2016-12-09 11:50:45 140259548117760 [Note] Error reading relay log
event: slave SQL thread was killed
(No "Slave SQL thread exiting..." message.)
Associated SHOW SLAVE STATUS output:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.4.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.001443
Read_Master_Log_Pos: 1047027234
Relay_Log_File: sql-slave-relay-bin.000002
Relay_Log_Pos: 4890750
Relay_Master_Log_File: mysql-bin.001443
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB: sc_2,percona
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1046751490
Relay_Log_Space: 5166796
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-482509567
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)
*** start slave; ***
2016-12-09 11:52:03 140259547511552 [Note] Slave SQL thread initialized,
starting replication in log 'mysql-bin.001443' at position 1046751490,
relay log './sql-slave-relay-bin.000002' position: 4890750; GTID
position '0-1-482509269'
*** SQL thread stopping by itself: ***
This time there is no "slave SQL thread was killed" message.
2016-12-09 11:52:05 140259547511552 [Note] Slave SQL thread exiting,
replication stopped in log 'mysql-bin.001443' at position 1047219058;
GTID position '0-1-482509775'
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.16.4.1
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.001444
Read_Master_Log_Pos: 6700262
Relay_Log_File: sql-slave-relay-bin.000002
Relay_Log_Pos: 5358318
Relay_Master_Log_File: mysql-bin.001443
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB: sc_2,percona
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1047219058
Relay_Log_Space: 38586524
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Slave_Pos
Gtid_IO_Pos: 0-1-482514272
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
1 row in set (0.00 sec)
HTH,
Jocelyn Fournier
Founder
M : +33 6 51 21 54 10
https://www.softizy.com
Softizy - At your side to Optimize your PHP / MySQL applications
On 09/12/2016 at 12:01, jocelyn fournier wrote:
Hi Kristian,
I've just tried your tokudb_optimistic_parallel_replication branch,
and it behaves very strangely: the SQL thread stops by itself without
any replication error when slave_parallel_mode is set to optimistic.
In error.log:
2016-12-09 11:48:05 140259564768000 [Note] Slave I/O thread: connected
to master 'repl@172.16.4.1:3306',replication starts at GTID position
'0-1-482499150'
2016-12-09 11:48:05 140259546905344 [Note] Slave SQL thread
initialized, starting replication in log 'mysql-bin.001443' at
position 973807119, relay log './sql-slave-relay-bin.000001' position:
4; GTID position '0-1-482499150'
2016-12-09 11:50:29 140259546905344 [Note] Error reading relay log
event: slave SQL thread was killed
2016-12-09 11:50:29 140259564768000 [Note] Slave I/O thread exiting,
read up to log 'mysql-bin.001443', position 1041861449; GTID position
0-1-482509090
2016-12-09 11:50:43 140259546905344 [Note] Slave I/O thread: connected
to master 'repl@172.16.4.1:3306',replication starts at GTID position
'0-1-482509090'
2016-12-09 11:50:43 140259548117760 [Note] Slave SQL thread
initialized, starting replication in log 'mysql-bin.001443' at
position 1041861449, relay log './sql-slave-relay-bin.000001'
position: 4; GTID position '0-1-482509090'
2016-12-09 11:50:45 140259548117760 [Note] Error reading relay log
event: slave SQL thread was killed
2016-12-09 11:52:03 140259547511552 [Note] Slave SQL thread
initialized, starting replication in log 'mysql-bin.001443' at
position 1046751490, relay log './sql-slave-relay-bin.000002'
position: 4890750; GTID position '0-1-482509269'
2016-12-09 11:52:05 140259547511552 [Note] Slave SQL thread exiting,
replication stopped in log 'mysql-bin.001443' at position 1047219058;
GTID position '0-1-482509775'
After switching back to conservative mode, everything works properly.
Any idea what could be wrong?
Thanks!
Jocelyn Fournier
On 28/11/2016 at 10:10, Kristian Nielsen wrote:
So far, parallel replication has not worked well with TokuDB, as some
people who tried it found out. I have now pushed some patches to 10.1 to
solve the problems. There are two main fixes:
1. Fix some races where a waiting transaction would miss its wakeup and
get a lock wait timeout on a row lock it was waiting for, even though the
holding transaction had already released the lock. This fix is due to
great work by Rich Prohaska.
This problem is actually not specific to replication; normal transactions
on a master will experience it too. But it hurts replication a lot, since
replication must commit transactions in order, so one stalled transaction
stalls all following transactions as well.
2. Implement the conflict detection and handling necessary for optimistic
parallel replication to work. This basically implements the
thd_rpl_deadlock_check() and hton->kill_query methods. This should solve
the problems where optimistic parallel replication with TokuDB breaks with
lock wait timeouts.
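To make the two points above concrete, here is a small C++ model. It is a sketch under simplifying assumptions, not MariaDB or TokuDB code: `commit_times`, `on_row_lock_wait` and `LockWaitAction` are hypothetical names, and the real hooks (`thd_rpl_deadlock_check()`, `hton->kill_query`) have different signatures and do much more. The first function shows why, with in-order commit, one stalled transaction delays every later one; the second shows the basic rule behind optimistic conflict handling: if an earlier transaction waits on a row lock held by a later one, the later one can never release it in time, so it is killed and retried instead of letting the waiter run into a lock wait timeout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not MariaDB code): transaction i is the i-th in the
// master's binlog order, so it may only commit after transaction i-1.
// finish[i] is when transaction i would be ready to commit on its own.
std::vector<long> commit_times(const std::vector<long>& finish) {
  std::vector<long> commit(finish.size());
  long prev = 0;
  for (std::size_t i = 0; i < finish.size(); ++i) {
    // In-order commit: a transaction cannot commit before its predecessor,
    // so a single late transaction delays every transaction after it.
    prev = std::max(prev, finish[i]);
    commit[i] = prev;
  }
  return commit;
}

// Hypothetical decision for a row-lock wait between two replication
// workers, identified by their position in commit order.
enum class LockWaitAction { Wait, KillHolderAndRetry };

LockWaitAction on_row_lock_wait(std::size_t waiter_seq, std::size_t holder_seq) {
  assert(waiter_seq != holder_seq);  // a transaction never waits on itself
  // The holder commits after the waiter, so it keeps its lock until the
  // waiter commits, while the waiter cannot commit without the lock: a
  // deadlock that otherwise ends in a lock wait timeout. Killing the later
  // holder lets the earlier waiter proceed; the holder is rolled back and
  // re-executed.
  if (holder_seq > waiter_seq) return LockWaitAction::KillHolderAndRetry;
  // The holder commits first and will release its lock, so waiting is safe.
  return LockWaitAction::Wait;
}
```

For example, finish times 1, 50, 2, 3 give commit times 1, 50, 50, 50: the one slow transaction holds back both transactions queued behind it, which is why a missed wakeup (point 1) is so costly on a slave.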
If someone wants to test it, I have made a tree available with just these
fixes on top of MariaDB 10.1.19:
https://github.com/knielsen/server/tree/tokudb_optimistic_parallel_replication
The fix should appear in 10.1.20 eventually.
Rich has also submitted the first part of the patch upstream. When it is
(hopefully) merged upstream, and upstream is merged into MariaDB, the
MariaDB version of the fix should be replaced with the Percona one. I tried
to make the MariaDB version of the fix identical to Rich's pull request and
kept it in a separate commit, so this should hopefully be simple to do when
the time comes.
- Kristian.
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-developers
More help : https://help.launchpad.net/ListHelp