
maria-developers team mailing list archive

Re: Problem with parallel replication in 10.2


Michael Widenius <michael.widenius@xxxxxxxxx> writes:

> I was trying to run a test that fails in the upcoming bb-10.2-jan on
> the normal 10.2 tree, when I noticed this strange issue:
> - Test fails with timeout when running with --debug
> - When looking at the trace file, I notice that we get a duplicate key
> error for the table gtid_slave_pos (MyISAM table). Is this something
> normal?

Like this:

2016-09-01 10:33:20 140078976283392 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-53' for key 'PRIMARY', Gtid 0-1-52, Internal MariaDB error code: 1942

This happens because the mysql.gtid_slave_pos table is MyISAM (which is the
default in mysql-test-run, but not in a normal server install), and
parallel replication needs to roll back a transaction after it has updated
the table. Because of MyISAM, the gtid_slave_pos change cannot be rolled
back, so the stale row survives and causes the duplicate key error when the
transaction is retried.

Maybe parallel replication could in this case manually undo its change in
the table as part of the rollback. It's just a DELETE of the row previously
inserted.
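The failure mode can be sketched as a toy Python model (this is not InnoDB
internals; the table class, row keys, and the compensating-delete helper are
all invented for illustration):

```python
# Toy model of why a non-transactional mysql.gtid_slave_pos breaks
# parallel replication's rollback-and-retry. All names are illustrative.

class DuplicateKeyError(Exception):
    pass

class Table:
    def __init__(self, transactional):
        self.transactional = transactional
        self.rows = set()      # committed rows, keyed by PRIMARY KEY
        self.pending = set()   # rows written by the open transaction

    def insert(self, key):
        if key in self.rows or key in self.pending:
            raise DuplicateKeyError(f"Duplicate entry {key!r} for key 'PRIMARY'")
        self.pending.add(key)

    def rollback(self):
        if self.transactional:
            self.pending.clear()   # InnoDB: the insert is undone
        else:
            # MyISAM: the write is already permanent; rollback cannot undo it.
            self.rows |= self.pending
            self.pending.clear()

    def commit(self):
        self.rows |= self.pending
        self.pending.clear()

gtid_slave_pos = Table(transactional=False)   # the problematic MyISAM case

# First attempt: the transaction records its GTID, then is rolled back
# (e.g. it was killed as a deadlock victim).
gtid_slave_pos.insert("0-53")
gtid_slave_pos.rollback()

# Retry: the row from the aborted attempt is still there.
dup_error = None
try:
    gtid_slave_pos.insert("0-53")
except DuplicateKeyError as e:
    dup_error = str(e)
    print(dup_error)   # same failure mode as the 1062 error in the log

# The suggested manual undo: DELETE the stale row as part of rollback,
# after which the retry succeeds.
gtid_slave_pos.rows.discard("0-53")
gtid_slave_pos.insert("0-53")
gtid_slave_pos.commit()
```

With a transactional table the rollback itself removes the pending row, which
is why ALTERing the table to InnoDB makes the problem disappear.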

In any case, currently the fix is to use InnoDB for the table:

--- rpl_skr.test~	2016-09-01 10:27:21.214633498 +0200
+++ rpl_skr.test	2016-09-01 10:35:50.660242337 +0200
@@ -8,6 +8,9 @@
 --connection server_2
 SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
 --source include/stop_slave.inc
+SET sql_log_bin=0;
+ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
+SET sql_log_bin=1;
 SET GLOBAL slave_parallel_threads=10;
 SET GLOBAL slave_parallel_mode='conservative';
 --source include/start_slave.inc

> bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7
> When running rpl_skr in 10.2 it takes 2 seconds
> When running it in the bb-10.2-jan tree it takes either a long time
> or we get a timeout.

This is because of an erroneous merge. The original code:

    if (waitee_buf_ptr) {

The bb-10.2-jan code:

    if (victim_trx && waitee_buf_ptr) {

So if victim_trx is NULL the waits are not reported to parallel replication
at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB
itself detects a deadlock.
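The effect of the merged condition can be shown with a toy sketch (plain
Python, not the real lock0lock.cc control flow; the function and transaction
names are made up):

```python
# Toy illustration of the merge bug: with the extra `victim_trx &&`,
# lock waits are only reported upward when InnoDB's own deadlock
# checker found a cycle -- which it usually does not.

def check_and_resolve(waitees, victim_trx, buggy):
    """Return the waits that get reported to the replication layer."""
    reported = []
    waitee_buf_ptr = waitees   # non-empty whenever some thd is waiting
    condition = (victim_trx and waitee_buf_ptr) if buggy else waitee_buf_ptr
    if condition:
        for w in waitees:
            if w is not victim_trx:   # no need to report an already-chosen victim
                reported.append(w)
    return reported

# Common case: a later transaction waits on an earlier one, but there is
# no lock cycle, so victim_trx is NULL (None here).
waits = check_and_resolve(waitees=["trx_2"], victim_trx=None, buggy=True)
print(waits)    # [] -> replication never hears about the wait: stall/hang

waits = check_and_resolve(waitees=["trx_2"], victim_trx=None, buggy=False)
print(waits)    # ['trx_2'] -> thd_report_wait_for() can act on the conflict
```

The fixed condition reports the waits regardless of whether InnoDB picked a
victim, which is what the parallel-replication deadlock kill relies on.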

I've attached a patch that fixes this, can also be pulled from here:


Or should I push it directly into bb-10.2-jan? This makes the rpl_skr.test
complete correctly in < 1 second.

> This is probably because of the new lock code in lock0lock.cc and
> lock0wait.cc which doesn't break conflicting transaction but instead
> waits for a timeout

The merge appears very rough. Shouldn't the waitee_buf be integrated into
the new DeadlockChecker class? Why is it necessary to thd_report_wait_for()
on internal transactions like here?

    /* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */
    if (err == DB_LOCK_WAIT) {
            ut_ad(wait_for && wait_for->trx);
            wait_for->trx->abort_type = TRX_REPLICATION_ABORT;
            thd_report_wait_for(current_thd, wait_for->trx->mysql_thd);
            wait_for->trx->abort_type = TRX_SERVER_ABORT;
    }

Maybe I should try to write a better patch for integrating this in the new
InnoDB code.

What do you think about changing this to use the async deadlock kill in
background thread, as discussed in this thread?


That would allow simplifying the code in lock0lock.cc and avoiding the
locking hacks in innobase_kill_query().
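The async-kill idea can be sketched like this (a toy Python model; the queue,
thread, and policy function are all invented names, not the server's actual
API):

```python
# Sketch of an async deadlock kill: code running with lock mutexes held
# only *enqueues* the victim; a background thread performs the kill
# outside any InnoDB mutex, so no locking hacks are needed in the
# kill path. All names are illustrative.

import queue
import threading

kill_queue = queue.Queue()
killed = []

def kill_thread_main():
    while True:
        victim = kill_queue.get()
        if victim is None:        # shutdown marker
            break
        killed.append(victim)     # stands in for the actual THD kill

bg = threading.Thread(target=kill_thread_main)
bg.start()

def conflicts_with_replication_order(waiter, holder):
    # Placeholder policy: kill the holder if it must commit after the waiter.
    return holder["commit_seq"] > waiter["commit_seq"]

def report_wait_for(waiter, holder):
    # Called with lock mutexes held in the real server: must not block,
    # so it only records the decision.
    if conflicts_with_replication_order(waiter, holder):
        kill_queue.put(holder)

t1 = {"name": "T1", "commit_seq": 1}
t2 = {"name": "T2", "commit_seq": 2}
report_wait_for(waiter=t1, holder=t2)   # T2 blocks T1 out of commit order

kill_queue.put(None)
bg.join()
print([v["name"] for v in killed])      # ['T2']
```

The point of the design is that the enqueue is cheap and non-blocking, so it
is safe under the lock-system mutexes, while the slow, lock-taking kill work
moves to the background thread.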

 - Kristian.

commit 93346c264c8a618034648ca55ba417f9dcf49340 (HEAD, montyrpl)
Author: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date:   Thu Sep 1 11:13:03 2016 +0200

    Fix bad 10.2 InnoDB merge.
    The thd_report_wait_for() call was incorrectly skipped whenever
    deadlock checker did not find any deadlocks. This effectively disables
    optimistic parallel replication deadlock kills in most cases, causing
    hangs or stalls.
    Signed-off-by: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index 857cb68..fea9d42 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -8077,7 +8077,7 @@ lock_report_waiters_to_mysql(
 	struct thd_wait_reports*	waitee_buf_ptr,	/*!< in: set of trxs */
 	THD*				mysql_thd,	/*!< in: THD */
-	trx_id_t			victim_trx_id)	/*!< in: Trx selected
+	const trx_t*			victim_trx)	/*!< in: Trx selected
 							as deadlock victim, if
 							any */
@@ -8092,7 +8092,7 @@ lock_report_waiters_to_mysql(
 			trx_t *w_trx = p->waitees[i];
 			/*  There is no need to report waits to a trx already
 			selected as a victim. */
-			if (w_trx->id != victim_trx_id) {
+			if (w_trx != victim_trx) {
 				/* If thd_report_wait_for() decides to kill the
 				transaction, then we will get a call back into
 				innobase_kill_query. We mark this by setting
@@ -8153,10 +8153,10 @@ DeadlockChecker::check_and_resolve(const lock_t* lock, const trx_t* trx)
 		victim_trx = checker.search();
 		/* Report waits to upper layer, as needed. */
-		if (victim_trx && waitee_buf_ptr) {
+		if (waitee_buf_ptr) {
 			lock_report_waiters_to_mysql(waitee_buf_ptr,
 						     trx->mysql_thd,
-						     victim_trx->id);
+						     victim_trx);
 		}
 		/* Search too deep, we rollback the joining transaction only
