maria-developers team mailing list archive
Message #09927
Re: Problem with parallel replication in 10.2
Hi!
<cut>
> 2016-09-01 10:33:20 140078976283392 [ERROR] Slave SQL: Error during XID COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 1062: Duplicate entry '0-53' for key 'PRIMARY', Gtid 0-1-52, Internal MariaDB error code: 1942
>
> This happens because the mysql.gtid_slave_pos table is MyISAM (which is the
> default in mysql-test-run, but not in the normal server install), and
> parallel replication needs to roll back a transaction after it has updated
> the table. Because of MyISAM, the gtid_slave_pos change cannot be rolled
> back.
Sorry about the MyISAM part. I did mention on #maria right after I sent
the email that the problem with 10.2 was that the wrong engine was used,
but apparently you missed that.
> Maybe parallel replication could in this case manually undo its change in
> the table as part of the rollback. It's just a DELETE of the row previously
> inserted.
That could be a solution. In any case, I should at least look at
adding a better error message if this happens.
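A toy model of the failure and of the suggested manual undo, for illustration only: `GtidSlavePosModel`, `insert_row()`, `rollback()`, `retry_succeeds()` and the `transactional` flag are hypothetical names, not MariaDB code. The key is (domain_id, sub_id), the primary key of mysql.gtid_slave_pos, which is why the error shows '0-53'. The model assumes the retried transaction reuses the same sub_id. With a non-transactional table the row survives the rollback, so the retry hits a duplicate key; remembering the inserted key and deleting it on rollback avoids this.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <utility>

// Hypothetical model of mysql.gtid_slave_pos keyed by (domain_id, sub_id).
using GtidKey = std::pair<uint32_t, uint64_t>;

class GtidSlavePosModel {
 public:
  explicit GtidSlavePosModel(bool transactional)
      : transactional_(transactional) {}

  // Insert the GTID row for the current transaction.
  // Returns false on a duplicate key (error 1062 in the log above).
  bool insert_row(GtidKey key) {
    if (!rows_.insert(key).second) return false;
    pending_ = key;  // remember what this transaction added
    return true;
  }

  void commit() {
    assert(pending_.has_value());  // only valid after a successful insert
    pending_.reset();
  }

  // Roll back the current transaction. A transactional engine (InnoDB)
  // undoes the insert itself; a non-transactional one (MyISAM) keeps the
  // row unless the caller compensates with a DELETE of the same key.
  void rollback(bool compensating_delete) {
    if (pending_ && (transactional_ || compensating_delete)) {
      rows_.erase(*pending_);
    }
    pending_.reset();
  }

 private:
  bool transactional_;
  std::set<GtidKey> rows_;
  std::optional<GtidKey> pending_;
};

// Apply a transaction, roll it back (as parallel replication may have to),
// then retry it with the same key. Returns true if the retry can record
// its GTID state.
bool retry_succeeds(bool transactional, bool compensating_delete) {
  GtidSlavePosModel table(transactional);
  const GtidKey key{0, 53};
  const bool inserted = table.insert_row(key);
  assert(inserted);
  (void)inserted;  // silence unused-variable warnings under NDEBUG
  table.rollback(compensating_delete);
  if (!table.insert_row(key)) return false;
  table.commit();
  return true;
}
```

`retry_succeeds(true, false)` and `retry_succeeds(false, true)` return true, while `retry_succeeds(false, false)` returns false, mirroring the InnoDB fix and the manual-DELETE idea respectively.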
> In any case, currently the fix is to use InnoDB for the table:
>
> --- rpl_skr.test~ 2016-09-01 10:27:21.214633498 +0200
> +++ rpl_skr.test 2016-09-01 10:35:50.660242337 +0200
> @@ -8,6 +8,9 @@
> --connection server_2
> SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
> --source include/stop_slave.inc
> +SET sql_log_bin=0;
> +ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
> +SET sql_log_bin=1;
> SET GLOBAL slave_parallel_threads=10;
> SET GLOBAL slave_parallel_mode='conservative';
> --source include/start_slave.inc
Yes, that is the same fix I made.
>> bb-10.2-jan tree is a working tree for a merge of MariaDB 10.2 and MySQL 5.7
>>
>> When running rpl_skr in 10.2 it takes 2 seconds
>> When running it in the bb-10.2-jan tree it takes either a long time
>> or we get a timeout.
>
> This is because of an erroneous merge. The original code:
>
> if (waitee_buf_ptr) {
> lock_report_waiters_to_mysql(waitee_buf_ptr,
> start_mysql_thd,
> victim_trx_id);
>
> The bb-10.2-jan code:
>
> if (victim_trx && waitee_buf_ptr) {
> lock_report_waiters_to_mysql(waitee_buf_ptr,
> start_mysql_thd,
> victim_trx->id);
>
> So if victim_trx is NULL the waits are not reported to parallel replication
> at all, causing the stalls and/or hangs. victim_trx is NULL unless InnoDB
> itself detects a deadlock.
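The effect of the extra `victim_trx` test can be isolated in a small sketch. It is a simplified model, not InnoDB code: `trx_t` is reduced to an id, the hypothetical `reports_original()` and `reports_merged()` helpers only return whether `lock_report_waiters_to_mysql()` would have been called, and passing 0 as the victim id when there is no deadlock victim is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for InnoDB's trx_t; only the id is modelled.
struct trx_t {
  uint64_t id;
};

// Original 10.2 condition: waiters are reported whenever there are any,
// regardless of whether InnoDB chose a deadlock victim.
bool reports_original(const void* waitee_buf_ptr, uint64_t /*victim_trx_id*/) {
  return waitee_buf_ptr != nullptr;
}

// Merged bb-10.2-jan condition: the extra victim_trx test drops the report
// whenever InnoDB itself found no deadlock (victim_trx == nullptr).
bool reports_merged(const void* waitee_buf_ptr, const trx_t* victim_trx) {
  return victim_trx != nullptr && waitee_buf_ptr != nullptr;
}

// A plain lock wait: there is a waitee to report, but no InnoDB deadlock.
void plain_lock_wait_example() {
  const int waitee_buf = 0;  // placeholder for a non-empty waitee buffer
  assert(reports_original(&waitee_buf, 0));       // parallel replication is told
  assert(!reports_merged(&waitee_buf, nullptr));  // report lost: stall or hang
}
```

When InnoDB does pick a victim, both variants report; only the common no-deadlock case differs, which is the case behind the stalls described in the message.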
>
> I've attached a patch that fixes this, can also be pulled from here:
>
> https://github.com/knielsen/server/commits/montyrpl
>
> Or should I push it directly into bb-10.2-jan? This makes the rpl_skr.test
> complete correctly in < 1 second.
Thanks a lot for the patch!
Jani will pull it into his working tree.
>> This is probably because of the new lock code in lock0lock.cc and
>> lock0wait.cc, which doesn't break the conflicting transaction but instead
>> waits for a timeout.
>
> The merge appears very rough. Shouldn't the waitee_buf be integrated into
> the new DeadlockChecker class? Why is it necessary to call thd_report_wait_for()
> on internal transactions like here?
>
> /* m_trx->mysql_thd is NULL if it's an internal trx. So current_thd is used */
> if (err == DB_LOCK_WAIT) {
> ut_ad(wait_for && wait_for->trx);
> wait_for->trx->abort_type = TRX_REPLICATION_ABORT;
> thd_report_wait_for(current_thd, wait_for->trx->mysql_thd);
> wait_for->trx->abort_type = TRX_SERVER_ABORT;
> }
> return(err);
>
> Maybe I should try to write a better patch for integrating this in the new
> InnoDB code.
It would be great if you could help Jan with a better patch!
He still has a lot of merge work to do and the whole server team is
waiting on Jan to be ready so that we can add the final touches and
release MariaDB 10.2-beta.
> What do you think about changing this to use the async deadlock kill in
> background thread, as discussed in this thread?
>
> https://lists.launchpad.net/maria-developers/msg09902.html
>
> This would allow us to simplify the code in lock0lock.cc and avoid the locking
> hacks in innobase_kill_query()?
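The idea behind the async deadlock kill can be sketched as a producer/consumer queue. This is a hypothetical illustration, not code taken from the patches discussed in this thread: `BackgroundKiller`, `request_kill()`, `stop()` and the numeric thread ids are invented names. The lock-wait path only records which thread should be killed and returns immediately; the kill itself happens later in a separate thread that does not hold the lock-system mutex, which is what would make the locking workarounds in innobase_kill_query() unnecessary.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: kill requests are queued and executed by a
// background thread, never by the thread that detected the conflict.
class BackgroundKiller {
 public:
  BackgroundKiller() : worker_([this] { run(); }) {}
  ~BackgroundKiller() { stop(); }

  // Called from the (hypothetical) lock-wait code: cheap and non-blocking.
  void request_kill(uint64_t thread_id) {
    {
      std::lock_guard<std::mutex> g(mu_);
      assert(!stopping_);
      queue_.push_back(thread_id);
    }
    cv_.notify_one();
  }

  // Stop accepting work, let the worker drain the queue, then join it.
  void stop() {
    {
      std::lock_guard<std::mutex> g(mu_);
      if (stopping_) return;
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Ids killed so far, in the order they were processed.
  std::vector<uint64_t> killed() {
    std::lock_guard<std::mutex> g(mu_);
    return killed_;
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      cv_.wait(lk, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stopping and fully drained
      const uint64_t id = queue_.front();
      queue_.pop_front();
      lk.unlock();
      // A real server would kill the query of thread `id` here,
      // with no lock-system mutex held.
      lk.lock();
      killed_.push_back(id);
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<uint64_t> queue_;
  std::vector<uint64_t> killed_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

The queue handoff is the only work done on the caller's side, so the code that detects the conflict never has to call into the kill path while holding its own locks.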
I tried to find the exact patch(es) you are referring to:
https://github.com/knielsen/server/commit/841ada8c8ac39c024cd1eafe4b346deecbe48ca3
https://github.com/knielsen/server/commit/b256733df2cf9f10d38e44ca4979843a3b0d1884
Is it possible for you to create a clean patch for the async deadlock kill
against bb-10.2-jan that Jan and I can review and apply?
Regards and thanks,
Monty