maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12570
Re: f1db957c5da: MDEV-21469: Implement crash-safe binary logging of the user XA
Hi, Sujatha!
Note, this is a review of a combined diff dc92235f21e + f1db957c5da
On Mar 14, Sujatha wrote:
> revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da)
> parent(s): dc92235f21e
> author: Sujatha <sujatha.sivakumar@xxxxxxxxxxx>
> committer: Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
> timestamp: 2020-12-21 16:10:46 +0200
> message:
>
> MDEV-21469: Implement crash-safe binary logging of the user XA
>
> Make XA PREPARE, XA COMMIT and XA ROLLBACK statements
> crash-safe when --log-bin is specified.
>
> At execution of XA PREPARE, XA COMMIT and XA ROLLBACK their replication
> events are made written into the binary log prior to execution
> of the commands in storage engines.
> In case the server crashes, after writing to binary log but before all
> involved engines have processed the command, the following recovery
> will execute the command's replication events to equalize the states
> of involved engines with that of binlog.
> That applies to all XA PREPARE *group* and XA COMMIT or ROLLBACK.
>
> On the implementation level the recovery time binary log parsing
> is augmented to pay attention to
> the user XA xids to identify the XA transactions' state:s in binary log, and
> eventually match them against their states in engines, see
> MYSQL_BIN_LOG::recover_explicit_xa_prepare().
> In discrepancy cases the outdated state in the engines is corrected with
> resubmitting the transaction prepare group of events, or completion
> ones. The multi-engine partly prepared XA PREPARE case
> the XA is rolled back first.
> The fact of multiple-engine involved is registered into
> Gtid_log_event::flags2 as one bit. The boolean value is
> sufficient and precise to deal with two engines in XA transaction.
> With more than 2 recoverable engines the flag method is still correct though
> may be pessimistic, as it treats all recoverable engines as XA
> participants. So when the number of such Engines exceeds the number
> of prepared engines of the XA that XA is treated
> as partially completed, with all that ensued.
> As an optimization no new bit is allocated in flags2, instead a
> pre-existing ones (of MDEV-742) are reused, observing that
> A. XA "COMPLETE" does not require multi-engine hint for its recovery and that
> B. the MDEV-742 XA-completion bit is not anyhow used by
> XA-PREPARE and its GTID log event.
>
> Notice the multi-engine flagging is superceded by MDEV-21117 extension
> in Gtid log event so this part should be taken from there.
> diff --git a/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> new file mode 100644
> index 00000000000..d0852de2fcc
> --- /dev/null
> +++ b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test
> @@ -0,0 +1,86 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA crash recovery works fine across multiple binary logs.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate an explicit XA transaction. Using debug simulation hold the
"debug sync" != "debug sim" :)
> +# execution of XA PREPARE statement after the XA PREPARE is written to
> +# the binary log. With this the prepare will not be done in engine.
> +# 1 - By executing FLUSH LOGS generate multiple binary logs.
> +# 2 - Now make the server to disappear at this point.
> +# 3 - Restart the server. During recovery the XA PREPARE from the binary
> +# log will be read. It is cross checked with engine. Since it is not
> +# present in engine it will be executed once again.
> +# 4 - When server is up execute XA RECOVER to check that the XA is
> +# prepared in engine as well.
> +# 5 - XA COMMIT the transaction and check the validity of the data.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +#
> +
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/have_debug_sync.inc
> +--source include/have_log_bin.inc
> +
> +RESET MASTER;
> +
> +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb;
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +connect(con1,localhost,root,,);
> +XA START 'xa1';
> +INSERT INTO t1 SET a=1;
> +SET DEBUG_SYNC= "simulate_hang_after_binlog_prepare SIGNAL con1_ready WAIT_FOR con1_go";
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +XA END 'xa1';
> +--send XA PREPARE 'xa1';
> +
> +connection default;
> +SET DEBUG_SYNC= "now WAIT_FOR con1_ready";
> +FLUSH LOGS;
> +FLUSH LOGS;
> +FLUSH LOGS;
> +
> +--source include/show_binary_logs.inc
> +--let $binlog_file= master-bin.000004
> +--let $binlog_start= 4
> +--source include/show_binlog_events.inc
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
I don't see why you'd need to wait first, it'd be simpler to restart right away.
> +EOF
> +
> +--error 0,2013
> +SET DEBUG_SYNC= "now SIGNAL con1_go";
> +--source include/wait_until_disconnected.inc
> +
> +--connection con1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +
> +SELECT * FROM t1;
> +
> +# Clean up.
> +connection default;
> +DROP TABLE t1;
> +
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> new file mode 100644
> index 00000000000..e972e3f09de
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test
> @@ -0,0 +1,98 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The COMMIT should be executed during recovery.
> +# 3 - Check the data in table. Both rows should be present in table.
> +# 4 - Trying to commit 'xa2' should report unknown 'XA' error as COMMIT is
> +# already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
> +EOF
Same. Why do you wait?
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> new file mode 100644
> index 00000000000..14cebbd9b13
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test
> @@ -0,0 +1,119 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that if for some reason an event cannot be applied during
> +# recovery, appropriate error is reported.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the
> +# server so that it is not committed in engine.
> +# 2 - Restart the server. Using debug simulation point make XA COMMIT 'xa2'
> +# execution to fail. The server will resume anyway
> +# to leave the error in the errlog (see "Recovery: Error..").
> +# 3 - Work around the simulated failure with Commit once again
> +# from a connection that turns OFF binlogging.
> +# Slave must catch up with the master.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
master-slave must be always included last
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
why do you need master2 connection?
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CALL mtr.add_suppression("Recovery: Error .Out of memory..");
> +CALL mtr.add_suppression("Crash recovery failed.");
> +CALL mtr.add_suppression("Can.t init tc log");
> +CALL mtr.add_suppression("Aborting");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA COMMIT 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart: --debug-dbug=d,trans_xa_commit_fail
> +EOF
> +
> +connection default;
> +--source include/wait_until_disconnected.inc
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--echo *** must be no 'xa2' commit seen, as it's still prepared:
> +SELECT * FROM t;
> +XA RECOVER;
> +
> +# Commit it manually now to work around the extra binlog record
> +# by turning binlogging OFF by the connection.
> +
> +SET GLOBAL DEBUG_DBUG="";
> +SET SQL_LOG_BIN=0;
> +--error 0
why error 0? What will happen if you don't disable binlog?
> +XA COMMIT 'xa2';
> +SET SQL_LOG_BIN=1;
> +
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +
> +--sync_slave_with_master
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> new file mode 100644
> index 00000000000..7b987c7f29b
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test
> @@ -0,0 +1,95 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA PREPARE be done in binary log and crash the
> +# server so that it is not prepared in engine.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' is
> +# present.
> +# 4 - Commit the XA transaction and verify its correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ditto
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> new file mode 100644
> index 00000000000..9d2c5cce528
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test
> @@ -0,0 +1,117 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA PREPARE transactions are crash safe.
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 3 explicit XA transactions. 'xa1', 'xa2' and 'xa3'.
> +# Using debug simulation hold the execution of second XA PREPARE
> +# statement after the XA PREPARE is written to the binary log.
> +# With this the prepare will not be done in engine.
> +# 1 - For 'xa3' allow the PREPARE statement to be written to binary log and
> +# simulate server crash.
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2' and 'xa3'.
> +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' and 'xa3'
> +# are present along with 'xa1'.
> +# 4 - Commit all the XA transactions and verify their correctness.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
ok, in this test you actually use master2 :)
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +CALL mtr.add_suppression("Found 2 prepared XA transactions");
> +CALL mtr.add_suppression("Found 3 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +use test;
> +xa start 'xa2';
> +insert into t values (30);
> +xa end 'xa2';
> +SET DEBUG_SYNC="simulate_hang_after_binlog_prepare SIGNAL reached WAIT_FOR go";
> +send xa prepare 'xa2';
> +
> +--connection master2
> +let $wait_condition=
> + SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST
> + WHERE STATE like "debug sync point: simulate_hang_after_binlog_prepare%";
> +--source include/wait_condition.inc
eh? you can just 'WAIT_FOR signal', couldn't you?
> +
> +XA START 'xa3';
> +INSERT INTO t VALUES (40);
> +XA END 'xa3';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare";
> +--error 2013 # CR_SERVER_LOST
> +XA PREPARE 'xa3';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--error 2013
> +--reap
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +XA COMMIT 'xa1';
> +XA COMMIT 'xa2';
> +XA COMMIT 'xa3';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> new file mode 100644
> index 00000000000..1d19f96116e
> --- /dev/null
> +++ b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test
> @@ -0,0 +1,97 @@
> +# ==== Purpose ====
> +#
> +# Test verifies that XA COMMIT statements are crash safe.
XA ROLLBACK
> +#
> +# ==== Implementation ====
> +#
> +# Steps:
> +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'.
> +# 'xa1' will be prepared and committed.
> +# 1 - For 'xa2' let the XA ROLLBACK be done in binary log and crash the
> +# server so that it is not committed in engine.
not rolled back
> +# 2 - Restart the server. The recovery code should successfully recover
> +# 'xa2'. The ROLLBACK should be executed during recovery.
> +# 3 - Check the data in table. Only one row should be present in table.
> +# 4 - Trying to rollback 'xa2' should report unknown 'XA' error as rollback
> +# is already complete during recovery.
> +#
> +# ==== References ====
> +#
> +# MDEV-21469: Implement crash-safe logging of the user XA
> +
> +--source include/have_innodb.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_debug.inc
> +
> +connect (master2,localhost,root,,);
unused again
> +--connection master
> +CALL mtr.add_suppression("Found 1 prepared XA transactions");
> +
> +CREATE TABLE t ( f INT ) ENGINE=INNODB;
> +XA START 'xa1';
> +INSERT INTO t VALUES (20);
> +XA END 'xa1';
> +XA PREPARE 'xa1';
> +XA COMMIT 'xa1';
> +--sync_slave_with_master
> +--source include/stop_slave.inc
> +
> +--connection master1
> +XA START 'xa2';
> +INSERT INTO t VALUES (40);
> +XA END 'xa2';
> +XA PREPARE 'xa2';
> +
> +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +wait
ditto
> +EOF
> +
> +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback";
> +--error 2013 # CR_SERVER_LOST
> +XA ROLLBACK 'xa2';
> +--source include/wait_until_disconnected.inc
> +
> +--connection master1
> +--source include/wait_until_disconnected.inc
> +
> +--connection master
> +--source include/wait_until_disconnected.inc
> +
> +#
> +# Server restart
> +#
> +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +restart
> +EOF
> +
> +connection default;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +# rpl_end.inc needs to use the connection server_1
> +connection server_1;
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection master
> +--enable_reconnect
> +--source include/wait_until_connected_again.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +--sync_with_master
> +
> +--connection master
> +SELECT * FROM t;
> +XA RECOVER;
> +--error 1397 # ER_XAER_NOTA
> +XA ROLLBACK 'xa2';
> +SELECT * FROM t;
> +--sync_slave_with_master
> +
> +SELECT * FROM t;
> +
> +--connection master
> +DROP TABLE t;
> +--sync_slave_with_master
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> new file mode 100644
> index 00000000000..c7cefbda4bf
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test
> @@ -0,0 +1,46 @@
> +# MDEV-742, MDEV-21469 XA replication, and xa crash-safe.
> +# Tests prove xa state is recovered to a prepared or completed state
> +# upon post-crash recovery, incl a multi-engine case.
> +
> +--source include/have_rocksdb.inc
> +--source include/have_innodb.inc
> +--source include/have_debug.inc
> +--source include/master-slave.inc
ditto
> +--source include/have_binlog_format_row.inc
> +
> +--connection slave
> +--source include/stop_slave.inc
> +
> +--connection master
> +CALL mtr.add_suppression("Found . prepared XA transactions");
> +CALL mtr.add_suppression("Failed to execute binlog query event");
> +CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=rocksdb;
> +CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=innodb;
> +INSERT INTO t1 SET a = 1;
> +INSERT INTO t2 SET a = 1;
> +
> +--let $xa=xa1
> +--let $when=after_binlog
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--let $xa=xa2
> +--let $when=after_first_engine
> +--let $finally_expected=$xa in the list
> +--source rpl_rocksdb_xa_recover.inc
> +
> +--connection slave
> +--source include/start_slave.inc
> +
> +--connection master
> +--sync_slave_with_master
> +let $diff_tables= master:t1, slave:t1;
> +source include/diff_tables.inc;
> +let $diff_tables= master:t2, slave:t2;
> +source include/diff_tables.inc;
> +
> +--connection master
> +SET SESSION DEBUG_DBUG="";
> +drop table t1,t2;
> +--sync_slave_with_master
> +
> +--source include/rpl_end.inc
> diff --git a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> new file mode 100644
> index 00000000000..312b8b6a19e
> --- /dev/null
> +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc
> @@ -0,0 +1,67 @@
> +# callee of rpl_rocksdb_xa_recover.test
> +# requires t1,t2 as defined in the caller.
> +
> +--let $at=_prepare
> +--let $finally_expected=$xa in the list
> +--let $init_val=`SELECT MAX(a) from t1 as t_1`
why as t_1 ?
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 1
> +--eval INSERT INTO t2 SET a=$init_val + 1
> +--eval XA END '$xa'
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA PREPARE '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval XA ROLLBACK '$xa'
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (rolled back)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 2
> +--eval INSERT INTO t2 SET a=$init_val + 2
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA ROLLBACK '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as zero FROM t1
> +--eval SELECT MAX(a) - $init_val as zero FROM t2
> +
> +
> +--let $at=_commit_or_rollback
> +--let $finally_expected=empty list (committed away)
> +--eval XA START '$xa'
> +--eval INSERT INTO t1 SET a=$init_val + 3
> +--eval INSERT INTO t2 SET a=$init_val + 3
> +--eval XA END '$xa'
> +--eval XA PREPARE '$xa'
> +
> +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at"
> +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
> +--error 2013 # CR_SERVER_LOST
> +--eval XA COMMIT '$xa'
> +--source include/wait_until_disconnected.inc
> +--let $rpl_server_number = 1
> +--source include/rpl_reconnect.inc
> +
> +--echo "*** $finally_expected"
> +XA RECOVER;
> +--eval SELECT MAX(a) - $init_val as three FROM t1
> +--eval SELECT MAX(a) - $init_val as three FROM t2
> diff --git a/sql/log_event.h b/sql/log_event.h
> index 639cbfbe7aa..e1880339e85 100644
> --- a/sql/log_event.h
> +++ b/sql/log_event.h
> @@ -3614,6 +3616,12 @@ class Gtid_log_event: public Log_event
> static const uchar FL_PREPARED_XA= 64;
> /* FL_"COMMITTED or ROLLED-BACK"_XA is set for XA transaction. */
> static const uchar FL_COMPLETED_XA= 128;
> + /*
> + To mark the fact of multiple transactional engine participants
> + in the prepared XA. The FL_COMPLETED_XA bit is reused by XA_PREPARE_LOG_EVENT,
What do you mean by that? XA_prepare_log_event does not inherit from
Gtid_log_event, it cannot use Gtid_log_event flags.
And, please, explain in a comment why you can reuse a bit for two different
flags. Like "the ambiguity between FL_COMPLETED_XA and FL_MULTI_ENGINE_XA is
resolved by the flag FL_PREPARED_XA. if it's set, then 128 means
FL_MULTI_ENGINE_XA, if it's not set, then 128 means FL_COMPLETED_XA"
> + oth the XA completion events do not need such marking.
other
> + */
> + static const uchar FL_MULTI_ENGINE_XA= 128;
>
> #ifdef MYSQL_SERVER
> Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
> diff --git a/sql/handler.h b/sql/handler.h
> index 0eff7bd930d..9acdd47bfc2 100644
> --- a/sql/handler.h
> +++ b/sql/handler.h
> @@ -1880,8 +1891,18 @@ class Ha_trx_info
> m_ht= ht_arg;
> m_flags= (int) TRX_READ_ONLY; /* Assume read-only at start. */
>
> - m_next= trans->ha_list;
> - trans->ha_list= this;
> + if (likely(!past_head))
> + {
> + m_next= trans->ha_list;
> + trans->ha_list= this;
> + }
> + else
> + {
> + DBUG_ASSERT(trans->ha_list);
> +
> + m_next= trans->ha_list->m_next;
> + trans->ha_list->m_next= this;
> + }
I don't like this fake generalization. There's no point for an arbitrary
engine to be put at a specific place in the list. Only binlog is special.
At least, remove this new argument and check for binlog_hton here.
But really, think of this - binlog is so special, may be it'd be easier not
to pretend it's a storage engine at all? No need for a fake binlog_hton,
no need to register it, etc. It kind of made sense in 2005, but it doesn't
necessarily make it now.
> }
>
> /** Clear, prepare for reuse. */
> diff --git a/sql/handler.cc b/sql/handler.cc
> index ffb89b30d92..b981294f0d6 100644
> --- a/sql/handler.cc
> +++ b/sql/handler.cc
> @@ -1338,6 +1345,9 @@ int ha_prepare(THD *thd)
> handlerton *ht= ha_info->ht();
> if (ht->prepare)
> {
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_prepare",
> + if (!ha_info->next()) DBUG_SUICIDE(););
looks like not simulate_crash_after_first_engine_prepare, but
more like simulate_crash_before_last_engine_prepare.
it's the same point in time if you have only two engines,
but the code still implements "before last", not "after first"
> +
> if (unlikely(prepare_or_error(ht, thd, all)))
> {
> ha_rollback_trans(thd, all);
> @@ -1368,22 +1378,22 @@ int ha_prepare(THD *thd)
> }
>
> /*
> - Like ha_check_and_coalesce_trx_read_only to return counted number of
> - read-write transaction participants limited to two, but works in the 'all'
> - context.
> - Also returns the last found rw ha_info through the 2nd argument.
> + Returns counted number of
> + read-write recoverable transaction participants optionally limited to two.
> + Also optionally returns the last found rw ha_info through the 2nd argument.
> */
> -uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info)
> +uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info, bool count_through)
I don't understand why you need a new argument. ptr_ha_info==NULL means don't
return a value there, this is a standard convention used everywhere.
also, it's not an "count rw all" it's are there more rw-all htons than N
where N can be 1 or 2.
So:
* either take N as an argument and return as soon as rw_ha_count > N
* or don't overoptimize here and just return full count every time
> {
> unsigned rw_ha_count= 0;
>
> for (auto ha_info= thd->transaction.all.ha_list; ha_info;
> ha_info= ha_info->next())
> {
> - if (ha_info->is_trx_read_write())
> + if (ha_info->is_trx_read_write() && ha_info->ht()->recover)
> {
> - *ptr_ha_info= ha_info;
> - if (++rw_ha_count > 1)
> + if (ptr_ha_info)
> + *ptr_ha_info= ha_info;
> + if (++rw_ha_count > 1 && !count_through)
> break;
> }
> }
> @@ -1876,6 +1886,10 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans, bool is_real_trans)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
1. same as above
2. why not to use *different* names for these two crashes?
in case, you know, you'll want to crash at a specific point in the code, not
just somewhere? And in a case you'll actually want to crash just somewhere you
can always use "+d,simulate_crash_after_first_engine_commit,simulate_crash_after_first_engine_rollback"
(and it should be "before_last", of course, not "after_first")
> +
> if ((err= ht->commit(ht, thd, all)))
> {
> my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> @@ -1987,6 +2001,10 @@ int ha_rollback_trans(THD *thd, bool all)
> {
> int err;
> handlerton *ht= ha_info->ht();
> +
> + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback",
> + if (!ha_info->next()) DBUG_SUICIDE(););
ditto
> +
> if ((err= ht->rollback(ht, thd, all)))
> { // cannot happen
> my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err);
> @@ -2098,8 +2116,24 @@ int ha_commit_or_rollback_by_xid(XID *xid, bool commit)
> xaop.xid= xid;
> xaop.result= 1;
>
> + if (binlog_hton->recover)
> + {
> + /*
> + When the binlogging service is enabled complete the transaction
> + by it first.
> + */
> + if (commit)
> + binlog_hton->commit_by_xid(binlog_hton, xid);
> + else
> + binlog_hton->rollback_by_xid(binlog_hton, xid);
> + }
> plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
> MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
> + if (binlog_hton->recover)
> + {
> + THD *thd= current_thd;
> + thd->reset_binlog_completed_by_xid();
> + }
1. this, precisely, shows that binlog does not need a hton anymoreo
2. how can binlog_hton->recover be NULL here?
>
> return xaop.result;
> }
> @@ -2222,6 +2258,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
>
> if (hton->recover)
> {
> + info->recover_htons++;
don't forget to subtract 1 for binlog_hton->recover,
otherwise it'll make you to rollback everything with a FL_MULTI_ENGINE_XA flag
> while ((got= hton->recover(hton, info->list, info->len)) > 0 )
> {
> sql_print_information("Found %d prepared transaction(s) in %s",
> @@ -2263,7 +2300,21 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
> _db_doprnt_("ignore xid %s", xid_to_str(buf, info->list[i]));
> });
> xid_cache_insert(info->list + i);
> + XID *foreign_xid= info->list + i;
swap those two lines ^^^ then you can write
xid_cache_insert(foreign_xid);
> info->found_foreign_xids++;
> +
> + /*
> + For each foreign xid prepraed in engine, check if it is present in
> + xa_prepared_list of binlog.
> + */
> + if (info->xa_prepared_list)
for simplicity I'd rather always initialize xa_prepared_list and remove
this if()
> + {
> + struct xa_recovery_member *member= NULL;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(info->xa_prepared_list, foreign_xid->key(),
> + foreign_xid->key_length())))
> + member->in_engine_prepare++;
> + }
> continue;
> }
> if (IF_WSREP(!(wsrep_emulate_bin_log &&
> @@ -2362,7 +2424,7 @@ int ha_recover(HASH *commit_list)
> info.found_my_xids, opt_tc_log_file);
> DBUG_RETURN(1);
> }
> - if (info.commit_list)
> + if (info.commit_list && !info.found_foreign_xids)
Why? after the finished crash recovery one can still have transactions in
doubt.
> sql_print_information("Crash recovery finished.");
> DBUG_RETURN(0);
> }
> diff --git a/sql/log.cc b/sql/log.cc
> index 731bb3e98f0..c69b8518cf4 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton __attribute__((unused)),
> return 0;
> }
>
> -
> +/*
> + The function invokes binlog_commit() and returns its result
> + when it has not yet called it already.
> + binlog_cache_mngr::completed_by_xid remembers the fact of
> + the 1st of maximum two subsequent calls.
> +*/
> static int binlog_commit_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT);
>
> - return binlog_commit(hton, thd, TRUE);
> -}
> + if (!cache_mngr->completed_by_xid)
On what code path can you have completed_by_xid == false here?
> + {
> + rc= binlog_commit(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
>
> + return rc;
> +}
>
> +/*
> + The function invokes binlog_rollback() and returns its results similarly
> + to a commit branch.
> +*/
> static int binlog_rollback_by_xid(handlerton *hton, XID *xid)
> {
> + int rc= 0;
> THD *thd= current_thd;
> -
> - (void) thd->binlog_setup_trx_data();
> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
>
> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_ROLLBACK ||
> (thd->transaction.xid_state.get_state_code() == XA_ROLLBACK_ONLY));
> - return binlog_rollback(hton, thd, TRUE);
> +
> + if (!cache_mngr->completed_by_xid)
> + {
> + rc= binlog_rollback(hton, thd, TRUE);
> + cache_mngr->completed_by_xid= true;
> + }
> +
> + return rc;
> }
>
>
> @@ -2195,6 +2224,12 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
> if (!all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
> + DBUG_SUICIDE(););
> + DEBUG_SYNC(thd, "simulate_hang_after_binlog_prepare");
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_prepare",
> + DBUG_SUICIDE(););
wouldn't you say there's one DBUG_EXECUTE_IF too many?
Just keep one, "simulate_crash_after_binlog_commit"
> +
> THD_STAGE_INFO(thd, org_stage);
> DBUG_RETURN(error);
> }
> @@ -2298,6 +2333,9 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
> cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
> thd->reset_binlog_for_next_statement();
>
> + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback",
same as above, use unique dbug keywords, like
"simulate_crash_after_binlog_rollback" (because it's rollback here)
> + DBUG_SUICIDE(););
> +
> DBUG_RETURN(error);
> }
>
> @@ -5749,6 +5794,14 @@ binlog_cache_mngr *THD::binlog_setup_trx_data()
> DBUG_RETURN(cache_mngr);
> }
>
> +/*
> + XA completion flag resetter for ha_commit_or_rollback_by_xid().
> +*/
> +void THD::reset_binlog_completed_by_xid()
> +{
> + binlog_setup_trx_data()->reset_completed_by_xid();
> +}
I don't think this makes any sense as a THD method
> +
> /*
> Function to start a statement and optionally a transaction for the
> binary log.
> @@ -10333,14 +10386,108 @@ start_binlog_background_thread()
> return 0;
> }
>
> +#ifdef HAVE_REPLICATION
> +/**
> + Auxiliary function for TC_LOG::recover().
> + @returns a successfully created and inserted @c xa_recovery_member
> + into hash @c hash_arg,
> + or NULL.
> +*/
> +static xa_recovery_member*
> +xa_member_insert(HASH *hash_arg, xid_t *xid_arg, xa_binlog_state state_arg,
> + MEM_ROOT *ptr_mem_root)
> +{
> + xa_recovery_member *member= (xa_recovery_member*)
> + alloc_root(ptr_mem_root, sizeof(xa_recovery_member));
> + if (!member)
> + return NULL;
> +
> + member->xid.set(xid_arg);
> + member->state= state_arg;
> + member->in_engine_prepare= 0;
> + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
> +}
> +
> +/* Inserts or updates an existing hash member with a proper state */
> +static bool xa_member_replace(HASH *hash_arg, xid_t *xid_arg, bool is_prepare,
> + MEM_ROOT *ptr_mem_root)
> +{
> + if(is_prepare)
> + {
> + if (!(xa_member_insert(hash_arg, xid_arg, XA_PREPARE, ptr_mem_root)))
> + return true;
> + }
> + else
> + {
> + /*
> + Search if XID is already present in recovery_list. If found
> + and the state is 'XA_PREPRAED' mark it as XA_COMPLETE.
> + Effectively, there won't be XA-prepare event group replay.
> + */
> + xa_recovery_member* member;
> + if ((member= (xa_recovery_member *)
> + my_hash_search(hash_arg, xid_arg->key(), xid_arg->key_length())))
> + {
> + if (member->state == XA_PREPARE)
> + member->state= XA_COMPLETE;
> + }
> + else // We found only XA COMMIT during recovery insert to list
> + {
> + if (!(member= xa_member_insert(hash_arg,
> + xid_arg, XA_COMPLETE, ptr_mem_root)))
> + return true;
> + }
> + }
> + return false;
> +}
> +#endif
>
> +extern "C" uchar *xid_get_var_key(xid_t *entry, size_t *length,
define the first argument as const char*, then you won't need to cast in
my_hash_init, and you can be sure you pass the correct function with the
correct number of arguments, even if my_hash_get_key changes in the future.
and cast entry to xid_t* below explicitly.
> + my_bool not_used __attribute__((unused)))
> +{
> + *length= entry->key_length();
> + return (uchar*) entry->key();
> +}
> +
> +/**
> + Performs recovery based on transaction coordinator log for 2pc. At the
> + time of crash, if the binary log was in active state, then recovery for
> + "implicit" 'xid's and explicit 'XA' transactions is initiated,
> + otherwise merely the gtid binlog state is updated.
> + For 'xid' and 'XA' based recovery the following steps are performed.
> +
> + Identify the active binlog checkpoint file.
> + Scan the binary log from the beginning.
> + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state.
> + Prepare a list of 'xid's for recovery.
> + Prepare a list of explicit 'XA' transactions for recovery.
> + Recover the 'xid' transactions.
> + The explicit 'XA' transaction recovery is initiated once all the server
> + components are initialized. Please check 'execute_xa_for_recovery()'.
why explicit XA recovery is delayed?
> +
> + Called from @c MYSQL_BIN_LOG::do_binlog_recovery()
> +
> + @param linfo Store here the found log file name and position to
> + the NEXT log file name in the index file.
> +
> + @param last_log_name Name of the last active binary log at the time of
> + crash.
> +
> + @param first_log Pointer to IO_CACHE of active binary log
> + @param fdle Format_description_log_event of active binary log
> + @param do_xa Is 2pc recovery needed for 'xid's and explicit XA
> + transactions.
> + @return indicates success or failure of recovery.
> + @retval 0 success
> + @retval 1 failure
> +
> +*/
> int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> IO_CACHE *first_log,
> Format_description_log_event *fdle, bool do_xa)
> {
> Log_event *ev= NULL;
> HASH xids;
> - MEM_ROOT mem_root;
> char binlog_checkpoint_name[FN_REFLEN];
> bool binlog_checkpoint_found;
> bool first_round;
> @@ -10533,10 +10693,22 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
>
> if (do_xa)
> {
> - if (ha_recover(&xids))
> + if (ha_recover(&xids, &xa_recover_list, &xa_recover_htons))
> goto err2;
>
> - free_root(&mem_root, MYF(0));
> + DBUG_ASSERT(!xa_recover_list.records ||
> + (binlog_checkpoint_found && binlog_checkpoint_name[0] != 0));
why is that?
> +
> + if (!xa_recover_list.records)
> + {
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> + }
> + else
> + {
> + xa_binlog_checkpoint_name= strmake_root(&mem_root, binlog_checkpoint_name,
> + strlen(binlog_checkpoint_name));
> + }
> my_hash_free(&xids);
> }
> return 0;
> @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> return 1;
> }
>
> +void MYSQL_BIN_LOG::execute_xa_for_recovery()
> +{
> + if (xa_recover_list.records)
> + (void) recover_explicit_xa_prepare();
> + free_root(&mem_root, MYF(0));
> + my_hash_free(&xa_recover_list);
> +};
> +
> +/**
> + Performs recovery of user XA transactions.
> + 'xa_recover_list' contains the list of XA transactions to be recovered.
> + with possible replaying replication event from the binary log.
> +
> + @return indicates success or failure of recovery.
> + @retval false success
> + @retval true failure
> +
> +*/
> +bool MYSQL_BIN_LOG::recover_explicit_xa_prepare()
why is it bool, if you only use it in execute_xa_for_recovery()
and ignore the return value?
> +{
> +#ifndef HAVE_REPLICATION
> + /* Can't be supported without replication applier built in. */
> + return false;
> +#else
> + bool err= true;
> + int error=0;
> + Relay_log_info *rli= NULL;
> + rpl_group_info *rgi;
> + THD *thd= new THD(0); /* Needed by start_slave_threads */
> + thd->thread_stack= (char*) &thd;
> + thd->store_globals();
> + thd->security_ctx->skip_grants();
> + IO_CACHE log;
> + const char *errmsg;
> + File file;
> + bool enable_apply_event= false;
> + Log_event *ev = 0;
> + LOG_INFO linfo;
> + int recover_xa_count= xa_recover_list.records;
> + xa_recovery_member *member= NULL;
> +
> + if (!(rli= thd->rli_fake= new Relay_log_info(FALSE, "Recovery")))
> + {
> + my_error(ER_OUTOFMEMORY, MYF(ME_FATAL), 1);
> + goto err2;
> + }
> + rli->sql_driver_thd= thd;
> + static LEX_CSTRING connection_name= { STRING_WITH_LEN("Recovery") };
> + rli->mi= new Master_info(&connection_name, false);
> + if (!(rgi= thd->rgi_fake))
> + rgi= thd->rgi_fake= new rpl_group_info(rli);
> + rgi->thd= thd;
> + thd->system_thread_info.rpl_sql_info=
> + new rpl_sql_thread_info(rli->mi->rpl_filter);
> +
> + if (rli && !rli->relay_log.description_event_for_exec)
> + {
> + rli->relay_log.description_event_for_exec=
> + new Format_description_log_event(4);
> + }
> + if (find_log_pos(&linfo, xa_binlog_checkpoint_name, 1))
> + {
> + sql_print_error("Binlog file '%s' not found in binlog index, needed "
> + "for recovery. Aborting.", xa_binlog_checkpoint_name);
> + goto err2;
> + }
> +
> + tmp_disable_binlog(thd);
> + thd->variables.pseudo_slave_mode= TRUE;
> + for (;;)
> + {
> + if ((file= open_binlog(&log, linfo.log_file_name, &errmsg)) < 0)
> + {
> + sql_print_error("%s", errmsg);
> + goto err1;
> + }
> + while (recover_xa_count > 0 &&
> + (ev= Log_event::read_log_event(&log,
> + rli->relay_log.description_event_for_exec,
> + opt_master_verify_checksum)))
> + {
> + if (!ev->is_valid())
> + {
> + sql_print_error("Found invalid binlog query event %s"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written));
> + goto err1;
> + }
> + enum Log_event_type typ= ev->get_type_code();
> + ev->thd= thd;
> +
> + if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event= true;
> +
> + if (typ == GTID_EVENT)
> + {
> + Gtid_log_event *gev= (Gtid_log_event *)ev;
> + if (gev->flags2 &
> + (Gtid_log_event::FL_PREPARED_XA | Gtid_log_event::FL_COMPLETED_XA))
> + {
> + if ((member=
> + (xa_recovery_member*) my_hash_search(&xa_recover_list,
> + gev->xid.key(),
> + gev->xid.key_length())))
> + {
> + /*
> + When XA PREPARE group of events (as flagged so) check
> + its actual binlog state which may be COMPLETED. If the
> + state is also PREPARED then analyze through
> + in_engine_prepare whether the transaction needs replay.
> + */
> + if (gev->flags2 & Gtid_log_event::FL_PREPARED_XA)
> + {
> + if (member->state == XA_PREPARE)
> + {
> + // XA prepared is not present in (some) engine then apply it
> + if (member->in_engine_prepare == 0)
> + enable_apply_event= true;
> + else if (gev->flags2 & Gtid_log_event::FL_MULTI_ENGINE_XA &&
> + xa_recover_htons > member->in_engine_prepare)
This needs a comment, something like "FL_MULTI_ENGINE_XA says the transaction
must be prepared in more than one engine. We don't know in how many, so if it's
not prepared in *all* engines we'll replay it just in case"
I presume that (a bit silly) logic will do away when you will port this
patch to use a counter instead of a flag. Please, don't push if before
that.
> + {
> + enable_apply_event= true;
> + // partially engine-prepared XA is first cleaned out prior replay
> + thd->lex->sql_command= SQLCOM_XA_ROLLBACK;
> + ha_commit_or_rollback_by_xid(&gev->xid, 0);
> + }
> + else
> + --recover_xa_count;
> + }
> + }
> + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA)
> + {
> + if (member->state == XA_COMPLETE &&
> + member->in_engine_prepare > 0)
> + enable_apply_event= true;
why? you cannot replay a fully prepared and partially committed transaction
> + else
> + --recover_xa_count;
> + }
> + }
> + }
> + }
> +
> + if (enable_apply_event)
> + {
> + if ((err= ev->apply_event(rgi)))
> + {
> + sql_print_error("Failed to execute binlog query event of type: %s,"
> + " at %s:%llu; error %d %s", ev->get_type_str(),
> + linfo.log_file_name,
> + (ev->log_pos - ev->data_written),
> + thd->get_stmt_da()->sql_errno(),
> + thd->get_stmt_da()->message());
> + delete ev;
> + goto err1;
> + }
> + else if (typ == FORMAT_DESCRIPTION_EVENT)
> + enable_apply_event=false;
how can that happen?
> + else if (thd->lex->sql_command == SQLCOM_XA_PREPARE ||
> + thd->lex->sql_command == SQLCOM_XA_COMMIT ||
> + thd->lex->sql_command == SQLCOM_XA_ROLLBACK)
> + {
> + --recover_xa_count;
> + enable_apply_event=false;
> +
> + sql_print_information("Binlog event %s at %s:%llu"
> + " successfully applied",
> + typ == XA_PREPARE_LOG_EVENT ?
> + static_cast<XA_prepare_log_event *>(ev)->get_query() :
> + static_cast<Query_log_event *>(ev)->query,
> + linfo.log_file_name, (ev->log_pos - ev->data_written));
> + }
> + }
> + if (typ != FORMAT_DESCRIPTION_EVENT)
> + delete ev;
> + }
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + file= -1;
> + if (unlikely((error= find_next_log(&linfo, 1))))
> + {
> + if (error != LOG_INFO_EOF)
> + sql_print_error("find_log_pos() failed (error: %d)", error);
> + else
> + break;
> + }
> + }
> +err1:
> + reenable_binlog(thd);
> + /*
> + There should be no more XA transactions to recover upon successful
> + completion.
> + */
> + if (recover_xa_count > 0)
> + goto err2;
> + sql_print_information("Crash recovery finished.");
> + err= false;
> +err2:
> + if (file >= 0)
> + {
> + end_io_cache(&log);
> + mysql_file_close(file, MYF(MY_WME));
> + }
> + thd->variables.pseudo_slave_mode= FALSE;
> + delete rli->mi;
> + delete thd->system_thread_info.rpl_sql_info;
> + rgi->slave_close_thread_tables(thd);
> + thd->reset_globals();
> + delete thd;
> +
> + return err;
> +#endif /* !HAVE_REPLICATION */
> +}
>
> int
> MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx