← Back to team overview

maria-developers team mailing list archive

Re: GTID replication and relay_log.info

 

Kristian,

On Tue, May 21, 2013 at 2:31 PM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
>> GTIDs it's really easy to solve the problem -- rpl_slave_state has
>> GTID of the last executed transaction, so on restart one needs just to
>> ignore all relay logs and re-connect to master with the last GTID. Did
>
> Yes. That is how it's done basically.
>
> Slave restart should be crash safe with MariaDB GTID (for InnoDB DML - MyISAM
> DML or DDL in general is not crash safe of course, as neither MyISAM nor DDL
> is transactional).

It seems to me that what you said above doesn't always work quite like
that. It's possible that there's something wrong with us applying GTID
patch on top of 10.0.1 and maybe I'm missing something crucial (we
didn't apply START SLAVE UNTIL and everything that was after that
yet). But from time to time we see failures of rpl.rpl_gtid_crash test
with the output like this:

CURRENT_TEST: rpl.rpl_gtid_crash
mysqltest: In included file "./include/wait_until_disconnected.inc":
included from .../mysql-test/suite/rpl/t/rpl_gtid_crash.test at line 244:
At line 24: Server failed to dissapear

The result from queries just before the failure was:
< snip >
include/stop_slave.inc
SET GLOBAL debug_dbug="+d,crash_commit_before";
START SLAVE;
INSERT INTO t1 VALUES (5);
include/stop_slave.inc
SET GLOBAL debug_dbug="+d,crash_commit_after";
START SLAVE;
INSERT INTO t1 VALUES (6);
include/stop_slave.inc
SET GLOBAL debug_dbug="+d,inject_crash_before_flush_rli";
START SLAVE;
INSERT INTO t1 VALUES (7);
Timeout in wait_condition.inc for SELECT COUNT(*) = 7 FROM t1
Id User Host db Command Time State Info Progress
2 root localhost:58472 test Query 0 init show full processlist 0.000
4 system user NULL Connect 30 Waiting for master to send event NULL 0.000
include/stop_slave.inc
SET GLOBAL debug_dbug="+d,inject_crash_after_flush_rli";
START SLAVE;
INSERT INTO t1 VALUES (8);

More results from queries before failure can be found in
.../mysql-test/var/log/rpl_gtid_crash.log

 - saving ...
***Warnings generated in error logs during shutdown after running
tests: rpl.rpl_gtid_crash

130524  1:12:44 [Warning] Slave: Duplicate entry '7' for key 'PRIMARY'
Error_code: 1062
130524  1:13:14 [Warning] Slave: Duplicate entry '7' for key 'PRIMARY'
Error_code: 1062


Or another failure with this output:

CURRENT_TEST: rpl.rpl_gtid_crash
mysqltest: In included file "./include/check_slave_param.inc":
included from ./include/check_slave_no_error.inc at line 24:
included from ./include/rpl_end.inc at line 74:
included from .../mysql-test/suite/rpl/t/rpl_gtid_crash.test at line 268:
At line 31: Wrong value for slave parameter

The result from queries just before the failure was:
< snip >
master-bin.000004 532 Gtid 1 574 BEGIN GTID 0-1-6
master-bin.000004 574 Query 1 666 use `test`; INSERT INTO t1 VALUES (5)
master-bin.000004 666 Xid 1 697 COMMIT /* xid=35 */
master-bin.000004 697 Gtid 1 739 BEGIN GTID 0-1-7
master-bin.000004 739 Query 1 831 use `test`; INSERT INTO t1 VALUES (6)
master-bin.000004 831 Xid 1 862 COMMIT /* xid=52 */
master-bin.000004 862 Gtid 1 904 BEGIN GTID 0-1-8
master-bin.000004 904 Query 1 996 use `test`; INSERT INTO t1 VALUES (7)
master-bin.000004 996 Xid 1 1027 COMMIT /* xid=69 */
master-bin.000004 1027 Gtid 1 1069 BEGIN GTID 0-1-9
master-bin.000004 1069 Query 1 1161 use `test`; INSERT INTO t1 VALUES (8)
master-bin.000004 1161 Xid 1 1192 COMMIT /* xid=89 */
master-bin.000004 1192 Gtid 1 1234 GTID 0-1-10
master-bin.000004 1234 Query 1 1342 use `test`; DROP TABLE `t1` /*
generated by server */

**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name Pos Event_type Server_id End_log_pos Info
Wrong value for Last_SQL_Errno. Expected '0', got '1062'

More results from queries before failure can be found in
.../mysql-test/var/log/rpl_gtid_crash.log

 - saving ...
***Warnings generated in error logs during shutdown after running
tests: rpl.rpl_gtid_crash

130525  1:07:15 [Warning] Slave: Duplicate entry '8' for key 'PRIMARY'
Error_code: 1062


Can you suggest what debugging information should I gather to
understand what's happening in failures like this? Or maybe you know
off the top of your head what changes were done since 10.0.1 that are
required for crash recovery to work properly?


Thank you,
Pavel


Follow ups

References