maria-developers team mailing list archive
Mailing list archive
Re: Incorrect format description event skipping in queue_event()
Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>> But this must be the same problem with normal replication? Whenever the slave
>> decides to rotate the relay log, it will write a format description event
>> created by itself with no following format description created on the
>> master. So it seems this must work somehow, though I'll frankly admit I do not
>> understand the details of how this works (do you know?)
> Yes, I investigated this. During normal replication (when relay log is
> rotated automatically due to max_size) slave's format description is
> written at the beginning of the new realy log file, but right after
> that there's code that if description_event_for_queue->binlog_version
>>= 4 then it writes description_event_for_queue into relay log too.
> Also it ensures that the event has created = 0 and artificial_event
> set to 1. So SQL thread still gets the master's format description and
> doesn't rollback the transaction.
> When IO thread reconnects to master the first event it receives is
> Rotate. For Rotate event queue_event() executes process_io_rotate().
> Inside there if
> mi->rli.relay_log.description_event_for_queue->binlog_version >= 4 it
> forcefully replaces description_event_for_queue with new event with
> binlog_version = 3. Then it does the actual relay log rotation during
> which description_event_for_queue is not written into the new log file
> (and it shouldn't as it's not master's at this point anyway). The next
So one possible solution is to do the same at the reconnect case as what we do
in relay-log rotate initiated by slave due to size: Write out the
description_event_for_queue to the relay log with created=0 and artificial=1.
I have attached a patch for this, what do you think?
Do you have the possibility to test if this works (eg. when we get a reconnect
when the master's description event is incompatible with the slave's)?
[This code really is criminally ugly, even for replication standard. But I do
not really know how to fix it in any reasonable way :-(]
> Maybe I messed up with testing, I'll try to retest again...
FYI: I took unmodified 10.0-base, commented out those two lines, ran
rpl.rpl_gtid_reconnect, and it failed.
=== modified file 'sql/log.cc'
--- sql/log.cc 2013-08-23 12:02:13 +0000
+++ sql/log.cc 2013-08-29 13:04:28 +0000
@@ -4806,12 +4806,23 @@ int MYSQL_BIN_LOG::new_file_impl(bool ne
-bool MYSQL_BIN_LOG::append(Log_event* ev)
- bool error = 0;
+ bool res;
+ res= append_no_lock(ev);
+ return res;
+bool MYSQL_BIN_LOG::append_no_lock(Log_event* ev)
+ bool error = 0;
DBUG_ASSERT(log_file.type == SEQ_READ_APPEND);
Log_event::write() is smart enough to use my_b_write() or
@@ -4829,7 +4840,6 @@ bool MYSQL_BIN_LOG::append(Log_event* ev
if (my_b_append_tell(&log_file) > max_size)
signal_update(); // Safe as we don't call close
=== modified file 'sql/log.h'
--- sql/log.h 2013-08-23 12:02:13 +0000
+++ sql/log.h 2013-08-29 13:05:53 +0000
@@ -712,6 +712,7 @@ class MYSQL_BIN_LOG: public TC_LOG, priv
bool appendv(const char* buf,uint len,...);
bool append(Log_event* ev);
+ bool append_no_lock(Log_event* ev);
void mark_xids_active(ulong cookie, uint xid_count);
void mark_xid_done(ulong cookie, bool write_checkpoint);
=== modified file 'sql/slave.cc'
--- sql/slave.cc 2013-08-23 08:16:43 +0000
+++ sql/slave.cc 2013-08-29 13:09:31 +0000
@@ -5308,6 +5308,26 @@ static int queue_event(Master_info* mi,c
+ if ((uchar)buf[EVENT_TYPE_OFFSET] == FORMAT_DESCRIPTION_EVENT &&
+ s_id == mi->master_id)
+ If we write this master's description event in the middle of an event
+ group due to GTID reconnect, SQL thread will think that master crashed
+ in the middle of the group and roll back the first half, so we must not.
+ But we still have to write an artificial copy of the masters description
+ event, to override the initial slave-version description event so that
+ SQL thread has the right information for parsing the events it reads.
+ rli->relay_log.description_event_for_queue->created= 0;
+ if (rli->relay_log.append_no_lock
+ error= ER_SLAVE_RELAY_LOG_WRITE_FAILURE;
if ((s_id == global_system_variables.server_id &&