← Back to team overview

maria-developers team mailing list archive

Re: Incorrect format description event skipping in queue_event()

 

Krisitan,

The attached patch doesn't work properly because the format
description event it saves is different from what master has sent. It
has different number_of_event_types and potentially it has garbage (or
incomplete data) in post_header_len.

And regarding the fix to Gtid_log_event that we were talking about:
I've tried to do that but I felt like I don't understand the code
quite well at the moment to do that. So apparently all peek*()
functions should have Format_description added as parameter and it's
easy to pass one from queue_event(), but then e.g. mysql_binlog_send()
will need description event too to pass it to send_event_to_slave()
and is_until_reached() so that those could pass it to peek*(). For
correctness this format description should be read from binlog, right?
That would be pretty non-trivial to do.
Also gtid_state_from_pos() needs Format_descirption, but it will
apparently meet that at the beginning of the binlog file, so it can be
created before it will be actually needed. Although constructor of
Format_description_log_event needs another instance of
Format_description. Should it be just created like
Format_description_log_event(BINLOG_VERSION)?

Pavel

On Thu, Aug 29, 2013 at 6:24 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
>>> But this must be the same problem with normal replication? Whenever the slave
>>> decides to rotate the relay log, it will write a format description event
>>> created by itself with no following format description created on the
>>> master. So it seems this must work somehow, though I'll frankly admit I do not
>>> understand the details of how this works (do you know?)
>>
>> Yes, I investigated this. During normal replication (when relay log is
>> rotated automatically due to max_size) slave's format description is
>> written at the beginning of the new realy log file, but right after
>> that there's code that if description_event_for_queue->binlog_version
>>>= 4 then it writes description_event_for_queue into relay log too.
>> Also it ensures that the event has created = 0 and artificial_event
>> set to 1. So SQL thread still gets the master's format description and
>> doesn't rollback the transaction.
>
>> When IO thread reconnects to master the first event it receives is
>> Rotate. For Rotate event queue_event() executes process_io_rotate().
>> Inside there if
>> mi->rli.relay_log.description_event_for_queue->binlog_version >= 4 it
>> forcefully replaces description_event_for_queue with new event with
>> binlog_version = 3. Then it does the actual relay log rotation during
>> which description_event_for_queue is not written into the new log file
>> (and it shouldn't as it's not master's at this point anyway). The next
>
> I see.
>
> So one possible solution is to do the same at the reconnect case as what we do
> in relay-log rotate initiated by slave due to size: Write out the
> description_event_for_queue to the relay log with created=0 and artificial=1.
>
> I have attached a patch for this, what do you think?
>
> Do you have the possibility to test if this works (eg. when we get a reconnect
> when the master's description event is incompatible with the slave's)?
>
> [This code really is criminally ugly, even for replication standard. But I do
> not really know how to fix it in any reasonable way :-(]
>
>> Maybe I messed up with testing, I'll try to retest again...
>
> FYI: I took unmodified 10.0-base, commented out those two lines, ran
> rpl.rpl_gtid_reconnect, and it failed.
>
> Thanks,
>
>  - Kristian.
>


Follow ups

References