← Back to team overview

maria-developers team mailing list archive

Re: Incorrect format description event skipping in queue_event()

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> The attached patch doesn't work properly because the format
> description event it saves is different from what master has sent. It
> has different number_of_event_types and potentially it has garbage (or
> incomplete data) in post_header_len.

Aha, so it's because of this, in Format_description_log_event::write() ?

  uchar buff[FORMAT_DESCRIPTION_HEADER_LEN + BINLOG_CHECKSUM_ALG_DESC_LEN];
  size_t rec_size= sizeof(buff);
  memcpy((char*) buff+ST_COMMON_HEADER_LEN_OFFSET + 1, (uchar*) post_header_len,
         LOG_EVENT_TYPES);
  ret= (write_header(file, rec_size) ||
        wrapper_my_b_safe_write(file, buff, rec_size) ||

So it uses the static size of own server version, not the one of the event
that was read. And we get the effect you describe.

So then it seems the fix is to modify the created and/or artificial (log_pos)
fields manually in the buffer containing the master's description event, fix
the checksum, and then write it out normally (not skip)? Or do you have a
better suggestion?

What I do not understand is why we do not have the exact same bug when the
slave rotates the relay log due to max_relay_log_size? It seems to me it does
the exact same thing? It writes out description_event_for_queue which can have
different number_of_event_types.

Maybe the point is that yes, we may have different number of event types and
garbage in post_header_len, but:

 - If the slave knows fewer types than master, the missing types cannot be
   read by slave anyway.

 - If the slave knows more types, then those types cannot be written by the
   master anyway, so the garbage will not be accessed.

... which is horrible, but not much can surprise me anymore about replication
code.

So what do you think?

> quite well at the moment to do that. So apparently all peek*()
> functions should have Format_description added as parameter and it's
> easy to pass one from queue_event(), but then e.g. mysql_binlog_send()
> will need description event too to pass it to send_event_to_slave()
> and is_until_reached() so that those could pass it to peek*(). For
> correctness this format description should be read from binlog, right?
> That would be pretty non-trivial to do.

Isn't it just a matter of reading in each format_description_log_event as it
is seen and holding on to it in some description_event_for_binlog_send? Should
be easy, right?

> Also gtid_state_from_pos() needs Format_descirption, but it will
> apparently meet that at the beginning of the binlog file, so it can be
> created before it will be actually needed. Although constructor of

Yeah, again it should be just a matter of reading the format_description_event
found at the start of the binlog file.

> Format_description_log_event needs another instance of
> Format_description. Should it be just created like
> Format_description_log_event(BINLOG_VERSION)?

Yes, that should be ok, I think. It seems to be only used for
common_header_len, and the value of that is frozen for
Format_description_log_event.

If you are interested in helping with testing this, I can write a patch later,
but I need a way to test it without having to spend unreasonable amounts of
time on it.

 - Kristian.


Follow ups

References