← Back to team overview

maria-developers team mailing list archive

Re: d9913834ceb: MDEV-14014 Multi-Slave Replication Fail: bogus data in log event


Sergei, hello.

> Hi, andrei.elkin!
> On May 30, andrei.elkin@xxxxxxxxxx wrote:
>> revision-id: d9913834cebe4ed8494b3d76762cd882788a8fd5 (mariadb-10.1.33-28-gd9913834ceb)
>> parent(s): c1698e8dc50f0c342c1a6886426ba1d43396cd1e
>> author: Andrei Elkin
>> committer: Andrei Elkin
>> timestamp: 2018-05-30 20:06:44 +0300
>> message:
>> MDEV-14014 Multi-Slave Replication Fail: bogus data in log event
>> MDEV-7257 made a dump thread to read from binlog concurrently with
>> writers as long as the read bytes are below a water-mark
>> (MYSQL_BIN_LOG::binlog_end_pos). However it appeared to be possible a
>> dump thread reader reach out for bytes past the water mark through a
>> feature of IO_CACHE that fills in the internal buffer and while doing
>> so it could read what the reader is not supposed to see (the bytes
>> above MYSQL_BIN_LOG::binlog_end_pos).
>> The issue is fixed with constraining the IO_CACHE buffer fill to respect
>> the watermark.
>> An added test simulates potentially unconstrained buffer fill and an
>> assert guards this is not the case anymore.
> What about a much simpler one-liner fix:
> --- a/sql/sql_repl.cc
> +++ b/sql/sql_repl.cc
> @@ -2549,6 +2549,7 @@ static int send_events(binlog_send_info *info, IO_CACHE*
>        return 1;
>      info->last_pos= linfo->pos;
> +    log->end_of_file= end_pos;

That's nice!
I could not guess out that so need parameter was always at my disposal.

>      error= Log_event::read_log_event(log, packet, info->fdev,
>                         opt_master_verify_checksum ? info->current_checksum_alg
>                                                    : BINLOG_CHECKSUM_ALG_OFF);
> It doesn't change IO_CACHE structure and doesn't add any
> overhead to the IO_CACHE.

Very true, and even more - 'end_of_file' of the read cache seems to have
found usability beyond being a constant.

> Did you already test that your commit actually fixes the issue?

Like I said, I did not try after we found it's NFS binlog. I've done it
right now, but my attempt lasted minutes while in the user's case
apart of anything specific they needed hours sometimes.

I am committing a new patch which sustains the simulation test of the
old one.



Follow ups