maria-developers team mailing list archive
Message #08228
Re: Comments/thoughts on patch
To: Jonas Oreland <jonaso@xxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Wed, 04 Mar 2015 15:11:53 +0100
Cc: MariaDB Developers <maria-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CA+rQws5aEhe9YCxyVxv5=-KOdiVkC_qgS5nXS5_C3HBnALbwgw@mail.gmail.com> (Jonas Oreland's message of "Wed, 4 Mar 2015 14:54:19 +0100")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Jonas Oreland <jonaso@xxxxxxxxxx> writes:
> hmm...i'm not sure I get it...
>
> is it a bug or a feature that the "rogue" transactions are skipped by Slave2?
> in statement-based replication, skipping 0-2-3 and 0-2-4 can cause
> arbitrary data drift, right?

They are not skipped. The bug is in your patch (I think; I did not test it):
those two transactions can be duplicated, that is, executed twice by Slave2.
Let me give the example in more detail:
Let's say Slave2 first connects to Slave1 from the start.
Slave2 executes GTIDs 0-1-1, 0-1-2, 0-2-3, 0-2-4, 0-1-3.
Now we run STOP SLAVE on Slave2, leaving @@gtid_slave_pos=0-1-3.
Later we do START SLAVE on Slave2. Then Slave2 has to resume from the correct
position, which is just after 0-1-3.
But with your patch, I think Slave2 will receive and execute 0-2-4 and 0-1-3
again. This results in duplicate events and possible data drift on Slave2.
This is because your code will reach GTID 0-2-3 in the binlog and compare it
against the 0-1-3 requested by Slave2. Since 3==3, it will run
info->gtid_state.remove(gtid), and then the next GTID 0-2-4 will be sent
(incorrectly) to Slave2.
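To make the failure concrete, here is a simplified Python model of the filtering described above. It is not the MariaDB source: `Gtid`, `parse` and `send_after_seqno_only` are hypothetical names, only domain 0 is modelled, and the model assumes that the event at the matched position is itself skipped while everything after it is sent. Matching on seq_no alone makes 0-2-3 look like the requested 0-1-3:

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    """A GTID in MariaDB's domain_id-server_id-seq_no notation."""
    domain_id: int
    server_id: int
    seq_no: int


def parse(s: str) -> Gtid:
    domain_id, server_id, seq_no = (int(x) for x in s.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def send_after_seqno_only(binlog: List[Gtid], start: Gtid) -> List[Gtid]:
    """Hypothetical model of the patch: the start position is matched on seq_no alone."""
    reached = False
    sent: List[Gtid] = []
    for gtid in binlog:
        if reached:
            sent.append(gtid)  # everything after the match is sent to the slave
        elif gtid.domain_id == start.domain_id and gtid.seq_no == start.seq_no:
            # Bug: 0-2-3 "matches" the requested 0-1-3 because 3 == 3,
            # so the position is considered reached too early.
            reached = True
    return sent


binlog = [parse(g) for g in
          ["0-1-1", "0-1-2", "0-2-3", "0-2-4", "0-1-3", "0-1-4", "0-1-5"]]
applied = binlog[:5]  # what Slave2 executed before STOP SLAVE at 0-1-3
sent = send_after_seqno_only(binlog, parse("0-1-3"))
duplicates = [g for g in sent if g in applied]
print(["-".join(map(str, g)) for g in duplicates])  # ['0-2-4', '0-1-3']
```

The two duplicates are exactly the transactions Slave2 had already executed before it was stopped.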
The correct behaviour is to compare 0-2-3 to 0-1-3, see that the server_ids
are different, and skip the event _without_ removing anything from the
gtid_state. Then GTID 0-2-4 will be skipped as well, and only after the correct
position 0-1-3 will Slave2 start receiving events.
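The same simplified model with the server_id check added might look like the sketch below. Again the names are hypothetical and only a single domain is modelled; the real server also has to handle several domains and start positions that are missing from the binlog, which this sketch ignores:

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse(s: str) -> Gtid:
    domain_id, server_id, seq_no = (int(x) for x in s.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def send_after_position(binlog: List[Gtid], start: Gtid) -> List[Gtid]:
    """Sketch of the correct check: seq_no is compared only when server_id matches."""
    reached = False
    sent: List[Gtid] = []
    for gtid in binlog:
        if reached:
            sent.append(gtid)
        elif gtid.domain_id != start.domain_id or gtid.server_id != start.server_id:
            # Different server_id (e.g. 0-2-3 vs 0-1-3): seq_no says nothing
            # about ordering, so skip the event and keep waiting.
            continue
        elif gtid.seq_no == start.seq_no:
            reached = True  # exact start position found; send what follows
    return sent


binlog = [parse(g) for g in
          ["0-1-1", "0-1-2", "0-2-3", "0-2-4", "0-1-3", "0-1-4", "0-1-5"]]
sent = send_after_position(binlog, parse("0-1-3"))
print(["-".join(map(str, g)) for g in sent])  # ['0-1-4', '0-1-5']
```

Both 0-2-3 and 0-2-4 are passed over, and Slave2 only receives the transactions after 0-1-3.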
More generally, if GTIDs D-S1-N1 comes before D-S2-N2 in the binlogs, there is
no guarantee that N1 < N2. Only if S1=S2 can we be sure that N1 < N2. That is
why the server_id checks are needed.
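The ordering property can be checked mechanically on the example binlog quoted below. In this small Python sketch (helper names are hypothetical), seq_no is strictly increasing within each (domain_id, server_id) pair but not within the domain as a whole:

```python
from typing import Dict, List, Tuple

Gtid = Tuple[int, int, int]  # (domain_id, server_id, seq_no)


def parse(s: str) -> Gtid:
    domain_id, server_id, seq_no = (int(x) for x in s.split("-"))
    return domain_id, server_id, seq_no


def increasing_per_server(binlog: List[Gtid]) -> bool:
    """True if seq_no strictly increases within each (domain_id, server_id)."""
    last: Dict[Tuple[int, int], int] = {}
    for domain_id, server_id, seq_no in binlog:
        key = (domain_id, server_id)
        if key in last and seq_no <= last[key]:
            return False
        last[key] = seq_no
    return True


def increasing_per_domain(binlog: List[Gtid]) -> bool:
    """True if seq_no strictly increases within each domain_id, ignoring server_id."""
    last: Dict[int, int] = {}
    for domain_id, _server_id, seq_no in binlog:
        if domain_id in last and seq_no <= last[domain_id]:
            return False
        last[domain_id] = seq_no
    return True


binlog = [parse(g) for g in
          ["0-1-1", "0-1-2", "0-2-3", "0-2-4", "0-1-3", "0-1-4", "0-1-5"]]
print(increasing_per_server(binlog))  # True
print(increasing_per_domain(binlog))  # False: 0-2-4 is followed by 0-1-3
```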
Hope this helps,
- Kristian.
>> Now the binlog on Slave1 contains:
>>
>> GTID 0-1-1
>> GTID 0-1-2
>> GTID 0-2-3
>> GTID 0-2-4
>> GTID 0-1-3
>> GTID 0-1-4
>> GTID 0-1-5