maria-developers team mailing list archive

Re: Comments/thoughts on patch

 

Jonas Oreland <jonaso@xxxxxxxxxx> writes:

> hmm...i'm not sure I get it...
>
> is it a bug or a feature that the "rogue" transactions are skipped by Slave2?
> in statement-based replication, skipping 0-2-3 and 0-2-4 can cause
> arbitrary data drift, right?

They are not skipped. The bug is in your patch (I think, I did not test it);
those two transactions can be duplicated (executed twice by Slave2). Let me
give the example in more detail:

Let's say Slave2 first connects to Slave1 from the start.
Slave2 executes GTIDs 0-1-1, 0-1-2, 0-2-3, 0-2-4, 0-1-3.
Now we run STOP SLAVE on Slave2; at this point @@gtid_slave_pos=0-1-3.

Later we do START SLAVE on Slave2. Then Slave2 has to resume from the correct
position, which is just after 0-1-3.

But with your patch, I think Slave2 will receive and execute 0-2-4 and 0-1-3
again. This results in duplicate events and possible data drift on Slave2.

This is because your code will reach GTID 0-2-3 in the binlog and compare it
against the 0-1-3 requested by Slave2. Since 3==3, it will run
info->gtid_state.remove(gtid), and then the next GTID, 0-2-4, will be sent
(incorrectly) to Slave2.

The correct behaviour is to compare 0-2-3 to 0-1-3, see that the server_ids
are different, and skip the event _without_ removing the entry from the
gtid_state. Then GTID 0-2-4 will also be skipped, and only after the correct
position 0-1-3 will Slave2 start receiving events.
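
To make the difference concrete, here is a small Python sketch of the two
comparisons. It is only a model of the logic described above, not the actual
server code: the names (Gtid, parse, events_to_send, check_server_id) are made
up for illustration, and it assumes a single replication domain and a slave
position that is present in the binlog.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse(text: str) -> Gtid:
    """Parse a 'domain-server-seq' string such as '0-1-3'."""
    domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def events_to_send(binlog: List[Gtid], slave_pos: Gtid,
                   check_server_id: bool) -> List[Gtid]:
    """Return the GTIDs that would be sent to a slave resuming at slave_pos.

    With check_server_id=False the position is matched on domain_id and
    seq_no only (the suspected bug). With check_server_id=True the
    server_id must match as well.
    """
    sending = False
    sent = []
    for gtid in binlog:
        if sending:
            sent.append(gtid)
            continue
        matches = (gtid.domain_id == slave_pos.domain_id
                   and gtid.seq_no == slave_pos.seq_no)
        if check_server_id:
            matches = matches and gtid.server_id == slave_pos.server_id
        if matches:
            # Position found: everything after this event gets sent.
            sending = True
    return sent


# Binlog on Slave1, as in the example.
BINLOG = [parse(g) for g in
          ["0-1-1", "0-1-2", "0-2-3", "0-2-4", "0-1-3", "0-1-4", "0-1-5"]]

buggy = events_to_send(BINLOG, parse("0-1-3"), check_server_id=False)
fixed = events_to_send(BINLOG, parse("0-1-3"), check_server_id=True)
print("without server_id check:", ["-".join(map(str, g)) for g in buggy])
print("with server_id check:   ", ["-".join(map(str, g)) for g in fixed])
```

Without the server_id check the model stops searching at 0-2-3 and sends
0-2-4 and 0-1-3 a second time; with the check it sends only 0-1-4 and 0-1-5.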

More generally, if GTID D-S1-N1 comes before D-S2-N2 in the binlogs, there is
no guarantee that N1 < N2. Only if S1=S2 can we be sure that N1 < N2. That is
why the server_id checks are needed.
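
The ordering rule can be checked against the example binlog with a few lines
of Python. Again this is just an illustrative sketch; the helper names
(is_strictly_increasing, seq_nos_by_server) are hypothetical, and the tuples
are (domain_id, server_id, seq_no) taken from the Slave1 binlog in the example.

```python
from collections import defaultdict

# (domain_id, server_id, seq_no) in binlog order on Slave1.
binlog = [(0, 1, 1), (0, 1, 2), (0, 2, 3), (0, 2, 4),
          (0, 1, 3), (0, 1, 4), (0, 1, 5)]


def is_strictly_increasing(values):
    """True if every value is smaller than the one that follows it."""
    return all(a < b for a, b in zip(values, values[1:]))


def seq_nos_by_server(events):
    """Group seq_no values by server_id, keeping binlog order."""
    grouped = defaultdict(list)
    for _domain_id, server_id, seq_no in events:
        grouped[server_id].append(seq_no)
    return dict(grouped)


all_seq_nos = [seq_no for _d, _s, seq_no in binlog]
per_server = seq_nos_by_server(binlog)

# Across server_ids the sequence numbers go 1,2,3,4,3,4,5: not ordered.
print("whole binlog ordered:", is_strictly_increasing(all_seq_nos))
# Within one server_id they are ordered.
for server_id, seq_nos in per_server.items():
    print("server_id", server_id, "ordered:", is_strictly_increasing(seq_nos))
```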

Hope this helps,

 - Kristian.

>> Now the binlog on Slave1 contains:
>>
>>   GTID 0-1-1
>>   GTID 0-1-2
>>   GTID 0-2-3
>>   GTID 0-2-4
>>   GTID 0-1-3
>>   GTID 0-1-4
>>   GTID 0-1-5

