maria-developers team mailing list archive

Re: MariaDB allows for slave to connect with non-existent GTID

 

On Sun, May 5, 2013 at 11:09 PM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Pavel Ivanov <pivanof@xxxxxxxxxx> writes:
>
>> I've realized that the way slaves are processed now on the master
>> allows them to connect even if they request non-existent GTID.
>
> What happens here is that S3 requests GTID 0-2-3 from S1.
>
> S1 has in binlog: 0-1-1  0-1-2  0-1-3  0-2-4
>
> So there is a "hole" in the binlog of S1, a transaction got missing.
>
> However, the code allows S3 to start replicating with 0-2-4 as the first
> event, because we can be sure that this is the first event that we _do_ have
> that follows the requested 0-2-3.
>
> Now, if S1 had had only "0-1-1  0-1-2  0-1-3" in the binlog, then S3 would not
> be allowed to connect. This is mainly to protect against the case where no
> further 0-2-* events ever appear, which would cause S3 to skip events forever
> while waiting for such an event.
>
>> Is this
>> "works as intended" and will be different in "strict mode", or did you
>> not want such things to happen even in non-strict mode?
>
> I am not sure. But my immediate impression is that this is the most consistent
> behaviour.
>
> In MariaDB GTID, we keep track of only the last applied GTID (within each
> domain), and rely on binlog sequence being identical between different
> servers. In this particular example we could detect that this was violated,
> but it was kind of accidental. If S3 had been stopped one event earlier or
> later, then we would not be able to detect the error. So catching this error
> case does not really seem to buy much in general.

I'd say that if S3 had stopped one event earlier, there would have been no
error at all. If S3 had stopped one event later, then sure, it wouldn't be
possible to detect the error, but it would be detected in strict mode.
What I'm not comfortable with is this: if S3 is stopped where it is
and tries to connect to S1 immediately, that causes an error. Likewise,
if there had been no failover to S2 and S2 had never authored any new GTIDs,
that would cause an error as well. The difference between error
and non-error looks very vague and fragile.

> Also, when using stuff like --replicate-wild-ignore-table, holes can easily
> appear, and allowing a slave to connect "in the middle of a hole" seems
> reasonable.

So what you are saying is that when something like
--replicate-wild-ignore-table is used, the slave will have holes in its binlog
compared to the master. But in that case the slave will never have a GTID that
is missing on the master. If we have a second slave with different table
filtering, though, it will have different holes in its binlog. In that case,
if we fail over and make this second slave the master, it's quite possible that
the first slave will connect to the new master with a GTID that does not exist
there. I see how this is a kind of valid situation from MariaDB's point of
view, but I don't see how it makes sense to do this in real life.

So I see your point, and I can't argue that this behavior should change
by default (except that it probably won't make any sense for anybody
to use such a feature), but we would really like this situation to be
detected and replication to be stopped, either in "gtid strict mode" or
in some other mode that we could turn on.
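
To make the cases above concrete, here is a minimal Python sketch of the
connect check as it is described in this thread. It is a simplified model
and not the server's actual code: the names Gtid, parse, start_index and
GtidNotFound are made up for illustration, and the "strict" flag only
stands for the kind of mode requested above, not for any existing option.

```python
from typing import List, NamedTuple


class Gtid(NamedTuple):
    domain: int
    server_id: int
    seq_no: int


def parse(text: str) -> Gtid:
    """Parse 'domain-server_id-seq_no', e.g. '0-2-3'."""
    domain, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain, server_id, seq_no)


class GtidNotFound(Exception):
    """Hypothetical error: the master refuses the slave's start position."""


def start_index(binlog: List[Gtid], requested: Gtid, strict: bool = False) -> int:
    """Return the index of the first binlog event to send to the slave.

    Simplified model of the behaviour discussed in the thread:
    - requested GTID is in the binlog: start right after it;
    - it is missing but a later event from the same domain and server
      exists: start at that event (the "hole" case), unless strict;
    - otherwise: refuse the connection.
    """
    for i, event in enumerate(binlog):
        if (event.domain, event.server_id) != (requested.domain, requested.server_id):
            continue
        if event.seq_no == requested.seq_no:
            return i + 1  # exact match, continue after it
        if event.seq_no > requested.seq_no:
            if strict:
                raise GtidNotFound(f"{requested} is missing from the binlog")
            return i  # connect "in the middle of a hole"
    raise GtidNotFound(f"no event at or after {requested} in the binlog")


# Binlog of S1 from the example in this thread.
s1_binlog = [parse(s) for s in "0-1-1 0-1-2 0-1-3 0-2-4".split()]
```

With this model, start_index(s1_binlog, parse("0-2-3")) returns 3, the
position of 0-2-4, so S3 is allowed to connect. The same request raises
GtidNotFound when strict=True, and also when the binlog holds only the
first three events, which is the "connect immediately" case.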

Thank you,
Pavel

