maria-developers team mailing list archive

Re: Replicating same server_id problem

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> --replicate-same-server-id flag which as I understand (when set to 0)
> controls two things:
> 1) It doesn't allow slave to connect to a master with the same server_id.
> 2) Slave ignores all binlog events in the replication stream that have
> the same server_id as slave.
> And this flag cannot be set to 1 when --log-slave-updates is used. And
> that is a big problem.

Hm, I was not aware of this. It seems wrong. For (1), I don't think a slave
should ever be allowed to connect to a server with the same server_id. And for
the reason you mentioned, it seems wrong that --log-slave-updates and
--replicate-same-server-id cannot be used together. After all,
--replicate-same-server-id is only a problem in ring topologies.

It does not really seem related to GTID though; the exact same problems would
occur when using old-style replication. Of course an easy work-around is to
change the server id on the restored server S1, but that only helps if one is
aware of the problem ahead of time...

On the other hand, in GTID strict mode, the problem of creating a loop does
not exist. Any attempt to binlog an event that is already in the binlog will
cause an error.
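
To illustrate why strict mode rules out loops, here is a minimal sketch in
Python. It is a simplified model, not the server code: the StrictBinlog class
and its append() method are hypothetical names, and it assumes the check is
simply "seq_no must be greater than the last one binlogged in that domain".

```python
class StrictBinlog:
    """Toy model of a binlog with a strict-mode style ordering check."""

    def __init__(self):
        # domain_id -> highest seq_no written so far (assumed bookkeeping)
        self.last_seq_no = {}

    def append(self, domain_id, server_id, seq_no):
        last = self.last_seq_no.get(domain_id, 0)
        if seq_no <= last:
            # The event is already in the binlog (or older than what is
            # there), so writing it again would mean it came around a loop.
            raise ValueError(
                f"GTID {domain_id}-{server_id}-{seq_no} is not newer than "
                f"seq_no {last} already in the binlog"
            )
        self.last_seq_no[domain_id] = seq_no


binlog = StrictBinlog()
binlog.append(0, 1, 100)        # first time: accepted

try:
    binlog.append(0, 1, 100)    # same event arriving again: rejected
    looped = True
except ValueError:
    looped = False
```

In this model the second append fails instead of silently writing the event
again, which is what stops an event from circulating.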

So it would make sense to allow --replicate-same-server-id together with
--log-slave-updates when GTID strict mode is enabled. Then again, I
would be tempted to just allow the two to be used together freely: users who
want to run ring topologies must in any case be very aware of all the possible
pitfalls.

> What do you think about how this should be fixed? As I understand you
> explicitly wanted to support replication cycles, so you still want the
> skipping of transactions with the same server_id to exist. But the
> situation above is a valid production use case. Maybe in GTID world it
> can be solved better? E.g. if transaction has the same server_id, but
> the GTID wasn't applied yet then it shouldn't be skipped?

The main problem I see is what the default should be. I suppose we cannot
safely change the default for --replicate-same-server-id. On the other hand,
if users explicitly set --replicate-same-server-id=0, then it really does not
seem correct that some events with the same server_id are nevertheless
replicated depending on some complicated GTID semantics. So the curse of
backwards compatibility seems to hit here...

Maybe in GTID strict mode we could make it an error if we are about to skip an
event with our own server_id that has a higher seq_no than what we have in our
binlog. Then we at least get safe behaviour in strict mode in non-ring
topologies.
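
A rough sketch of that check, again in Python and only as an illustration of
the idea. The function should_apply(), the UnsafeSkipError exception and all
parameter names are made up for this example, and it assumes a single
replication domain so that one seq_no describes our binlog position.

```python
class UnsafeSkipError(Exception):
    """Raised when skipping an own-server_id event would lose data."""


def should_apply(event_server_id, event_seq_no,
                 my_server_id, my_binlog_seq_no,
                 replicate_same_server_id=False,
                 gtid_strict_mode=True):
    """Return True to apply the event, False to skip it."""
    if event_server_id != my_server_id or replicate_same_server_id:
        return True  # not our own event, or filtering is disabled

    # The event carries our own server_id and would normally be skipped.
    if gtid_strict_mode and event_seq_no > my_binlog_seq_no:
        # We have never binlogged this event, so it cannot be one of ours
        # coming back around a ring. Skipping it would silently drop it.
        raise UnsafeSkipError(
            f"event seq_no {event_seq_no} is ahead of our binlog "
            f"(seq_no {my_binlog_seq_no})"
        )
    return False  # already in our binlog: safe to skip
```

With the restored server S1 behind the master, should_apply(1, 11, 1, 10)
raises instead of dropping the event, while an event already in our binlog,
such as should_apply(1, 5, 1, 10), is skipped as before.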

With respect to ring topologies, I frankly find them quite dangerous to rely
on, and for now I am mainly concerned with making sure that anything that
worked in 5.5 will continue to work in 10.0.

 - Kristian.

