maria-developers team mailing list archive

Replicating same server_id problem



>>>>> "Pavel" == Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

Pavel> Kristian,
Pavel> Currently MariaDB (as well as MySQL of all previous versions) has a
Pavel> very big problem related to replicating same server_id. There is
Pavel> --replicate-same-server-id flag which as I understand (when set to 0)
Pavel> controls two things:
Pavel> 1) It doesn't allow slave to connect to a master with the same server_id.
Pavel> 2) Slave ignores all binlog events in the replication stream that have
Pavel> the same server_id as slave.
Pavel> And this flag cannot be set to 1 when --log-slave-updates is used. And
Pavel> that is a big problem.

Pavel> Consider the following scenario: let's say we have two servers S1
Pavel> (master) and S2 (slave). Let's say at some moment in time they are
Pavel> completely in sync and you bring down S2 to take cold backup (you can
Pavel> even include binlogs in it). Then you bring it back up, S1 is still
Pavel> master. Now you execute some transactions, then you do a failover,
Pavel> make S2 master and execute some more transactions.

The above is all ok.

Pavel> Then you bring down
Pavel> S1, restore it from the backup taken earlier and connect to replicate
Pavel> from S2 again.

The above is not ok and has never been supported before in MySQL/MariaDB.

What one should do is to use S2 to set up a new S1, or change the
server id on S1.

The reason is that, with server ids alone, you can't logically get the
above to work safely in all scenarios.

An example:

Assume you have a ring-replication setup between S1 and S2.

If you now restore S1 to an older state, you can't know which of the
S1 events you get back from S2 have already been applied.

Concretely:

A) S1 sends one event S1.1 to S2
B) backup
C) S1 sends one event, S1.2 to S2
D) S2 sends events S2.1, S1.1 and S1.2 to S1

If you restore S1 to state B and start replication, the data from D)
will be sent to S1, but based on server_id alone it's not possible to
know that S1.1 has to be skipped and S1.2 has to be executed.
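
To make the problem concrete, here is a small Python sketch of steps A) to
D). It is only a toy model, not server code: the Event tuple, the
apply_stream() helper and the numeric ids are made up for illustration.
The seq field is there only so we can inspect the result afterwards; the
filter itself looks at server_id only, as a non-GTID slave would.

```python
from collections import namedtuple

# Hypothetical toy model of a binlog event: originating server and a counter.
Event = namedtuple("Event", ["server_id", "seq"])

def apply_stream(applied, stream, own_id, replicate_same_server_id):
    """Return the events S1 holds after reading `stream`.

    The decision uses server_id only, like --replicate-same-server-id.
    """
    result = list(applied)
    for ev in stream:
        if ev.server_id == own_id and not replicate_same_server_id:
            continue  # own event: skipped without looking any further
        result.append(ev)
    return result

S1, S2 = 1, 2

# State of S1 restored from the backup taken at B): only S1.1 is present.
backup_state = [Event(S1, 1)]

# Step D): what S2 sends back to S1.
stream_d = [Event(S2, 1), Event(S1, 1), Event(S1, 2)]

skip_own = apply_stream(backup_state, stream_d, S1, False)
apply_all = apply_stream(backup_state, stream_d, S1, True)

print("skip own events:", skip_own)    # S1.2 is lost
print("apply own events:", apply_all)  # S1.1 is applied twice
```

With the flag off, S1.2 never comes back; with the flag on, S1.1 is
applied a second time. Neither setting gives the right result.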

With GTID we can do things better.

knielsen> Maybe in GTID strict mode we could make it an error if we are about to skip an
knielsen> event with our own server_id that has a higher seq_no than what we have in our
knielsen> binlog. Then we at least get safe behaviour in strict mode in non-ring
knielsen> topologies.

Wouldn't it be safe to just give a warning that we have found already
applied events and then skip them?
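
As a rough sketch of how the two ideas could fit together (again a toy
model in Python, not the server implementation): handle_own_event(),
UnknownOwnEventError and last_seq_no_in_binlog are hypothetical names, and
a single replication domain is assumed. An event with our own server_id
whose seq_no we already have is skipped with a warning; one with a higher
seq_no is treated as an error, as suggested for strict mode.

```python
import warnings

class UnknownOwnEventError(Exception):
    """Own-server_id event that is newer than anything in our binlog."""

def handle_own_event(seq_no, last_seq_no_in_binlog):
    """Decide what to do with an incoming event carrying our own server_id.

    Assumes one replication domain, so seq_no values are directly comparable.
    """
    if seq_no > last_seq_no_in_binlog:
        # We never logged this event ourselves (e.g. restored from an old
        # backup); skipping it silently would lose data.
        raise UnknownOwnEventError(
            f"own event seq_no={seq_no} is newer than binlog "
            f"position {last_seq_no_in_binlog}"
        )
    # Already in our binlog: safe to skip, but tell the user about it.
    warnings.warn(f"skipping already applied own event seq_no={seq_no}")
    return "skip"
```

In the example above, S1 restored to state B has seq_no 1 in its binlog,
so S1.1 would be skipped with a warning and S1.2 would raise the error
instead of being dropped silently.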

