maria-developers team mailing list archive

Re: Replicating same server_id problem

 

On Mon, Sep 30, 2013 at 11:47 PM, Michael Widenius <monty@xxxxxxxxxxxx> wrote:
> Pavel> Kristian,
> Pavel> Currently MariaDB (as well as MySQL of all previous versions) has a
> Pavel> very big problem related to replicating same server_id. There is
> Pavel> --replicate-same-server-id flag which as I understand (when set to 0)
> Pavel> controls two things:
> Pavel> 1) It doesn't allow slave to connect to a master with the same server_id.
> Pavel> 2) Slave ignores all binlog events in the replication stream that have
> Pavel> the same server_id as slave.
> Pavel> And this flag cannot be set to 1 when --log-slave-updates is used. And
> Pavel> that is a big problem.
>
> Pavel> Consider the following scenario: let's say we have two servers S1
> Pavel> (master) and S2 (slave). Let's say at some moment in time they are
> Pavel> completely in sync and you bring down S2 to take cold backup (you can
> Pavel> even include binlogs in it). Then you bring it back up, S1 is still
> Pavel> master. Now you execute some transactions, then you do a failover,
> Pavel> make S2 master and execute some more transactions.
>
> The above is all ok.
>
> Pavel> Then you bring down
> Pavel> S1, restore it from the backup taken earlier and connect to replicate
> Pavel> from S2 again.
>
> The above is not ok and has never been supported before in MySQL/MariaDB.
>
> What one should do is to use S2 to set up a new S1 or change server id
> on S1.

Unfortunately, both suggestions are unacceptable in highly available
production environments.
- Using S2 to set up a new S1 means we have to bring the database down
completely for a prolonged period of time, which is incompatible with
high availability.
- Changing server_id for S1 means we have to remember all server ids
that were ever a master for the database. When every master failover
and server restart is a manual process this could be feasible, but in
automated environments it is virtually impossible.

> The reason is that you can't logically get the above to work safely
> with server ids in all scenarios.
>
> An example:
>
> Assume you have a ring-replication setup between S1 and S2.

I believe circular replication is ill-advised and that it is impossible
to build any sane production system on it (I would be glad to hear
about any examples to the contrary). So I would love to see a flag
that disables any possibility of circular replication, along with the
removal of any features that exist only to facilitate such a
configuration...

> If you now restore S1 to an older state, you can't know which of the
> events S1 gets from S2 have already been applied.
>
> Here is an example:
>
> A) S1 sends one event S1.1 to S2
> B) backup
> C) S1 sends one event, S1.2 to S2
> D) S2 sends events S2.1, S1.1 and S1.2 to S1
>
> If you restore S1 to state B and start replication, data from D) will
> be sent to S1, but based on server id it's not possible to know that
> S1.1 has to be skipped and S1.2 has to be executed.
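
The A) to D) example above can be played through in a few lines. This
is only a toy model, not server code: the Event class, the seq field
and apply_stream are made up for illustration, and the filter simply
mimics what --replicate-same-server-id=0 is described as doing (skip
every event whose origin is the slave itself).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    server_id: int  # server that originally executed the event
    seq: int        # illustrative per-origin counter, not a real binlog field


def apply_stream(own_server_id, applied, stream):
    """Toy model of the same-server-id filter: skip our own events."""
    result = list(applied)
    for ev in stream:
        if ev.server_id == own_server_id:
            continue  # filtered purely on server_id, applied or not
        result.append(ev)
    return result


S1, S2 = 1, 2
s1_1, s1_2, s2_1 = Event(S1, 1), Event(S1, 2), Event(S2, 1)

restored_state = [s1_1]              # S1 restored to backup point B
stream_from_s2 = [s2_1, s1_1, s1_2]  # what S2 sends in step D

after = apply_stream(S1, restored_state, stream_from_s2)
missing = [ev for ev in (s1_1, s1_2, s2_1) if ev not in after]
print(missing)  # the events S1 never gets back
```

In this model S1.2 is the only event left in missing: it is dropped
together with S1.1 because the filter sees nothing but the server id.
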
>
> With GTID we can do things better.

Are you suggesting that, if slaves always connect to the master using
GTID, something can be implemented that will allow replaying binlog
events with the same server id without turning on the
--replicate-same-server-id flag?

> knielsen> Maybe in GTID strict mode we could make it an error if we are about to skip an
> knielsen> event with our own server_id that has a higher seq_no than what we have in our
> knielsen> binlog. Then we at least get safe behaviour in strict mode in non-ring
> knielsen> topologies.
>
> Wouldn't it be safe to just give a warning that we have found already
> applied events and then skip them?
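
For reference, the strict-mode check knielsen describes could look
roughly like the sketch below. It is an assumption-laden illustration,
not the server's logic: Gtid and classify are hypothetical names, only
a single replication domain is considered, and
highest_seq_no_in_binlog stands for the highest seq_no found in the
slave's own binlog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gtid:
    domain_id: int
    server_id: int
    seq_no: int


def classify(gtid, own_server_id, highest_seq_no_in_binlog, strict):
    """Decide what a slave does with one incoming event (hypothetical)."""
    if gtid.server_id != own_server_id:
        return "apply"
    if gtid.seq_no <= highest_seq_no_in_binlog:
        return "skip"  # our own event, already present in our binlog
    # Our own server_id, but newer than anything in our binlog.
    return "error" if strict else "skip"


# S1 (server_id 1) restored from backup: its binlog ends at seq_no 1.
stream = [Gtid(0, 2, 3), Gtid(0, 1, 1), Gtid(0, 1, 2)]
decisions = [classify(g, 1, 1, strict=True) for g in stream]
print(decisions)
```

With strict=False the last event falls back to "skip", which is the
silent loss described at the top of this thread.
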


Thank you,
Pavel

