maria-developers team mailing list archive

Re: Review of patch for MDEV-4820

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> I took 10.0-base r3685. Started new just bootstrapped server with
> server_id = 1. It has @@global.gtid_binlog_pos,
> @@global.gtid_slave_pos and @@global.gtid_current_pos empty. Then I
> execute
>
> set global gtid_binlog_state = '0-10-10'
>
> After that @@global.gtid_binlog_pos = '0-10-10' as expected, but both
> @@global.gtid_slave_pos and @@global.gtid_current_pos are still empty.
> Because of that server won't be able to replicate from master.
> If I set gtid_binlog_state to '0-1-10' though
> @@global.gtid_current_pos changes to '0-1-10' and everything is fine.

The short answer is that you should just set both gtid_slave_pos and
gtid_binlog_state on the new server.

  SET GLOBAL gtid_binlog_state = '0-10-10';
  SET GLOBAL gtid_slave_pos = @@GLOBAL.gtid_binlog_pos;

For the longer answer, let me try to explain:

The gtid_binlog_pos and the gtid_slave_pos are different concepts in
MariaDB. The former is the last GTID logged into the binlog (for each
domain). The latter is the last GTID replicated by the slave.

These differ because, on the one hand, a slave can run with
--log-slave-updates=0 (so its binlog is not updated), and on the other hand I
did not want to add the overhead of updating gtid_slave_pos for every
transaction on the master. So a GTID that goes into one of them may or may not
go into the other.

Now let us set up a slave with

    CHANGE MASTER TO master_host= ... , master_use_gtid=slave_pos;

The slave starts replication at the value of gtid_slave_pos. Every replicated
GTID updates gtid_slave_pos, so to switch master we can just point it to the
new host and it will continue from the correct point.

But suppose we promote a new master, and later want the old master to
become a slave. The old master did not update gtid_slave_pos, so the point at
which to start is the last GTID logged to the binlog, gtid_binlog_pos. Thus to
start the old master replicating as a slave one should use:

    SET GLOBAL gtid_slave_pos = @@GLOBAL.gtid_binlog_pos;
    CHANGE MASTER TO master_host= ... , master_use_gtid=slave_pos;

and then things will proceed correctly with the new slave server.

So this is how you should think of the variables. The gtid_slave_pos is the
position at which to start replication for a slave. The gtid_binlog_pos is the
last GTID logged into the binlog.

Now, this creates an asymmetry - to switch a server to replicate from a new
master, the user has to know if the server was a master or a slave before, and
do it differently depending on which it is.

So I wanted to provide a way to avoid this asymmetry, and I implemented CHANGE
MASTER TO master_use_gtid=current_pos for this. In this mode, when the slave
connects, it looks into both the gtid_slave_pos and the gtid_binlog_pos to
decide which of these has the most recent GTID - and then uses that GTID as
the point to start replication at.

If the server was a master before, then the last GTID in the binlog will have
the server's own server_id; _and_ the sequence number will be bigger than what
is in the gtid_slave_pos because sequence numbers on a master are always
generated bigger than any seen before. So in this case we use the last GTID in
the binlog to connect to. Otherwise we use the gtid_slave_pos.

So that is _all_ that gtid_current_pos is - it is a way for the server to
guess whether it was a master or a slave before, and act accordingly. A bit of
magic for casual users that do not want to be aware of whether the server they
are setting up as a slave was a slave already before, or a master.
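
To make the rule concrete, here is a rough Python model of how the combined
position could be derived from the other two. It is only a sketch of the
behaviour described above, not the server's actual code; the names parse_pos,
format_pos and current_pos are made up for illustration, and positions are
assumed to be comma-separated domain-server-seq strings.

```python
from typing import Dict, Tuple

# A position maps domain_id -> (server_id, seq_no).
Position = Dict[int, Tuple[int, int]]


def parse_pos(text: str) -> Position:
    """Parse a string such as '0-1-10,1-2-5' into a Position."""
    pos: Position = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain_id, server_id, seq_no = (int(x) for x in item.split("-"))
        pos[domain_id] = (server_id, seq_no)
    return pos


def format_pos(pos: Position) -> str:
    """Render a Position back into the comma-separated string form."""
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(pos.items()))


def current_pos(own_server_id: int, slave_pos: str, binlog_pos: str) -> str:
    """Hypothetical model: start from gtid_slave_pos, then for each domain
    prefer the binlog GTID only if this server generated it (same server_id)
    and its sequence number is higher than the one in gtid_slave_pos."""
    result = parse_pos(slave_pos)
    for domain_id, (server_id, seq_no) in parse_pos(binlog_pos).items():
        if server_id != own_server_id:
            continue  # not written here as a master; rely on gtid_slave_pos
        if domain_id not in result or seq_no > result[domain_id][1]:
            result[domain_id] = (server_id, seq_no)
    return format_pos(result)
```

With own_server_id = 1, an empty gtid_slave_pos and a binlog position of
'0-10-10', this model yields an empty result, which matches the report quoted
at the top; with '0-1-10' in the binlog it yields '0-1-10'.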

So the point is that if you want to use gtid_current_pos on a newly set up
server, you need to provide correct values for _both_
gtid_binlog_pos/gtid_binlog_state _and_ gtid_slave_pos, because
gtid_current_pos is the result of combining the two.

> It looks like the problem is in the server_id check in the first loop
> in rpl_slave_state::iterate(). Can it be removed from there?

I think so - in strict mode, the most recent GTID will always be the one with
the highest sequence number, so the server_id check is not needed. On the
other hand, if things are done correctly, the server_id check will make no
difference, as a GTID with different server_id cannot get into the binlog
without also getting into gtid_slave_pos.

But for now I have other, more critical things I want to fix first - I think
this is not a critical thing, just setting gtid_slave_pos on the new server
should make things work for you? (else let me know if I missed something).

 - Kristian.

