maria-developers team mailing list archive

Re: Documentation about GTID

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> Friendly warning: I've discovered a critical bug in GTID
> implementation https://mariadb.atlassian.net/browse/MDEV-4473. So use

Thanks for digging this up, and even supplying a nice test case!

I've pushed a fix to 10.0-base, and merged it to 10.0.

The logic here is a bit complex, so let me try to explain what is going on.

Normally, when the slave requests to start replication from some GTID G, the
master needs to find the binlog file that contains G, scan through it until it
reaches the event G, and then start sending events to the slave at the point
after the event group of G.
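In outline, a toy sketch of that normal case might look like this (a hypothetical simplification, not the server's actual binlog reader; GTIDs are modeled as (domain_id, server_id, seq_no) tuples):

```python
def find_start_position(events, gtid):
    """Scan a binlog, given as a list of (gtid, event_group) pairs in log
    order, and return the index just past the event group for `gtid`,
    i.e. the point from which to start sending events to the slave."""
    for i, (g, _group) in enumerate(events):
        if g == gtid:
            return i + 1
    return None  # GTID not found in this binlog file

events = [((1, 1, 9999), "..."), ((1, 1, 10000), "...")]
assert find_start_position(events, (1, 1, 10000)) == 2
```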

However, suppose that we are using two replication domains 1 and 2, but there
are no events logged in domain 2 for a month. The slave will send its start
gtid position, say 1-1-10000,2-2-500. Since there was nothing logged in domain
2 for one month, it is likely that the binlog file containing 2-2-500 was
purged. So if we tried to locate that purged binlog, the slave would fail to
connect.

But as long as 2-2-500 was the _last_ event logged in domain 2 (which is
likely if it was logged one month ago), then we do not need to find the old
purged binlog file - we can start from the beginning of any later binlog
file. The code handling this special case, in gtid_find_binlog_file() and
contains_all_slave_gtid(), is where the bug was.

The way the code works is to look at the Gtid_list_log_event at the start of
every binlog file. This event contains, for each (domain_id, server_id)
combination, the GTID with the highest sequence number logged in previous
binlog files. Further, the list is sorted on domain_id, and the last GTID in
each domain_id group is the last GTID logged for that domain.
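As a toy illustration of that structure (again modeling GTIDs as tuples, not the actual event format):

```python
def last_gtid_per_domain(gtid_list):
    """Given Gtid_list_log_event entries sorted on domain_id, return the
    last GTID logged for each domain (the last entry of each group)."""
    last = {}
    for g in gtid_list:  # sorted on domain_id, so later entries overwrite
        last[g[0]] = g
    return last

# The example from this mail: 1-1-10000, 2-2-500, 2-3-600.
gtid_list = [(1, 1, 10000), (2, 2, 500), (2, 3, 600)]
assert last_gtid_per_domain(gtid_list) == {1: (1, 1, 10000), 2: (2, 3, 600)}
```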

This makes it possible to handle the special case mentioned above. If 2-2-500
appears in the Gtid_list_log_event as the last GTID logged for its domain in
prior binlogs, then we can start sending events from the beginning of this
binlog file; there is no need to go back further.

However, the code had a bug: it did not check that the GTID was the last one
in the group for its domain. So if the Gtid_list_log_event contained
1-1-10000,2-2-500,2-3-600, the code would select this binlog as the starting
point for 2-2-500. This is wrong, because we need to go back further to find
2-3-600 and send it to the slave.

The fix is to add the check that the GTID is the last one in its domain_id
group.
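A hypothetical simplification of the corrected check (not the actual contains_all_slave_gtid() code; GTIDs are (domain_id, server_id, seq_no) tuples):

```python
def can_start_from_this_binlog(gtid_list, slave_gtid):
    """Can a slave positioned at slave_gtid start from the beginning of
    the binlog file whose Gtid_list_log_event is gtid_list?"""
    group = [g for g in gtid_list if g[0] == slave_gtid[0]]
    if not group:
        return False
    # The buggy version in effect only checked `slave_gtid in group`.
    # The fix: the slave's GTID must be the *last* one in its domain_id
    # group, otherwise later events in that domain exist in prior
    # binlogs and we must go further back.
    return group[-1] == slave_gtid

gtid_list = [(1, 1, 10000), (2, 2, 500), (2, 3, 600)]
assert can_start_from_this_binlog(gtid_list, (2, 3, 600)) is True
assert can_start_from_this_binlog(gtid_list, (2, 2, 500)) is False  # the bug case
```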

In the bug report, you wondered why there are multiple GTIDs for one domain_id
in Gtid_list_log_event (there is one for each server id).

This is needed to be able to locate any given GTID in the binlogs, without
relying on sequence numbers being strictly increasing. There are a number of
scenarios where this can happen. Even if most of these are undesirable
configurations / user error, it is sure to occur in practice, and I spent a
lot of effort in the design to make sure that GTID will still behave
reasonably in such cases, and not silently corrupt replication.

There is a careful distinction between the slave state (gtid position), which
has only one GTID per domain id, and master binlog state
(Gtid_list_log_event), which has one per (domain_id, server_id)
combination. The latter can accumulate lots of cruft in the form of old, no
longer used server_id's, but it does not matter, as it is not something users
ever need to look at. The former _is_ something the user might want to look
at, and it has the simple format of just one GTID per domain configured by the
user.
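To illustrate the distinction with the numbers used above (a hypothetical sketch, not the server's actual representation):

```python
# Slave state (user-visible GTID position): exactly one GTID per domain_id.
slave_pos = {1: (1, 1, 10000), 2: (2, 2, 500)}

# Master binlog state (Gtid_list_log_event): one entry per
# (domain_id, server_id) pair; retired server_ids linger harmlessly.
binlog_state = {(1, 1): 10000, (2, 2): 500, (2, 3): 600}

assert len(slave_pos) == 2     # one per configured domain
assert len(binlog_state) == 3  # can grow with old server_ids
```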

(This was BTW a major motivation for redesigning GTID from scratch rather than
taking the MySQL 5.6 version. In MySQL 5.6, they do not make this distinction,
so the user-visible slave GTID position will accumulate cruft in the form of
no longer used server UUIDs, which will hang around basically forever.)

Thanks,

 - Kristian.
