maria-developers team mailing list archive


Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)

To: maria-developers@xxxxxxxxxxxxxxxxxxx, andrei.elkin@xxxxxxxxxxx
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Wed, 06 Sep 2017 20:16:52 +0200
In-reply-to: <87zia724ku.fsf@quad> (andrei elkin's message of "Wed, 06 Sep 2017 19:07:45 +0300")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

andrei.elkin@xxxxxxxxxx writes:

> Let me propose methods to clean unused gtid domains off the master.
> I would be glad to hear your opinions, dear colleagues.

So, a bit of background: the central idea in MariaDB GTID is the sequence of
events that created the current master state. This is an abstract concept.
Conceptually, the current state of this server is defined as the result of
executing a specific sequence of events (in practice it might have been
restored from a backup or something). Abstractly, the server's binlog is
exactly this sequence of events (in practice the early part probably no
longer exists, or possibly never did). The sequence is multi-streamed (one
stream per domain). Everything (in GTID, but also in parallel replication and
group commit) is based on the assumption that each stream in the binlog
sequence is strictly ordered, at least on a single given server.
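To make the multi-stream idea concrete, here is a minimal Python sketch (a toy model, not MariaDB code) that treats a binlog as an ordered list of GTIDs, each a (domain_id, server_id, seq_no) tuple, and splits it into one stream per domain. The function name `split_streams` is hypothetical.

```python
def split_streams(binlog):
    """Group an interleaved binlog into one stream per replication domain.

    binlog is a list of GTIDs as (domain_id, server_id, seq_no) tuples in
    the order they were written. Each returned stream keeps that relative
    order, which is the ordering GTID relies on within a domain.
    """
    streams = {}
    for gtid in binlog:
        domain_id = gtid[0]
        streams.setdefault(domain_id, []).append(gtid)
    return streams


binlog = [(0, 1, 1), (1, 2, 1), (0, 1, 2), (1, 2, 2), (0, 1, 3)]
streams = split_streams(binlog)
print(streams[0])  # [(0, 1, 1), (0, 1, 2), (0, 1, 3)]
print(streams[1])  # [(1, 2, 1), (1, 2, 2)]
```

Interleaving between domains carries no meaning in this model; only the order inside each stream does.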

It is important to understand that it is the actual sequence of events that
matters, conceptually. The concrete GTID format, D-S-N
(domain_id-server_id-seq_no), is only an implementation detail that allows
the code to work correctly. The sequence is defined by the binlog, not by
the particular sequence numbers in GTID or other details.
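For reference, the D-S-N notation itself is simple to take apart. A hedged Python sketch (the helper names `parse_gtid` and `parse_gtid_pos` are hypothetical) that parses a single GTID and a comma-separated position such as the value shown by gtid_binlog_pos, which holds at most one GTID per domain:

```python
def parse_gtid(text):
    """Parse one GTID written as D-S-N (domain_id-server_id-seq_no)."""
    domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
    return domain_id, server_id, seq_no


def parse_gtid_pos(text):
    """Parse a position like '0-1-100,11-1-5' into {domain_id: gtid}.

    An empty string gives an empty position. A valid position contains
    at most one GTID per domain, so a repeated domain is rejected.
    """
    pos = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        gtid = parse_gtid(item)
        if gtid[0] in pos:
            raise ValueError(f"duplicate domain {gtid[0]} in {text!r}")
        pos[gtid[0]] = gtid
    return pos


print(parse_gtid("11-1-0"))                # (11, 1, 0)
print(parse_gtid_pos("0-1-100,11-1-5"))    # {0: (0, 1, 100), 11: (11, 1, 5)}
```

The parsed numbers are only handles for events; as the paragraph above says, it is the binlog sequence that defines the order.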

When a slave connects to our master server, it presents its current position
as a single event within each stream. By the above, this is sufficient to
reliably find the correct position in the binlog to restart the slave from.

Because MariaDB replication is asynchronous, we cannot in general prevent
different servers from erroneously ending up with different binlog
sequences. However, we can ensure a consistent view of the sequence on a
single server, and we can try to detect and flag any inconsistencies between
servers as they are noticed.

This is why it is necessary to give an error if a slave presents a position
containing an event that is not in the master's binlog. The master cannot
know if this is because the slave is ahead (the event in question will
arrive later on the master), or because replication has diverged (the event
will never arrive on the master, and the replication position is not well
defined). It is a central goal in GTID to avoid, as much as possible, silent
incorrect operation in replication.
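A minimal Python sketch of the lookup and the error described above, modelling the binlog as an in-memory list of (domain_id, server_id, seq_no) tuples and the slave position as one GTID per domain. `find_start_positions` and `GtidNotInBinlog` are hypothetical names; the real server searches binlog files, not a list.

```python
class GtidNotInBinlog(Exception):
    """The slave position contains a GTID the master has never logged."""


def find_start_positions(binlog, slave_pos):
    """Return, per domain, the binlog index the slave resumes from.

    binlog:    list of (domain_id, server_id, seq_no) in binlog order.
    slave_pos: dict domain_id -> (domain_id, server_id, seq_no).
    """
    start = {}
    for domain_id, gtid in slave_pos.items():
        try:
            idx = binlog.index(gtid)
        except ValueError:
            # The master cannot tell "slave is ahead" from "replication has
            # diverged", so both cases are reported as the same error.
            raise GtidNotInBinlog(f"GTID {gtid} not in master binlog") from None
        # Resume with the next event of this domain after the known one.
        start[domain_id] = idx + 1
    return start


binlog = [(0, 1, 1), (1, 2, 1), (0, 1, 2)]
print(find_start_positions(binlog, {0: (0, 1, 1), 1: (1, 2, 1)}))  # {0: 1, 1: 2}
```

A position naming (0, 1, 5), which this master never logged, raises `GtidNotInBinlog` rather than guessing where to start.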

With that explained, now onto some concrete comments/answers:

> From the user's perspective, the past default domain-id is permanently in
> the past in these cases. Its events have already been replicated, and no
> new ones will be generated or replicated.

But from the point of view of GTID semantics, the binlog sequence is still
defined by this past, and in an inconsistent (and hence incorrect) way.

> Therefore such a domain may conceptually be cleaned away from both the
> master and slave states.

So, as you say, the erroneous state must be fixed for GTID to work
correctly. One way is to discard the entire incorrect binlog with RESET
MASTER. But this discussion is about fixing the binlog in place, by
(conceptually) replacing it with a variant which does not contain the
problematic past.

> The idea looks quite sane; I just could not grasp why the presence of the
> domains being deleted in the very first binlog's GTID_LIST_LOG_EVENT list
> warrants throwing an error.
> Maybe we should leave it to the user, Kristian? That is, let the user decide
> which domain is garbage regardless of the binlog state history.

DELETE DOMAIN d1 replaces the conceptual binlog sequence with one in which
domain d1 never existed. If there were actual binlog files containing
events in d1, this would be a grave inconsistency.

For example, suppose an existing slave was still replicating events in d1.
If a temporary network error caused it to reconnect to the master, the
reconnect would fail. A slave replicating without knowledge of d1 might
instead start re-applying any d1 events it encountered. Basically, after
DELETE DOMAIN d1, any binlog file containing d1 is invalid and useless, so it
seems appropriate to require the user to remove such files with PURGE BINARY
LOGS first.

> SET @@SESSION.gtid_seq_no=18446744073709551615;
> CREATE TABLE IF NOT EXISTS `table_dummy` (a INT);
> SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';
> 
>   11-1-18446744073709551615
> 
> SET @@SESSION.gtid_seq_no=0;
> DROP TABLE `table_dummy`;
> SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';
> 
>   11-1-0

Ouch. That's a bug. This should give an error; I think it could lead to
all kinds of extremely nasty problems :-(
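For illustration, a minimal Python sketch of the kind of check meant here, assuming seq_no is an unsigned 64-bit counter: 18446744073709551615 is 2^64 - 1, so the next auto-increment wraps to 0, as in the output above. The function `next_seq_no` is hypothetical and this is not the server's code; refusing an explicit seq_no that does not move the domain forward is only similar in spirit to what gtid_strict_mode does for out-of-order sequence numbers.

```python
UINT64_MAX = 2**64 - 1  # 18446744073709551615, the largest possible seq_no


def next_seq_no(last_seq_no, requested=None):
    """Return the seq_no for the next event group in one domain.

    requested: an explicit value (as set via gtid_seq_no), or None to
    auto-increment. Raises instead of letting the domain go backwards.
    """
    if requested is None:
        if last_seq_no == UINT64_MAX:
            # Incrementing would silently wrap to 0 and reorder the stream.
            raise OverflowError("gtid seq_no would wrap around to 0")
        return last_seq_no + 1
    if requested <= last_seq_no:
        raise ValueError(f"seq_no {requested} is not above {last_seq_no}")
    return requested


print(next_seq_no(41))  # 42
try:
    next_seq_no(UINT64_MAX)
except OverflowError as err:
    print("refused:", err)
```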

> 1. Leave wrapping an old domain around to the user, via running
>    queries like the above;
> 2. Make the binary logger react to a wrap-around by rotating the binary
>    log (an "internal" FLUSH BINARY LOGS). The new binlog file then won't
>    contain the wrapped-"away" domain (because no new event group in it
>    has been logged yet).

I am not sure I understand you here. Are you suggesting that the GTID
sequence wrap-around bug instead be declared a feature, and be documented as
the way to delete a domain in the binlog? I do not think that is
appropriate.

As I see it, there are two sides to this.

(1). We want the master to "forget about the past" with respect to a given
domain. This is easy. All that is needed is to rotate the binlog and omit
the domain from the GTID_LIST event at the start of the new binlog. This
works because when the master searches back for a given GTID in the binlog,
it stops when it sees a GTID_LIST event without that domain.

(2). We want to prevent a user from accidentally putting the server into an
inconsistent state with an incorrect DELETE DOMAIN command. This is ensured
by the requirement that all existing binlog files are free of that domain.
Should a slave later, incorrectly, try to access that domain, it will
receive a misleading error (that it has diverged, rather than that the
necessary binlog file has been purged), but at least it _will_ get an error
as it should, rather than silently corrupting replication.
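Side (1) can be sketched in Python on a toy model in which each binlog file records the GTID_LIST it starts with. The class `BinlogFile`, the functions `rotate` and `files_scanned_for`, and the file names are hypothetical; only the stopping rule of the backwards search is modelled (the real search does more than this), and the requirement from side (2) is not enforced here.

```python
from dataclasses import dataclass, field


@dataclass
class BinlogFile:
    name: str
    gtid_list: dict  # domain_id -> last GTID logged before this file
    events: list = field(default_factory=list)  # GTIDs written in this file


def rotate(files, state, new_name, delete_domains=()):
    """Start a new file whose GTID_LIST is the state minus delete_domains."""
    gtid_list = {d: g for d, g in state.items() if d not in delete_domains}
    files.append(BinlogFile(new_name, gtid_list))
    return files


def files_scanned_for(files, domain_id):
    """Names of the files (newest first) searched for a GTID in domain_id.

    The backwards search stops at the first file whose GTID_LIST lacks the
    domain: nothing from that domain is assumed to exist before it.
    """
    scanned = []
    for f in reversed(files):
        scanned.append(f.name)
        if domain_id not in f.gtid_list:
            break
    return scanned


state = {0: (0, 1, 7), 11: (11, 1, 3)}
files = [BinlogFile("master-bin.000001", {}, [(0, 1, 7), (11, 1, 3)])]
rotate(files, state, "master-bin.000002")
rotate(files, state, "master-bin.000003", delete_domains={11})
print(files_scanned_for(files, 0))   # all three files
print(files_scanned_for(files, 11))  # ['master-bin.000003']
```

After the last rotation the master "forgets" domain 11; the older files would still have to be purged to satisfy (2).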

I think the requirement is a reasonable one. The domain was configured
incorrectly, so the binlog files containing it cannot be used safely with
GTID. The procedure to fix it will then be:

1. FLUSH BINARY LOGS, and note the new GTID position.

2. Ensure that all slaves are past the problematic point by running
MASTER_GTID_WAIT(<pos>) on each slave. After this, the old erroneous binlog
files are no longer needed.

3. PURGE BINARY LOGS to remove the erroneous logs.

4. FLUSH BINARY LOG DELETE DOMAIN d
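The four steps can be walked through on a toy in-memory binlog. This is a Python sketch only: `delete_domain` and its parameters are hypothetical, step 2 stands in for running MASTER_GTID_WAIT on every slave by comparing sequence numbers per domain, and step 4 applies the "all files free of the domain" requirement from (2).

```python
def delete_domain(binlog_files, binlog_state, slave_positions, domain_id):
    """Toy model of the four-step procedure (all names hypothetical).

    binlog_files:    list of files, oldest first; each file is a list of
                     (domain_id, server_id, seq_no) GTIDs
    binlog_state:    dict domain_id -> last GTID logged
    slave_positions: dict slave name -> dict domain_id -> GTID
    Returns the GTID_LIST the final new binlog file starts with.
    """
    # 1. FLUSH BINARY LOGS: start a new, empty file and note the position.
    binlog_files.append([])
    pos = dict(binlog_state)

    # 2. Stand-in for MASTER_GTID_WAIT(pos) on each slave: every slave must
    #    have reached pos in every domain (compared by seq_no).
    for name, slave_pos in slave_positions.items():
        for d, gtid in pos.items():
            if d not in slave_pos or slave_pos[d][2] < gtid[2]:
                raise RuntimeError(f"slave {name} has not reached {gtid}")

    # 3. PURGE BINARY LOGS: drop every file older than the one from step 1.
    del binlog_files[:-1]

    # 4. DELETE DOMAIN: refuse while any remaining file holds the domain,
    #    then rotate again with a GTID_LIST that omits it.
    if any(gtid[0] == domain_id for f in binlog_files for gtid in f):
        raise RuntimeError(f"domain {domain_id} still in binlog; purge first")
    binlog_files.append([])
    return {d: g for d, g in binlog_state.items() if d != domain_id}


files = [[(0, 1, 7), (11, 1, 3)]]
state = {0: (0, 1, 7), 11: (11, 1, 3)}
slaves = {"slave1": {0: (0, 1, 7), 11: (11, 1, 3)}}
print(delete_domain(files, state, slaves, 11))  # {0: (0, 1, 7)}
print(len(files))  # 2: the file from step 1 and the one from step 4
```

A slave that has not yet reached the noted position stops the procedure at step 2, before anything is purged.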

It is of course an option to not do (2). Just be aware that this goes
against the whole philosophy that GTID was designed around - to prioritise
consistency and "no silent corruption".

Hope this helps. Of course feel free to ask for more details on any point
that is not clear.

 - Kristian.

Follow ups

Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-08
Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-07

References

Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-06