maria-developers team mailing list archive

Re: [External] Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)

To: Simon Mudd <simon.mudd@xxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Tue, 12 Sep 2017 10:21:40 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, andrei.elkin@xxxxxxxxxxx
In-reply-to: <931E0936-7F58-4B31-A2B9-789464CA406B@booking.com> (Simon Mudd's message of "Mon, 11 Sep 2017 08:27:27 +0200")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

Simon Mudd <simon.mudd@xxxxxxxxxxx> writes:

> ids. Obviously once all appropriate bin logs have been purged
> (naturally by other means) then no special processing will be needed.

Right. Hence my original idea (which unfortunately has not been implemented
so far). If at some point a domain has been unused for so long that all GTIDs
in that domain are gone, it is relatively safe to pretend that the domain
never existed.
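
As a rough illustration of that precondition, here is a minimal Python sketch.
It assumes a hypothetical in-memory model in which each active binlog file is
a list of GTID strings in the domain-server-seq format. The file names and
function names are made up for the example; this is not server code.

```python
def domain_of(gtid):
    """Return the domain id of a GTID written as 'domain-server-seq'."""
    domain, _server, _seq = gtid.split("-")
    return int(domain)


def domain_is_gone(active_binlogs, domain):
    """True if no active binlog file still contains a GTID in `domain`.

    active_binlogs: {file name: [GTID strings]} (hypothetical model).
    """
    return all(
        domain_of(gtid) != domain
        for gtids in active_binlogs.values()
        for gtid in gtids
    )


# Hypothetical contents of the active binlog files.
binlogs = {
    "master-bin.000041": ["0-1-100", "0-1-101"],
    "master-bin.000042": ["0-1-102", "2-1-7"],
}

print(domain_is_gone(binlogs, 1))  # domain 1 appears in no active file
print(domain_is_gone(binlogs, 2))  # 2-1-7 is still in an active file
```

In this model, a domain is only a candidate for deletion when the check
returns True for it.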

I would like to understand if you can think of significant use cases where
the DBA needs to have active binlog files in the master containing some
domain, while simultaneously pretending that this domain never existed.

Or is it more of a general concern about the inconvenience for users of
having to save old binlogs somewhere other than the master's data directory
and binlog index (SHOW BINARY LOGS)?

> removing old binary logs should _not_ IMO be done as a way of
> forgetting the past obsolete domains.
> BINLOGS are important so throwing them away is an issue. I think that somehow the code needs
> to be aware of the cut-off point and when the “stale domain ids” are removed.)

I understand the desire to not delete binlog files.

The problem is: If you want to have GTIDs with some domain in your active
binlog files, _and_ you also want to pretend that this domain never existed,
what does it mean? What are the semantics? It creates a lot of complexity
in defining the semantics, in documenting them, for the users to understand
them, and for the code to implement them correctly.

So basically, I do not understand what the intended meaning is of FLUSH
BINARY LOGS DELETE DOMAIN d while _at the same time_ keeping GTIDs with
domain d around in active binlog files. In what respects is the domain
deleted, and in what respects not?

For the master, the binlog files are mainly used to stream to connecting
slaves. Deleting a domain means replacing the conceptual binlog history with
one in which that domain never existed. So that domain will be ignored in a
connecting slave's position, assuming it is served by another multi-source
master. If a new GTID in that domain appears later, it will be considered
the very first GTID ever in that domain.
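
A small Python model may make these intended semantics more concrete. It
assumes a slave position is represented as a map from domain id to the last
sequence number applied in that domain; `effective_position` and
`deleted_domains` are hypothetical names for the illustration only.

```python
def effective_position(slave_pos, deleted_domains):
    """Return the part of the slave position the master would consider.

    slave_pos: {domain id: last sequence number applied} (assumed model).
    deleted_domains: set of domain ids deleted on this master.
    """
    return {d: seq for d, seq in slave_pos.items() if d not in deleted_domains}


# The slave reports positions in domains 0 and 1; domain 1 was deleted here.
slave_pos = {0: 102, 1: 55}
print(effective_position(slave_pos, {1}))  # only domain 0 remains
```

The slave's own record of domain 1 is untouched; the master simply acts as
if that domain were absent from the position it was given.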

So consider what happens if there are nevertheless GTIDs in that domain
deeper in the binlog:

1. An already connected slave may be happily replicating those GTIDs. If
that slave reconnects (temporary network error for example), it will instead
fail with unknown GTID, or perhaps just start silently ignoring all further
GTIDs in that domain. This kind of unpredictable behaviour seems bad.

2. Suppose a slave connects with a position without the deleted domain. The
master starts reading the binlog from some point. What happens if a GTID is
encountered that contains the deleted domain? The slave will start
replicating that domain from some arbitrary point that depends on where it
happened to be in other domains at the last disconnect. This also seems
undesirable.
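
Scenario 2 can be sketched with a toy simulation in Python. This is a heavily
simplified model, not how the server is implemented: the binlog is a flat
ordered list of (domain, seq) pairs, the master starts scanning right after
the earliest event named in the slave's position, and every name here is
hypothetical.

```python
def stream_from(binlog, slave_pos):
    """Return the events sent to a connecting slave in this toy model.

    binlog: ordered list of (domain, seq) tuples.
    slave_pos: {domain: last seq applied}; each entry must occur in binlog.
    Events in a domain missing from slave_pos are sent from wherever the
    scan happens to start.
    """
    start = min(binlog.index((d, seq)) for d, seq in slave_pos.items())
    sent = []
    for domain, seq in binlog[start + 1:]:
        if domain in slave_pos and seq <= slave_pos[domain]:
            continue  # already applied by the slave
        sent.append((domain, seq))
    return sent


# Domain 1 is "deleted" but its GTIDs are still present in the binlog.
binlog = [(0, 100), (1, 7), (0, 101), (1, 8), (0, 102), (1, 9)]

# Same slave, same deleted domain, two different positions in domain 0.
print(stream_from(binlog, {0: 100}))
print(stream_from(binlog, {0: 101}))
```

In the first call the slave picks up domain 1 at sequence number 7, in the
second at 8, purely because of where it was in domain 0.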

There may be other scenarios that I did not think about.

> DBAs do not like to remove bin logs “early” as unless you keep a copy
> somewhere you may lose valuable information,
> for recovery, for backups etc. Not everyone will be making automatic
> copies (as MySQL does not provide an automatic way to do this)

Understood. Maybe what is needed is a PURGE BINARY LOGS that removes the
entries from the binlog index (SHOW BINARY LOGS), but leaves the files in
the file system for the convenience of the sysadmin? (Well, you can just
hand-edit binlog.index, but that requires a master restart, I think.)

> The other comment I see mentioned here was “make sure all slaves are
> up to date”. That’s going to be hard. The master can only be
> aware of “connected slaves” and if you have intermediate masters, or a

Indeed, the master cannot ensure this. The idea is that the DBA, who decides
to delete a domain, must understand that this should not be done if any
slave still needs GTIDs from that domain. This is similar to configuring
normal binlog purge, where the DBA needs to ensure that binlogs are kept
long enough for the needs of the slowest slave.
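
The kind of check the DBA would have to do by hand can be sketched in Python
as follows. It assumes the positions have already been collected as strings
in the usual comma-separated domain-server-seq form (for example from
gtid_binlog_pos on the master and gtid_slave_pos on each slave); the helper
names and the slave names are hypothetical.

```python
def parse_gtid_pos(pos):
    """Parse 'd-s-n,d-s-n' into {domain: seq}."""
    result = {}
    for gtid in pos.split(","):
        gtid = gtid.strip()
        if not gtid:
            continue
        domain, _server, seq = (int(x) for x in gtid.split("-"))
        result[domain] = seq
    return result


def slaves_still_needing(domain, master_pos, slave_positions):
    """Return the names of slaves behind the master's last GTID in `domain`.

    A slave with no entry for the domain is treated as behind, which is a
    conservative assumption made for this sketch.
    """
    last = parse_gtid_pos(master_pos).get(domain)
    if last is None:
        return []  # the master has nothing in this domain
    return sorted(
        name
        for name, pos in slave_positions.items()
        if parse_gtid_pos(pos).get(domain, -1) < last
    )


master_pos = "0-1-102,1-1-9"
slaves = {
    "slave-a": "0-1-102,1-1-9",
    "slave-b": "0-1-101,1-1-8",
}
print(slaves_still_needing(1, master_pos, slaves))  # slave-b is behind
```

Such a check only covers the slaves the DBA knows about and can query; it
says nothing about slaves behind intermediate masters or ones that are
temporarily offline.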

> FWIW expiring old domains is good to do. There’s a similar FR for

> completely different the problem space is the same. Coming up with a
> solution which is simple to use and understand and also
> avoids where that’s possible making mistakes which may break
> replication is good. So thanks for looking at this.

Indeed. And the input from people like you with strong operational
experience is very valuable for ending up with a good solution, hence my
request for additional input.

 - Kristian.

Follow ups

Re: [External] Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: Simon Mudd, 2017-09-21

References

Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-06
Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: Kristian Nielsen, 2017-09-06
Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-08
Re: [External] Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: Simon Mudd, 2017-09-11