Re: [External] Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)


Hello, Simon, Kristian.

(This mail was meant to go out yesterday, but it got stuck in
my outbox.)

> Simon Mudd <simon.mudd@xxxxxxxxxxx> writes:
>> ids. Obviously once all appropriate bin logs have been purged
>> (naturally by other means) then no special processing will be needed.
> Right. Hence my original idea (which was unfortunately never implemented so
> far). If at some point a domain has been unused for so long that all GTIDs
> in that domain are gone, it is relatively safe to pretend that the domain
> never existed.
> I would like to understand if you can think of significant use cases where
> the DBA needs to have active binlog files in the master containing some
> domain, while simultaneously pretending that this domain never existed.
> Or if it is more of a general concern, and the inconvenience for users to
> have to save old binlogs somewhere else than the master's data directory and
> binlog index (SHOW BINARY LOGS).
>> removing old binary logs should _not_ IMO be done as a way of
>> forgetting the past obsolete domains.
>> BINLOGS are important so throwing them away is an issue. I think
>> that somehow the code needs
>> to be aware of the cut-off point and when the “stale domain ids” are removed.)

Simon, initially I thought of masking out the problematic domain so that
the most recent binlog file would not have it in its Gtid_list header.
Yet I have given up that idea, having agreed that a strict setup on the
master weighs much more.
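
To make the strict rule concrete, here is a toy sketch in Python. It is
not server code: the binlog contents, the file names and the helper
names (gtid_domain, domains_present, may_delete_domain) are all made up
for illustration. It only models the idea that a domain may be
forgotten once no binlog listed in the index still holds a GTID from it.

```python
from typing import Dict, List, Set


def gtid_domain(gtid: str) -> int:
    """Return the domain id of a GTID written as 'domain-server-seq'."""
    return int(gtid.split("-")[0])


def domains_present(binlogs: Dict[str, List[str]]) -> Set[int]:
    """Collect every domain id that still has a GTID in some listed binlog."""
    return {gtid_domain(g) for gtids in binlogs.values() for g in gtids}


def may_delete_domain(domain_id: int, binlogs: Dict[str, List[str]]) -> bool:
    """Strict rule (assumed model): allow the delete only if no listed
    binlog holds a GTID from that domain."""
    return domain_id not in domains_present(binlogs)


# Hypothetical binlog index: domain 0 only appears in the oldest file.
binlogs = {
    "binlog.000001": ["0-1-1", "1-1-1"],
    "binlog.000002": ["1-1-2", "1-1-3"],
}
print(may_delete_domain(0, binlogs))  # False: binlog.000001 still has 0-1-1

# Purging the oldest file removes the last trace of domain 0.
del binlogs["binlog.000001"]
print(may_delete_domain(0, binlogs))  # True
```

In this model the delete is refused for as long as the old file stays in
the index, which is exactly the inconvenience Simon points at.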

> I understand the desire to not delete binlog files.

And the MDEV-12012 use case might not even require this
purge/flush-delete-domain procedure if IGNORE_DOMAIN_IDS (for the
zero-id domain in question) would do. MDEV-9108 seems to be in the way,
though. Conceptually, however, the DBA may have a way to keep the binlog
files even when they contain a problematic domain.
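
As a rough illustration of the filtering idea, the following Python
sketch models what ignoring a domain means on the receiving side: events
whose GTID belongs to a listed domain are dropped, and everything else
is applied in order. It is a conceptual model only, not how the server
implements IGNORE_DOMAIN_IDS; the event stream and the function names
are hypothetical.

```python
from typing import Iterable, List, Set


def gtid_domain(gtid: str) -> int:
    """Return the domain id of a GTID written as 'domain-server-seq'."""
    return int(gtid.split("-")[0])


def filter_ignored_domains(stream: Iterable[str], ignored: Set[int]) -> List[str]:
    """Keep only the GTIDs whose domain is not in the ignore set,
    preserving the original order of the stream."""
    return [g for g in stream if gtid_domain(g) not in ignored]


# Hypothetical stream served from binlogs that still contain domain 0.
stream = ["0-1-5", "1-1-10", "0-1-6", "1-1-11"]
applied = filter_ignored_domains(stream, ignored={0})
print(applied)  # ['1-1-10', '1-1-11']
```

The binlog files stay untouched here; only the consumer skips the
problematic domain.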

> The problem is: If you want to have GTIDs with some domain in your active
> binlog files, _and_ you also want to pretend that this domain never existed,
> what does it mean? What is the semantics? It creates a lot of complexities
> for defining the semantics, for documenting it, for the users to understand
> it, and for the code to implement it correctly.
> So basically, I do not understand what is the intended meaning of FLUSH
> BINARY LOGS DELETE DOMAIN d _and_ at the same time keeping GTIDs with domain
> d around in active binlog files? In what respects is the domain deleted, and
> in what respects not?
> For the master, the binlog files are mainly used to stream to connecting
> slaves. Deleting a domain means replacing the conceptual binlog history with
> one in which that domain never existed. So that domain will be ignored in a
> connecting slaves position, assuming it is served by another multi-source
> master. If a new GTID in that domain appears later, it will be considered
> the very first GTID ever in that domain.
> So consider what happens if there is anyway GTIDs in that domain deeper in
> the binlog:
> 1. An already connected slave may be happily replicating those GTIDs. If
> that slave reconnects (temporary network error for example), it will instead
> fail with unknown GTID, or perhaps just start silently ignoring all further
> GTIDs in that domain. This kind of unpredictable behaviour seems bad.
> 2. Suppose a slave connects with a position without the deleted domain. The
> master starts reading the binlog from some point. What happens if a GTID is
> encountered that contains the deleted domain? The slave will start
> replicating that domain from some arbitrary point that depends on where it
> happened to be in other domains at the last disconnect. This also seems
> undesirable.
> There may be other scenarios that I did not think about.
>> DBAs do not like to remove bin logs “early" as unless you keep a copy
>> somewhere you may lose valuable information,
>> for recovery, for backups etc. Not everyone will be making automatic
>> copies (as MySQL does not provide an automatic way to do this)
> Understood. Maybe what is needed is a PURGE BINARY LOGS that removes the
> entries from the binlog index (SHOW BINARY LOGS), but leaves the files in
> the file system for the convenience of the sysadmin? (Well, you can just
> hand-edit binlog.index, but that requires master restart I think).

Like I said above, a filtering solution could be helpful.

>> The other comment I see mentioned here was “make sure all slaves are
>> up to date”. That’s going to be hard. The master can only be
>> aware of “connected slaves” and if you have intermediate masters, or a
> Indeed, the master cannot ensure this. The idea is that the DBA, who decides
> to delete a domain, must understand that this should not be done if any
> slave still needs GTIDs from that domain. This is similar to configuring
> normal binlog purge, where the DBA needs to ensure that binlogs are kept
> long enough for the needs of the slowest slave.
>> FWIW expiring old domains is good to do. There’s a similar FR for
>> completely different the problem space is the same. Coming up with a
>> solution which is simple to use and understand and also
>> avoids where that’s possible making mistakes which may break
>> replication is good. So thanks for looking at this.
> Indeed. And the input from people like you with strong operational
> experience is very valuable to end up with a good solution, hence my request
> for additional input.
>  - Kristian.

I hope the clarifications given by Kristian back up the new feature idea.

On the other hand, it would be good for the DBA to have a way to cope with
the likes of MDEV-12012 (there are a few reports on this matter) without
purging. I am counting on a solution to MDEV-9108 in that regard.