
maria-developers team mailing list archive

Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)

 

Hello.

Let me propose methods to clean unused GTID domains off the master.

The issue is quite practical, as the two references in the subject
line show. Both describe a scenario where two masters using the default
domain id initially serve one slave via the "legacy" non-GTID protocol.
When the masters later change their common domain to different
private ones *and* the slave turns GTID replication ON, the slave cannot
connect to a master under the GTID protocol.

The reason is that the GTID position the slave gained before its first
GTID connect consists of GTIDs generated by the other master. Since the
two masters never replicated from each other, that position does not fit
the current master's binlog state.
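The mismatch can be pictured with a toy model. The sketch below is only an illustration under simplifying assumptions, not the server's actual connect logic: the hypothetical helper `can_connect` accepts a slave only if every GTID in the slave position is present in the master's binlog, and all GTID values are made up.

```python
def parse_gtid(text):
    """Split 'domain-server-seq' into a tuple of three integers."""
    domain, server, seq = (int(part) for part in text.split("-"))
    return domain, server, seq


def can_connect(slave_pos, master_binlog):
    """Toy check: every GTID in the slave position must exist in the master's binlog."""
    logged = {parse_gtid(g) for g in master_binlog}
    return all(parse_gtid(g) in logged for g in slave_pos)


# Master A (server_id 1) wrote to the default domain 0, then moved to domain 1.
master_a_binlog = ["0-1-1", "0-1-2", "1-1-1"]
# The slave's last domain-0 event came from master B (server_id 2).
slave_pos = ["0-2-7", "1-1-1"]

print(can_connect(slave_pos, master_a_binlog))  # False
```

Master A never logged `0-2-7`, so in this model the slave's domain-0 entry is what blocks the connection.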

From the user's perspective the old default domain id is permanently in
the past in these cases. Its events have already been replicated, and
no new ones will be generated or replicated.
Therefore such a domain may conceptually be cleaned away from the states
of both the masters and the slave.
Once that is done, the GTID-enabled slave will connect successfully to
any master.

The slave state purge is simple: SET @@global.gtid_slave_pos to a
sequence free of the purged domain.
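As a minimal sketch of that step, the hypothetical helper `drop_domains` below filters a comma-separated GTID position and prints the statement to run on the slave. The position value is invented for illustration.

```python
def drop_domains(gtid_pos, domains):
    """Return a 'domain-server-seq,...' position without the given domain ids."""
    kept = []
    for gtid in gtid_pos.split(","):
        gtid = gtid.strip()
        if gtid and int(gtid.split("-")[0]) not in domains:
            kept.append(gtid)
    return ",".join(kept)


current = "0-2-7,1-1-15,2-2-4"   # hypothetical value of @@global.gtid_slave_pos
new_pos = drop_domains(current, {0})
print(f"SET @@global.gtid_slave_pos = '{new_pos}';")
# prints: SET @@global.gtid_slave_pos = '1-1-15,2-2-4';
```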
The master side binlog state requires a new "command" unless the user
is happy with RESET MASTER.
While setting the new GTID binlog state to be free of the old domain,
we would like the new "command" to preserve the existing binlog
files. This could be accomplished, as Kristian suggests in MDEV-12012
(I could not find any earlier references to the idea), with a new option
to

    FLUSH BINARY LOGS DELETE DOMAIN d1, d2

 KN> This command would check that the current binlog state is equal to
 KN> the GTID_LIST_LOG_EVENT at the start of the first binary log file,
 KN> within the specified domains. If not, it is an error. But if so,
 KN> the new binary log file created by FLUSH will be written with the
 KN> specified domains omitted from GTID_LIST_LOG_EVENT (and the current
 KN> binlog state updated accordingly).
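The quoted check can be modelled in a few lines. This is a sketch of the rule as worded above, not of any existing server code: `flush_delete_domains` and its arguments are hypothetical names, and binlog states are represented as plain lists of `domain-server-seq` strings.

```python
def parse_state(gtids):
    """Map domain id -> set of (server_id, seq_no) from 'domain-server-seq' strings."""
    state = {}
    for gtid in gtids:
        domain, server, seq = (int(part) for part in gtid.split("-"))
        state.setdefault(domain, set()).add((server, seq))
    return state


def flush_delete_domains(binlog_state, first_gtid_list, domains):
    """Return the GTID list for the new binlog file, or raise on a mismatch."""
    current = parse_state(binlog_state)
    oldest = parse_state(first_gtid_list)
    for domain in domains:
        # Within a deleted domain the current state must equal the
        # GTID list at the start of the first binary log file.
        if current.get(domain, set()) != oldest.get(domain, set()):
            raise ValueError(f"domain {domain} differs from the first binlog's GTID list")
    return [g for g in binlog_state if int(g.split("-")[0]) not in domains]


state = ["0-1-2", "0-2-7", "1-1-15"]        # made-up current binlog state
first_list = ["0-1-2", "0-2-7", "1-1-3"]    # made-up GTID list of the first binlog
print(flush_delete_domains(state, first_list, {0}))  # ['1-1-15']
```

With these values domain 0 can be deleted, while asking for domain 1 raises, because its state has moved from `1-1-3` to `1-1-15` within the existing files.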

The idea looks quite sane; I only could not grasp why the presence of
the domains being deleted in the very first binlog's
GTID_LIST_LOG_EVENT list warrants throwing an error.
Maybe we should leave it to the user, Kristian? That is, let the user
decide which domain is garbage regardless of the binlog state history.

While the FLUSH way looks sufficient and robust, I could not help
thinking over an alternative.
Consider a scenario where a domain's sequence number
runs out of range. While this seems unrealistic in practice, we can
simulate it with

SET @@SESSION.gtid_domain_id=11;
SET @@SESSION.server_id=1;
SET @@SESSION.gtid_seq_no=18446744073709551615;
/* Execute some dummy loggable query, e.g. */
CREATE TABLE IF NOT EXISTS `table_dummy` (a INT);
SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';

  11-1-18446744073709551615

SET @@SESSION.gtid_seq_no=0;
DROP TABLE `table_dummy`;
SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';

  11-1-0

I've used two GTIDs to show the domain overflow because I would also
like to read the zero in the last GTID as ... there's *nothing* in this
domain. So it is actually a new "namesake" domain, replacing the old one
that has wrapped around. The first group of events created in the new
domain, 11-1-1, could shadow the old domain's 11-1-1, as well as all the
rest of the old domain, from the GTID replication protocol. And that
means the old domain is actually deleted.
So if my reading of zero is correct, the binlog state would be empty instead.

That is how we can also approach purging an old GTID domain on the
master side:

1. Leave wrapping around an old domain to the user, by running
   queries like the ones above;
2. The binary logger would be made to react to the wrap-around
   with a binary log rotation (an "internal" FLUSH BINARY LOGS). The new
   binlog file won't contain the wrapped-"away" domain (because there
   is no new event group in it as yet).
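The two steps can be simulated with a toy binlog. This sketch rests on assumptions that are not part of the proposal itself: the class `ToyBinlog` is hypothetical, and a wrap-around is detected as sequence number 0 arriving right after the maximum 64-bit value.

```python
MAX_SEQ_NO = 2**64 - 1   # 18446744073709551615, the value used in the example


class ToyBinlog:
    """Tracks the last GTID per domain and forgets a domain when it wraps to 0."""

    def __init__(self):
        self.state = {}       # domain id -> (server_id, seq_no)
        self.rotations = 0    # number of "internal" rotations so far

    def log_event(self, domain, server_id, seq_no):
        previous = self.state.get(domain)
        if previous is not None and previous[1] == MAX_SEQ_NO and seq_no == 0:
            # Step 2: rotate, and start the new file without the wrapped domain.
            self.rotations += 1
            del self.state[domain]
            return
        self.state[domain] = (server_id, seq_no)

    def binlog_pos(self):
        return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(self.state.items()))


log = ToyBinlog()
log.log_event(11, 1, MAX_SEQ_NO)   # step 1: the user pushes the domain to its limit
print(log.binlog_pos())            # 11-1-18446744073709551615
log.log_event(11, 1, 0)            # ... and wraps it around
print(repr(log.binlog_pos()))      # ''
log.log_event(11, 1, 1)            # first event group of the "namesake" domain
print(log.binlog_pos())            # 11-1-1
```

In this model the zero GTID is never stored, which matches reading it as "nothing in this domain".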

I would be glad to hear your opinions, dear colleagues.

Cheers,

Andrei

