maria-developers team mailing list archive

Re: [External] Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)

 

>> If you "forget" the domain on the upstream server, what happens if there
>> are downstream slaves? I think you'll break replication if they
>> disconnect from this box and try to reconnect. Their GTID information
>> will no longer match.
>> IMO, and if I've understood correctly, this is broken.

It should not break replication. A slave with GTID position
0-1-100,10-2-200 is allowed to connect to a master that has nothing in
domain 10; this is normal.
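
To make the rule concrete, here is a toy model in Python. It is only a sketch of the idea, not the server's actual connect-time check: the names parse_gtid_pos and can_connect are made up for illustration, and the "slave ahead of master" rejection is a simplifying assumption.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    result = {}
    for gtid in filter(None, (p.strip() for p in pos.split(","))):
        domain_id, server_id, seq_no = (int(x) for x in gtid.split("-"))
        result[domain_id] = (server_id, seq_no)
    return result


def can_connect(slave_pos, master_state):
    """Simplified model: a domain unknown to the master never blocks a connect."""
    slave = parse_gtid_pos(slave_pos)
    master = parse_gtid_pos(master_state)
    for domain_id, (_, slave_seq) in slave.items():
        if domain_id not in master:
            continue  # master has nothing in this domain: allowed
        _, master_seq = master[domain_id]
        if slave_seq > master_seq:
            return False  # assumption: slave claims to be ahead of the master
    return True
```

With this model, can_connect("0-1-100,10-2-200", "0-1-150") is true even though the master knows nothing about domain 10.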

I am not sure what the use-case of replicating DELETE DOMAIN to a slave
would be. Domain deletion does not have a point-in-time property like normal
transactions, so it does not help to have it replicated inline in the event
stream. If it has an effect on a slave, this effect occurs only when the
slave is restarted/reconnected.

>> I really think there’s a need to indicate what domains should be forgotten/ignored

If CHANGE MASTER ... IGNORE_DOMAIN_IDS is fixed to also ignore the extra
domains on master upon connect, it is probably a better way to ignore
domains in many cases. It is persisted (in the slave's master.info), and it
can be set individually for each slave, which is more flexible (what if one
slave needs to ignore a domain but another slave needs to replicate it?).
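
The per-slave flexibility can be pictured with a small Python model. This is a sketch only: events_to_apply is a hypothetical helper, and it says nothing about how the server implements the filter. It just shows two slaves reading the same master stream with different ignore lists.

```python
def events_to_apply(events, ignore_domain_ids):
    """Keep only GTIDs whose domain is not in this slave's ignore list."""
    ignored = set(ignore_domain_ids)
    return [g for g in events if int(g.split("-")[0]) not in ignored]


# Hypothetical event stream from one master, as 'domain-server-seq' strings.
master_events = ["0-1-101", "10-2-201", "0-1-102"]

# Slave A ignores domain 10; slave B replicates everything.
slave_a = events_to_apply(master_events, ignore_domain_ids=[10])
slave_b = events_to_apply(master_events, ignore_domain_ids=[])
```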

>    >KN> The procedure to fix it will then be:
>    >> 
>    >> 1. FLUSH BINARY LOGS, note the new GTID position.
>    >> 
>    >> 2. Ensure that all slaves are past the problematic point with
>    >> MASTER_GTID_WAIT(<pos>). After this, the old erroneous binlog files
>    >> are no longer needed.
>    >> 
>    >> 3. PURGE BINARY LOGS to remove the erroneous logs.
>    >> 
>    >> 4. FLUSH BINARY LOGS DELETE DOMAIN d

This is what I suggested at some point in relation to MDEV-12012. As I
realised later, though, it is probably not the best suggestion.

1. In MDEV-12012, two independent masters were originally using the same
domain id, so their histories appear to have diverged in terms of GTID. This
can be fixed by injecting a dummy transaction to bring them up to date with
one another in that domain. Deleting a (possibly valuable) part of the
history is not needed.

2. In another case, a slave needs to ignore the part of the history on a
master that belongs to some domain. IGNORE_DOMAIN_IDS, once fixed, can do
this. Again, there is no need to delete possibly valuable history on the
master.

3. At some point, a domain that has been unused for a long time may no
longer appear anywhere _except_ in gtid_binlog_state and gtid_slave_pos.
This may eventually clutter the output and become an annoyance. The original
idea with FLUSH BINARY LOGS DELETE DOMAIN was to make it possible to fix
this annoyance by removing such domains from gtid_binlog_state once they are
no longer needed anywhere.
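
As a rough sketch of what "no longer needed anywhere" could mean, the Python below treats a domain as a deletion candidate only if it is in the binlog state but is absent from the remaining binlog files and not needed by any slave. The function name and its three inputs are assumptions for illustration, not anything the server exposes in this form.

```python
def obsolete_domains(binlog_state_domains, domains_in_binlogs,
                     domains_needed_by_slaves):
    """Return domains listed in the binlog state that nothing else refers to."""
    still_needed = set(domains_in_binlogs) | set(domains_needed_by_slaves)
    return sorted(set(binlog_state_domains) - still_needed)
```

For example, with domains 0, 10 and 20 in the binlog state, only domain 0 left in the binlog files, and slaves still needing 0 and 20, the only candidate is domain 10.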

I am not sure my original suggestion of using PURGE LOGS was ever a good
idea, or is ever needed.

 - Kristian.

