maria-developers team mailing list archive

Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: andrei.elkin@xxxxxxxxxx
Date: Thu, 07 Sep 2017 15:47:42 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, andrei.elkin@xxxxxxxxxxx
In-reply-to: <87lglrd757.fsf@urd.knielsen-hq.org> (Kristian Nielsen's message of "Wed, 06 Sep 2017 20:16:52 +0200")
Organization: Home sweet home
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
Kristian, salute.

Let me jump straight to the high-level specification; afterwards I
remark on, or dwell on, specific parts of the text.

Your last reply made it explicit that you mean a totally strict setup
on the master (point (2) of the following list):


KN> (1). We want the master to "forget about the past" with respect to a
    given domain. This is easy. All that is needed is to rotate the
    binlog and omit the domain from the GTID_LIST event at the start of
    the new binlog. Because when the master searches back for a given
    GTID in the binlog, it stops when it sees a GTID_LIST event without
    that domain.

KN> (2). We want to prevent a user accidentally putting the server into an
    inconsistent state with an incorrect DELETE DOMAIN command. This is
    ensured by the requirement that all existing binlog files are free
    of that domain.  Should a slave later, incorrectly, try to access
    that domain, it will receive the wrong error (that it is diverged
    rather than that the necessary binlog file has been purged), but at
    least it _will_ get an error as it should, not silently corrupt
    replication.
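Point (1) can be illustrated with a minimal Python sketch of the master's
backward search. It is a simplified model, not the server code: the
`BinlogFile` class, `find_start_file()` and `GtidError` are hypothetical
names, and a file's GTID_LIST is reduced to a map from domain_id to the last
seq_no logged before that file. Omitting a domain from the newest GTID_LIST
makes a slave still positioned in that domain get the "diverged" error of
point (2) rather than the "purged" one.

```python
from dataclasses import dataclass, field


class GtidError(Exception):
    """Raised when the master cannot serve a slave's GTID position."""


@dataclass
class BinlogFile:
    name: str
    # GTID_LIST at the start of the file: domain_id -> last seq_no logged
    # before this file (domains omitted here are unknown to the master).
    gtid_list: dict
    # Event groups logged in this file, as (domain_id, server_id, seq_no).
    events: list = field(default_factory=list)


def find_start_file(binlogs, domain_id, seq_no):
    """Simplified backward search for a slave positioned at seq_no in
    domain_id. binlogs is ordered oldest first."""
    for f in reversed(binlogs):
        if domain_id not in f.gtid_list:
            # No history for the domain before this file: stop searching.
            if any(d == domain_id and n == seq_no for d, _, n in f.events):
                return f.name
            raise GtidError(
                f"GTID {domain_id}-?-{seq_no} is not in the master's binlog: "
                "slave has diverged")
        if f.gtid_list[domain_id] <= seq_no:
            # The slave's position is at or after the start of this file.
            return f.name
    raise GtidError("the binlog file needed by the slave has been purged")
```

The sketch does not verify that the slave's GTID really appears in the chosen
file; a real implementation would also have to check that.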

In contrast, I had in mind a "liberal" setup, governed by "the user
must know what he is doing". I did so because I saw no other way to help
the MDEV-12012 use case. Indeed, when the undesired domain's events
reside in the very last binlog file, and the history preceding that file
is still important to the user, your 4-step strict protocol of

> 1. FLUSH BINARY LOGS, note the new GTID position.

> 2. Ensure that all slaves are past the problematic point with
> MASTER_GTID_WAIT(<pos>). After this, the old errorneous binlog files are no
> longer needed.
>
> 3. PURGE BINARY LOGS to remove the errorneous logs.
>
> 4. FLUSH BINARY LOG DELETE DOMAIN domain

might be equivalent to RESET MASTER, since the 'erroneous' log file is the
last one. That's why I was content to do without step 3, and with a step 4
that does not necessarily error out.
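To make the RESET MASTER comparison concrete, here is a minimal Python
sketch, assuming hypothetical file names of the form bin.NNNNNN and a binlog
reduced to (file name, domains with events) pairs. `surviving_files()` is a
made-up helper that applies step 1 (rotate) and step 3 (purge every file
still holding the domain). When the domain's events sit in the last file,
only the freshly rotated, empty file survives, so all earlier binlog history
is gone, at least as far as the remaining files are concerned.

```python
def surviving_files(binlogs, domain_id):
    """binlogs: list of (file_name, set_of_domain_ids_with_events), oldest
    first, with hypothetical names of the form bin.NNNNNN.
    Returns the files left after step 1 (rotate) and step 3 (purge every
    file that still holds events of domain_id)."""
    # Step 1: FLUSH BINARY LOGS appends a fresh, still empty file.
    rotated = binlogs + [(f"bin.{len(binlogs) + 1:06d}", set())]
    # Step 3: purge up to and including the last file holding domain_id.
    last = max((i for i, (_, doms) in enumerate(rotated) if domain_id in doms),
               default=-1)
    return [name for name, _ in rotated[last + 1:]]


# Domain 1 only in the last file: nothing but the new file is left.
history = [("bin.000001", {0}), ("bin.000002", {0}), ("bin.000003", {0, 1})]
print(surviving_files(history, 1))
```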

Naturally I am fine with the strictness of steps 1-4. But I can't speak for
the user as to whether the new unyielding (that is, always erroring out)
DELETE DOMAIN variant of FLUSH LOGS would always be satisfactory.

To dramatize the MDEV-12012 case with a complication: what if step 2
cannot be ensured, say, because another slave is temporarily stopped and
(for simplicity) does not care about the domain being deleted?
On the one hand we can't purge the master's binlogs (the stopped slave
still needs them); on the other hand step 4 alone suffices for either
slave (though the stopped one may need reconfiguration to filter out the
deleted domain's events).

If my concern is practical, we may consider an *optionally* strict
DELETE DOMAIN FLUSH LOGS. The erroring-out version would maintain strict
GTID semantics on the master. The liberal one would cover the above case
as well. And the user would be the one to choose.
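A minimal Python sketch of such an optionally strict command, under the same
simplified model of the binlog: `flush_logs_delete_domain()`, its `strict`
flag and `DeleteDomainError` are hypothetical and do not describe any
existing server syntax or option. Both variants rotate and omit the domains
from the new file's GTID_LIST (point (1)); only the strict one first refuses
while any existing file still holds the domains (point (2)).

```python
class DeleteDomainError(Exception):
    """Raised by the strict variant when binlog files still hold the domain."""


def flush_logs_delete_domain(binlogs, binlog_state, domain_ids, strict=True):
    """Hypothetical FLUSH BINARY LOGS DELETE DOMAIN with optional strictness.

    binlogs: list of (file_name, set_of_domain_ids_with_events), oldest first.
    binlog_state: domain_id -> last seq_no logged (current binlog state).
    Appends a new file and returns (new_file_name, gtid_list_of_new_file).
    """
    doomed = set(domain_ids)
    if strict:
        # Strict: every existing file must already be free of the domains.
        offending = [name for name, doms in binlogs if doms & doomed]
        if offending:
            raise DeleteDomainError(
                f"domains {sorted(doomed)} still present in {offending}; "
                "purge those binlog files first")
    # Both variants: rotate and omit the domains from the new GTID_LIST,
    # so the master's backward search stops at the new file.
    new_name = f"bin.{len(binlogs) + 1:06d}"
    gtid_list = {d: n for d, n in binlog_state.items() if d not in doomed}
    binlogs.append((new_name, set()))
    return new_name, gtid_list
```

With `strict=False` the stopped slave from my example keeps its binlogs, at
the price of the weaker "user must know what he is doing" guarantee.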


> andrei.elkin@xxxxxxxxxx writes:
>
>> Let me propose methods to clean unused GTID domains off the master.
>> I would be glad to hear your opinions, dear colleagues.
>
> So a bit of background: The central idea in MariaDB GTID is the sequence of
> events that created the current master state. This is an abstract concept.
> Conceptually, the current state of this server is defined as executing a
> specific sequence of events (in practice it might have been restored from a
> backup or something). Abstractly, the server's binlog is exactly this
> sequence of events (in practice the early part probably no longer exists or
> possibly never did). The sequence is multi-streamed (one stream per domain).
> Everything (in GTID, but also in parallel replication and group commit) is
> based on the assumption that each stream in the binlog sequence is strictly
> ordered, at least on a single given server.
>
> It is important to understand that it is the actual sequence of events that
> matters, conceptually. The actual GTID format of D-S-N is only an
> implementation detail that allows the code to work correctly. The sequence
> is defined by the binlog, not by the particular sequence numbers in GTID or
> other details.
>
> When a slave connects to our master server, it presents its current position
> as a single event within each stream. By the above, this is sufficient to
> reliably find the correct position in the binlog to restart the slave from.
>
> Because MariaDB replication is async, we cannot in general prevent different
> servers from erroneously ending up with different binlog sequences. However,
> we can ensure a consistent view of the sequence on a single server, and we
> can try to detect and flag any inconsistencies between servers as they are
> noticed.
>
> This is why it is necessary to give an error if a slave presents a position
> containing an event that is not in the master's binlog. The master cannot
> know if this is because the slave is ahead (the event in question will
> arrive later on the master), or because replication has diverged (the event
> will never arrive on the master, and the replication position is not well
> defined). It is a central goal in GTID to avoid, as much as possible, silent
> incorrect operation in replication.
>
>
> With that explained, now onto some concrete comments/answers:
>
>> The past default domain-id is actually the permanent past from the user
>> perspective in these cases. Its events have already been replicated, and
>> no new ones will be generated and replicated.
>
> But from the point of view of GTID semantics, the binlog sequence is still
> defined by this past, and in an inconsistent (and hence incorrect) way.
>
>> Therefore such a domain may conceptually be cleaned away from both the
>> master's and the slaves' states.
>
> So as you say, the erroneous state must be fixed for GTID to work
> correctly. One way is to discard the entire incorrect binlog with RESET
> MASTER. But this discussion is about fixing the binlog in-place, by
> (conceptually) replacing it with a variant which does not contain the
> problematic past.
>
>> The idea looks quite sane; I only could not grasp why the presence of
>> domains being deleted in the very first binlog's GTID_LIST_LOG_EVENT list
>> warrants throwing an error.
>> Maybe we should leave it to the user, Kristian? That is, to decide
>> what domain is garbage regardless of the binlog state history.
>
> DELETE DOMAIN d1 replaces the conceptual binlog sequence with one in which
> domain d1 never existed. If there would be actual binlog files containing
> events in d1, this would be a grave inconsistency.
>
> For example, if an existing slave was still replicating events in d1, if a
> temporary network error caused it to reconnect to the master, it would fail
> to reconnect. A slave without knowledge of d1 replicating might start
> re-applying any events encountered. Basically, after DELETE DOMAIN d1, any
> binlog file containing d1 is invalid and useless, so it seems appropriate to
> require the user to PURGE BINARY LOG them first.

'Invalid and useless' is fair as long as the user opts for the strict
semantics. But actual practice may demand flexibility; I hope my
example above is relevant.

>
>> SET @@SESSION.gtid_seq_no=18446744073709551615;
>> CREATE TABLE IF NOT EXISTS `table_dummy` (a INT);
>> SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';
>> 
>>   11-1-18446744073709551615
>> 
>> SET @@SESSION.gtid_seq_no=0;
>> DROP TABLE `table_dummy`;
>> SHOW LOCAL VARIABLES LIKE '%gtid_binlog_pos%';
>> 
>>   11-1-0
>
> Ouch. That's a bug. This should give an error, I think that could lead to
> all kinds of extremely nasty problems :-(

I agree. And you don't just mean that the zero sequence number is bogus, do
you? I believe there must be some reaction to the wrap-around itself.
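A sketch of the kind of reaction I mean, assuming seq_no is an unsigned
64-bit counter (which is what the 18446744073709551615 in the quoted example
suggests). `assign_seq_no()` and `SeqNoError` are hypothetical names; the
rules it enforces (no wrap-around on auto-increment, explicit values in
1..2^64-1 and strictly increasing within a domain) are assumptions of this
sketch, not documented server behaviour.

```python
MAX_SEQ_NO = 2**64 - 1  # largest unsigned 64-bit seq_no, 18446744073709551615


class SeqNoError(Exception):
    """Raised instead of silently wrapping or reordering a domain's seq_no."""


def assign_seq_no(current, requested=None):
    """Pick the seq_no for the next event group in a domain.

    current: last seq_no binlogged in the domain.
    requested: an explicit seq_no (as via @@gtid_seq_no), or None for
    auto-increment.
    """
    if requested is None:
        if current >= MAX_SEQ_NO:
            # Auto-increment would wrap to 0; refuse rather than log it.
            raise SeqNoError(f"seq_no range exhausted after {current}")
        return current + 1
    if not 0 < requested <= MAX_SEQ_NO:
        raise SeqNoError(f"seq_no {requested} out of range")
    if requested <= current:
        # Going backwards breaks the strict per-domain ordering.
        raise SeqNoError(f"seq_no {requested} is not after {current}")
    return requested
```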

>
>> 1. Leave wrapping around an old domain to the user, via running
>>    queries like the above;
>> 2. The binary logger would be made to react to the wrap-around
>>    with binary log rotation ("internal" FLUSH BINARY LOG). And the new
>>    binlog file won't contain the wrapped "away" domain (because there
>>    are no new event groups in it yet).
>
> I am not sure I understand you here. Are you suggesting that the GTID
> sequence wrap-around bug be instead declared a feature, and be documented as
> the way to delete a domain in the binlog? I do not think that is
> appropriate.

Let me elaborate a bit.
When a domain's sequence-number range fills up on the master, the master
can't just wrap around and keep logging, even if it correctly restarts at
sequence number 1. With slaves present, something like your step 2
synchronization would be required before the domain's range could be reset
and number 1 reused.

But the synchronization (with all slaves) makes the domain obsolete. And
your strict semantics would require the step 3 purge at the time the range
is reused (otherwise two binlog files could contain the same GTID).

Therefore I think domain wrap-around is closely related to deleting an
obsolete domain.
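The relation can be sketched as a check that would have to pass before a
domain's seq_no range is reset to 1. This is a minimal Python sketch under the
same simplified binlog model; `can_restart_domain()` is hypothetical, and
treating a slave with no entry for the domain as caught up (like the stopped
slave in my example that does not care about it) is an assumption.

```python
def can_restart_domain(domain_id, last_seq_no, slave_positions, binlogs,
                       strict=True):
    """Hypothetical check before a domain's seq_no range is reset to 1.

    slave_positions: one dict per slave, domain_id -> applied seq_no.
    binlogs: list of (file_name, set_of_domain_ids_with_events).
    """
    # Step-2-like sync: every slave must have applied the domain's last GTID.
    # Assumption: a slave with no entry for the domain does not replicate it.
    if any(pos.get(domain_id, last_seq_no) < last_seq_no
           for pos in slave_positions):
        return False
    if strict:
        # Step-3-like purge: otherwise two files could carry the same GTID.
        return all(domain_id not in doms for _, doms in binlogs)
    return True
```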

>
> As I see it, there are two sides to this.
>
> (1). We want the master to "forget about the past" with respect to a given
> domain. This is easy. All that is needed is to rotate the binlog and omit
> the domain from the GTID_LIST event at the start of the new binlog. Because
> when the master searches back for a given GTID in the binlog, it stops when
> it sees a GTID_LIST event without that domain.
>
> (2). We want to prevent a user accidentally putting the server into an
> inconsistent state with an incorrect DELETE DOMAIN command. This is ensured
> by the requirement that all existing binlog files are free of that domain.
> Should a slave later, incorrectly, try to access that domain, it will
> receive the wrong error (that it is diverged rather than that the necessary
> binlog file has been purged), but at least it _will_ get an error as it
> should, not silently corrupt replication.
>
> I think the requirement is a reasonable one. The domain was configured
> incorrectly, the binlog files containing it cannot be used safely with GTID.
> The procedure to fix it will then be:
>
> 1. FLUSH BINARY LOGS, note the new GTID position.
>
> 2. Ensure that all slaves are past the problematic point with
> MASTER_GTID_WAIT(<pos>). After this, the old erroneous binlog files are no
> longer needed.
>
> 3. PURGE BINARY LOGS to remove the erroneous logs.
>
> 4. FLUSH BINARY LOG DELETE DOMAIN d
>
> It is of course an option to not do (2). Just be aware that this goes
> against the whole philosophy that GTID was designed around - to prioritise
> consistency and "no silent corruption".
>
> Hope this helps. Of course feel free to ask for more details on any point
> that is not clear.
>
>  - Kristian.
>

Thank you for discussing it with me!

Andrei
Follow ups

Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: Kristian Nielsen, 2017-09-07
References

Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: andrei . elkin, 2017-09-06
Re: Obsolete GTID domain delete on master (MDEV-12012, MDEV-11969)
From: Kristian Nielsen, 2017-09-06