maria-developers team mailing list archive

Re: Review of patch for MDEV-4820

 

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> Have I misunderstood what you said?

Yes, totally :-(

> By that you are basically saying: I need to support those
> who set up lousy multi-master replication, thus I won't treat
> domain_id as a single replication stream, I'll treat a pair
> (domain_id, server_id) as a single replication stream (meaning there
> could be several replication streams within one domain_id). So

Absolutely not! This would be total breakage of the whole design.

The whole foundation of MariaDB GTID is that for each domain id, we have a
well-defined binlog order that _must_ be the same on every server in the
replication hierarchy. This is what allows us to represent the slave position
as a single GTID per domain id.

And that is why we provide gtid_strict_mode, which enforces globally
monotonic sequence numbers. Because if sequence numbers are monotonic
everywhere, then it is impossible to have different binlog orders on different
servers.
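
To make the strict-mode rule concrete, here is a minimal sketch in Python. It
is not the server code. It assumes GTIDs in the usual textual form
domain_id-server_id-seq_no, and the names Gtid, parse_gtid and
first_strict_violation are made up for illustration.

```python
from typing import Iterable, NamedTuple, Optional


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse 'domain_id-server_id-seq_no', e.g. '0-1-100'."""
    domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
    return Gtid(domain_id, server_id, seq_no)


def first_strict_violation(binlog: Iterable[str]) -> Optional[Gtid]:
    """Return the first GTID whose seq_no does not increase within its domain.

    Hypothetical helper: sequence numbers are compared per domain_id only,
    regardless of which server_id wrote the event.
    """
    last_seq = {}  # domain_id -> highest seq_no seen so far
    for text in binlog:
        gtid = parse_gtid(text)
        prev = last_seq.get(gtid.domain_id)
        if prev is not None and gtid.seq_no <= prev:
            return gtid
        last_seq[gtid.domain_id] = gtid.seq_no
    return None
```

With this check, a binlog such as 0-1-1, 0-2-2, 0-1-3 passes, while
0-1-1, 0-1-3, 0-2-2 is flagged at 0-2-2.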

But even without globally monotonic sequence numbers, you can still have the
same binlog order on every server. And GTID can still work correctly. But it
becomes the user's responsibility to ensure that the binlog order is the same
on all servers in the same hierarchy.
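
As a rough illustration of what "same binlog order" means when sequence
numbers are not monotonic, the sketch below compares two binlogs domain by
domain. It is only a toy model: it assumes both binlogs start from the same
point (no purged binlogs), and per_domain_order and consistent_order are
hypothetical names, not anything in the server.

```python
from collections import defaultdict


def per_domain_order(binlog):
    """Group GTID strings ('domain-server-seq') by domain, keeping binlog order."""
    order = defaultdict(list)
    for text in binlog:
        domain_id = int(text.split("-")[0])
        order[domain_id].append(text)
    return dict(order)


def consistent_order(binlog_a, binlog_b):
    """True if, in every shared domain, one binlog is a prefix of the other.

    Interleaving between different domains is allowed to differ; only the
    order inside each domain has to match.
    """
    a, b = per_domain_order(binlog_a), per_domain_order(binlog_b)
    for domain_id in a.keys() & b.keys():
        shorter, longer = sorted((a[domain_id], b[domain_id]), key=len)
        if longer[:len(shorter)] != shorter:
            return False
    return True
```

Here a master with 0-1-1, 0-2-1, 0-1-2 and a lagging slave with 0-1-1, 0-2-1
are consistent even though the sequence numbers are not monotonic, whereas a
server holding 0-2-1, 0-1-1, 0-1-2 is not.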

So in gtid strict mode it works exactly as you (and I) want. It will not hurt
you that it also works in non-strict mode, which you will not use anyway. What
is so hard to understand about this?

> Here is your words from an earlier email in this thread:

>>>> This decision was a hard one to make and I spent considerable thought on this
>>>> point quite early. It is true that this design reduces possibilities to detect
>>>> some kinds of errors, like missing events and alternate futures.
>
>>>> I can understand if this design is not optimal for what you are trying to
>>>> do. However, implementing two different designs (eg. based on value of
>>>> gtid_strict_mode) is not workable. I believe at least for the first version,
>>>> the current design is what we have to work with.

> So if we don't want to tolerate lost transactions and alternate
> futures and want things to break whenever such events happen, stock
> MariaDB cannot do it for us by design.

Ok, sorry about this, I can see how this could be misunderstood. I was trying
to explain too many things at once and got things mixed up. All I meant here
is that the code needs to use somewhat more complex algorithms to correctly
detect errors, not that such detection is impossible or should not be
done. And that your patch similarly needs to be written in a different way,
not that it would be impossible to do.

I think to get this discussion back on track, you need to forget everything
you think I said so far, and instead accept the following points:

1. I completely agree with you on how things should work in strict mode.
Binlog events should always have monotonic sequence numbers, and no lost
transactions or alternate futures are acceptable.

2. MariaDB GTID also supports some non-strict usage. This is not allowed to
break point (1), so you do not need to worry about it if you are happy to use
strict mode.

3. I agree that the issues you report in MDEV-4820 are bugs and I will fix
them.

Once we are on the same track here, we can discuss the finer details on how I
make things work in non-strict mode, and why that is a desirable thing to do,
if you like. But if you think I'm saying something to contradict points 1-3
above, you have misunderstood me.

Ok?

----

(A couple more answers to less important questions:)

> This is surprising. Test doesn't check the particular message text.
> How does it fail?

The test case has a suppression for the error message about an alternate
future, but current 10.0-base gives a differently worded error message, so a
different suppression is needed.

> Apparently I failed to reproduce the problem scenarios in the testing
> environment. Sorry, I didn't try to run it on unchanged code. But did
> you try the manual reproduction steps I mentioned in MDEV-4820?

Yes, with exactly the same results. I found the one bug I've mentioned, and
the rest passed.

> This is where I disagree. You keep insisting that this information is
> necessary.

It is necessary in the sense that the code was written under the assumption
that this information is there.

And it is necessary to be able to correctly locate an arbitrary GTID in a
binlog that was written in non-strict mode.

But now that I have thought about it, I think you are right that it is not
needed if the entire binlog was written obeying strict mode. It seems we can
always detect errors in this case. The point is: I did not consider the
possibility that a user would manually remove the binlog on a master when I
wrote the code. So I did not want to promise that it is supported until I
had thought the problem through.

>> You could even have an
>> aggregating slave A on the third level that uses multi-master to replicate
>> each schema from each second-level slave back to a single server. M->S1->A and
>> M->S2->A.

> I would think actually third-level slave will break in such situation
> because MariaDB doesn't allow GTIDs with the same (domain_id,
> server_id) pair to be out of order, right? But with such replication
> setup third-level slave can get a bunch of events from S1 first and
> then it won't be able to get any events from S2 because they'll have
> smaller seq_no than already present in the binlog.

Thanks, yes you are right, GTID will not work correctly on A or any slave of
A in this setup.

And that is exactly the point of gtid_strict_mode. A setup such as this
requires extra configuration (separate domain_id) to work correctly with
GTID. So we provide strict mode to be able to give an error when the
configuration is incorrect. But we have to have this off by default to not
break upgrades.
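
The M->S1->A and M->S2->A case can be sketched in a few lines of Python. This
is a toy model under stated assumptions: the GTID values are invented, A is
assumed to receive all of S1's events before any of S2's, and
first_out_of_order is a hypothetical helper that applies the strict-mode rule
per domain.

```python
def first_out_of_order(binlog):
    """Return the first GTID string whose seq_no does not increase in its domain."""
    last_seq = {}  # domain_id -> highest seq_no seen so far
    for text in binlog:
        domain_id, _server_id, seq_no = (int(part) for part in text.split("-"))
        prev = last_seq.get(domain_id)
        if prev is not None and seq_no <= prev:
            return text
        last_seq[domain_id] = seq_no
    return None


# Single domain: M writes both schemas in domain 0. S1 passes on only the
# first schema (seq_no 1 and 3), S2 only the second (seq_no 2 and 4).
single_domain_at_a = ["0-1-1", "0-1-3"] + ["0-1-2", "0-1-4"]

# Separate domains: M writes each schema in its own domain (1 and 2), so
# each stream seen by A is ordered on its own.
separate_domains_at_a = ["1-1-1", "1-1-2"] + ["2-1-1", "2-1-2"]

print(first_out_of_order(single_domain_at_a))
print(first_out_of_order(separate_domains_at_a))
```

In the single-domain case the check trips on 0-1-2, which is the error strict
mode is meant to raise. With one domain per schema nothing is flagged.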

 - Kristian.

