
maria-developers team mailing list archive

Re: Review of patch for MDEV-4820


On Tue, Aug 13, 2013 at 3:26 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
>> Note that the patch I've attached has test cases that should reproduce the problems.
> Thanks, I've now gone through the testcases also. Let me number the individual
> tests as follows:
> 1. Check that gap in seq_no without binlogs doesn't allow to replicate
> 2. The same test but with starting master GTID from different server_id
> 3. Check processing of alternate futures
> 4. Check alt future when divergence is in last event in binlog file
> 5. Check alt future without binlogs
> I tried the test cases against current 10.0-base.
> (5) fails, this is a bug. It should fail also in non-strict mode. I will fix
> (as I said in comment to MDEV-4820).
> (3) Fails, but only because of a different error message (generic "event is
> not found in master binlog" rather than specific "slave has alternate future").

This is surprising. The test doesn't check the particular message text.
How does it fail?

> I can put in the "alternate future" error message, but I want to be sure you
> really understand what this does and the limitations.
> In your test, slave does one extra transaction, master does two. Slave ends up
> at 0-2-112, master ends up at 0-1-113. So because 113 >= 112, we can know that
> slave has an alternate future.
> But suppose you did the test the other way around, slave does two
> transactions, master does one. Then slave has 0-2-113 and master has
> 0-1-112. It is not the case that 112 >= 113. So we can not detect at this
> point that slave has an alternate future.
> So now we are going to give two *different* error messages to the user
> essentially at random, depending on which alternate future is furthest
> ahead. Is this really what you want?

Well, even if I wanted it differently, there's no good way to do that.

> I would think it would be *better* for
> data center operations to have at least a consistent error message for the two
> situations.
> Or did I misunderstand something? Can the Google patch detect alternate slave
> futures in this case and distinguish it from master being behind, and if so,
> how?

No, neither my patch on MDEV-4820 nor Google's Group ID patch in MySQL
5.1 can detect an alt future when the slave has more transactions than
the master. But that doesn't matter (for us), because in a normal
situation the master's GTID will keep moving forward while the slave's
GTID stays the same. So eventually (usually very quickly) we reach the
point where seq_no on the master is bigger, and then the slave gets the
"alt future" error and it sticks, i.e. it never changes again. That's
good enough for us, because the small window with a different error
message is pretty much impossible to catch -- you'll always be looking
at the logs when there's already an "alt future" error in them.
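The window I describe above can be illustrated with a small sketch (hypothetical helper names, not MariaDB code): divergence between two GTIDs with different server_ids in the same domain only becomes provable once the master's seq_no has reached the slave's.

```python
# Hypothetical sketch (not MariaDB code) of the detection rule described
# above. A slave at GTID domain-slave_server_id-slave_seq provably has an
# alternate future relative to the master's GTID only once the master's
# seq_no has caught up with the slave's.
def has_alternate_future(master_server_id, master_seq,
                         slave_server_id, slave_seq):
    if master_server_id == slave_server_id:
        # Same lineage: no divergence is detectable from these two GTIDs.
        return False
    # Detectable only once master_seq >= slave_seq; before that the slave
    # merely looks "ahead of" the master and a generic error is reported.
    return master_seq >= slave_seq

# Slave at 0-2-112, master at 0-1-113: detectable, since 113 >= 112.
print(has_alternate_future(1, 113, 2, 112))  # True
# Slave at 0-2-113, master at 0-1-112: not yet detectable.
print(has_alternate_future(1, 112, 2, 113))  # False
```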

> Other than the error message, (1)-(4) all pass for me on current unmodified
> 10.0-base. So I am left for MDEV-4820 with one bug to fix (5) and possibly one
> feature request for different error message. I cannot help thinking that there
> is something I'm missing from all you've written already on the subject of
> MDEV-4820, but I don't have anything concrete. So please let me know what I'm
> missing.

Apparently I failed to reproduce the problem scenarios in the test
environment. Sorry, I didn't try to run the tests against unchanged
code. But did you try the manual reproduction steps I mentioned in
MDEV-4820?

>> > With GTID,
>> > @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
>> Right, but as I understood it gtid_binlog_pos is necessary only to
>> adjust gtid_current_pos if gtid_slave_pos is behind (btw, how do you
>> check that in non-strict mode when seq_no of the latest transaction
>> can be less than seq_no of old transaction?). If we know for sure that
>> gtid_slave_pos reflects the latest transaction then again
>> gtid_binlog_pos doesn't carry any new information and can be empty. Am
>> I missing something?
> Yes, I think so.
> For one, we need to know the next sequence number to use within each
> domain.

This information exists in gtid_slave_pos in the situation I'm talking about.

> More subtle, we also need to know the last sequence number for every
> (domain_id, server_id) pair. This is the information that allows slave to
> start at the correct place in the master binlog even if sequence numbers are
> not monotonic between server_ids.

This is where I disagree. You keep insisting that this information is
necessary. By that you are basically saying: I need to support those
who set up lousy multi-master replication, so I won't treat domain_id
as a single replication stream; I'll treat the pair (domain_id,
server_id) as a single replication stream (meaning there can be
several replication streams within one domain_id). Effectively you
merge server_id into domain_id and create one 64-bit domain_id. Each
such merged domain_id then has its own replication position
(determined by seq_no), and using that you determine where in the
binlog the slave should start replicating. But again, it seems that
MariaDB currently behaves inconsistently even in this setup: the slave
passes the master a GTID for only one of the server_ids. But binlog
events for this server_id have no particular order relative to binlog
events from another server_id. So the master can easily send events
the slave already has, or skip events the slave doesn't have. Even if
you put some protection on the slave so that it doesn't re-execute
events it already has, you still cannot protect against skipped
events. So replication in such a situation will probably never break
(i.e. the slave won't stop with an error), but the results will be
questionable. And the only argument for this seems to be that anyone
can end up in the same situation without GTID replication...
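As a rough illustration of the "merged domain" reading (my interpretation of the design, not MariaDB code): treating each (domain_id, server_id) pair as its own stream is the same as keying streams by a single 64-bit id built from the two 32-bit fields.

```python
# Illustrative only: combine the two 32-bit ids into one 64-bit stream
# key, which is what treating (domain_id, server_id) as the unit of
# replication effectively amounts to.
def merged_stream_id(domain_id, server_id):
    return (domain_id << 32) | server_id

# Two server_ids inside domain 0 become two distinct "merged domains",
# each with its own independent replication position.
print(merged_stream_id(0, 1))
print(merged_stream_id(0, 2))
```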

But what I'm asserting is that when replication is set up properly,
when different masters never write binlog events with the same
domain_id (and if a second server creates a binlog event with the same
domain_id, that's considered an error), and when events with the same
domain_id can never reach a slave through different replication
streams, then domain_id is the one and only true domain id that needs
to remember its last position, i.e. seq_no. In such a setup there's no
need to remember all the server_ids that this database has ever had as
masters (there could be hundreds of those). The pair (domain_id,
seq_no) then uniquely identifies the server's position within the
domain, and no other information is necessary for that. server_id is
needed only to distinguish alternate futures, that's it.
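A minimal sketch of the position model I'm describing (an illustration under the "one stream per domain" assumption, not MariaDB's actual implementation): (domain_id, seq_no) alone orders positions, and in this simplified sketch server_id matters only for flagging an alternate future at the same seq_no.

```python
from collections import namedtuple

# Illustration of the model argued for above, assuming exactly one
# replication stream per domain_id. Not MariaDB's implementation.
Gtid = namedtuple("Gtid", ["domain_id", "server_id", "seq_no"])

def compare_positions(a, b):
    """Order two GTIDs from the same domain by seq_no alone; raise if
    they reveal an alternate future (this sketch only catches the
    same-seq_no, different-server_id case)."""
    if a.domain_id != b.domain_id:
        raise ValueError("positions from different domains are unordered")
    if a.seq_no == b.seq_no and a.server_id != b.server_id:
        raise ValueError("alternate future: same seq_no, different server_id")
    # -1 if a is behind b, 0 if equal, +1 if a is ahead of b.
    return (a.seq_no > b.seq_no) - (a.seq_no < b.seq_no)

print(compare_positions(Gtid(0, 1, 112), Gtid(0, 1, 113)))  # -1
```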

I thought gtid_strict_mode was supposed to be such mode of operation.

>> I wonder what kind of production environment tolerates lost
>> transactions or alternate futures.
>> It's really sad to hear that by intentional design MariaDB doesn't fit
>> well into those environments that don't want to tolerate db
>> inconsistencies...
> I have no idea where you heard that.
> Can you please be more concrete? Eg. give an example where the MariaDB design
> makes it impossible to avoid lost transactions, alternate futures, or other db
> inconsistencies?

Here are your words from an earlier email in this thread:

>>> I think there is a fundamental disconnect. In MariaDB GTID, I do not require
>>> or rely on monotonically increasing sequence numbers (monotonicity is required
>>> per-server-id, but not between different servers). Nor do I enforce or rely on
>>> absence of holes in the sequence numbers.

>>> This decision was a hard one to make and I spent considerable thought on this
>>> point quite early. It is true that this design reduces possibilities to detect
>>> some kinds of errors, like missing events and alternate futures.

>>> I can understand if this design is not optimal for what you are trying to
>>> do. However, implementing two different designs (eg. based on value of
>>> gtid_strict_mode) is not workable. I believe at least for the first version,
>>> the current design is what we have to work with.

If MariaDB cannot detect missing events and alternate futures, that
means it silently allows them to exist. You said you made this
decision deliberately, so it's by design. And you said that MariaDB
shouldn't implement a second design here.

So if we don't want to tolerate lost transactions and alternate
futures, and we want things to break whenever such events happen,
stock MariaDB cannot do that for us, by design. And we have to make
our own custom modifications to it to support such a production
environment.

Have I misunderstood what you said?

>> Just out of curiosity: could tell me what legitimate sequence of
>> events can lead to hole in sequence numbers?
> There are many ways. For example using one of the --replicate-ignore-*
> options. For example if you have a master with two schemas, you could have two
> slaves S1 and S2 each replicating one schema. You could even have an
> aggregating slave A on the third level that uses multi-master to replicate
> each schema from each second-level slave back to a single server. M->S1->A and
> M->S2->A.
> Of course, that third-level slave can not use GTID positioning unless you
> correctly configure different domain ids for the original first-level master
> transactions depending on schema used. _Exactly_ because in this case GTID can
> not ensure against lost transactions or other corruption. But thanks to the
> design, GTID can still be enabled and used in other parts of the replication
> hierarchy.

Actually, I would think the third-level slave will break in such a
situation, because MariaDB doesn't allow GTIDs with the same
(domain_id, server_id) pair to be out of order, right? But with such a
replication setup, the third-level slave can get a bunch of events
from S1 first, and then it won't be able to get any events from S2,
because they'll have smaller seq_no than what is already present in
the binlog.
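The ordering rule I'm referring to can be sketched like this (an illustration of "seq_no must not go backwards within one (domain_id, server_id) pair", not MariaDB's binlog code):

```python
# Illustration of the per-(domain_id, server_id) monotonicity rule the
# paragraph above refers to; hypothetical helper, not MariaDB code.
def append_to_binlog(binlog_state, domain_id, server_id, seq_no):
    """Record a GTID, rejecting any that rewinds a (domain, server) pair."""
    key = (domain_id, server_id)
    last = binlog_state.get(key)
    if last is not None and seq_no <= last:
        raise ValueError(
            f"out-of-order GTID {domain_id}-{server_id}-{seq_no} "
            f"(last was {domain_id}-{server_id}-{last})")
    binlog_state[key] = seq_no

state = {}
# A burst of M's events arriving via S1 first...
for seq in (101, 102, 103):
    append_to_binlog(state, 0, 1, seq)
# ...then M's events via S2, carrying smaller seq_no under the same
# (domain_id, server_id): rejected, so the third-level slave breaks.
try:
    append_to_binlog(state, 0, 1, 99)
except ValueError as e:
    print(e)
```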

>> So, in MySQL 5.1 with Google's Group IDs binlog didn't have real
>> information.
> Well, Google's Group ID must also need to store last GTID logged
> persistently, so that it can continue at the right point after a server
> restart/crash. Where is this stored? MariaDB GTID chooses to store this in the
> binlog, to avoid the overhead of extra InnoDB row operations for each
> transaction to store it in a table or system tablespace header. One way or the
> other, Google's Group ID must store this somewhere.

Sure. The last event in the binlog has the last Group ID. If there are
no binlogs, or only binlogs without events, then the variable
binlog_group_id has it.

