Re: Review of patch for MDEV-4820

Pavel Ivanov <pivanof@xxxxxxxxxx> writes:

> Note that the patch I've attached has test cases that should reproduce the problems.

Thanks, I've now gone through the testcases also. Let me number the individual
tests as follows:

1. Check that a gap in seq_no, with no binlogs present, prevents replication
2. The same test, but with the starting master GTID coming from a different server_id
3. Check processing of alternate futures
4. Check alternate future when the divergence is the last event in a binlog file
5. Check alternate future without binlogs

I tried the test cases against current 10.0-base.

(5) fails; this is a bug. It should also fail in non-strict mode. I will fix it
(as I said in my comment on MDEV-4820).


(3) fails, but only because of a different error message (the generic "event is
not found in master binlog" rather than the specific "slave has alternate
future").

I can put in the "alternate future" error message, but I want to be sure you
really understand what this does and the limitations.

In your test, the slave does one extra transaction and the master does two. The
slave ends up at 0-2-112, the master at 0-1-113. Because 113 >= 112, we can
tell that the slave has an alternate future.

But suppose you did the test the other way around: the slave does two
transactions, the master does one. Then the slave has 0-2-113 and the master
has 0-1-112. It is not the case that 112 >= 113, so at this point we cannot
detect that the slave has an alternate future.
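
To make the asymmetry concrete, here is a minimal C++ sketch of the only
check the master can make at connect time (the types and names are mine, not
the server's actual code):

  #include <cstdint>
  #include <iostream>

  // A GTID as discussed here: domain_id-server_id-seq_no.
  struct Gtid
  {
    uint32_t domain_id;
    uint32_t server_id;
    uint64_t seq_no;
  };

  // True only when the master can *prove* an alternate future: same
  // domain, different server_id, and the master's last seq_no has
  // already reached the slave's. If master_last.seq_no < slave_pos.seq_no,
  // "slave diverged" and "master is merely behind" look identical.
  bool provably_alternate_future(const Gtid &slave_pos, const Gtid &master_last)
  {
    return slave_pos.domain_id == master_last.domain_id &&
           slave_pos.server_id != master_last.server_id &&
           master_last.seq_no >= slave_pos.seq_no;
  }

  int main()
  {
    Gtid slave1{0, 2, 112}, master1{0, 1, 113}; // detectable: 113 >= 112
    Gtid slave2{0, 2, 113}, master2{0, 1, 112}; // undetectable: 112 < 113
    std::cout << provably_alternate_future(slave1, master1) << "\n"; // 1
    std::cout << provably_alternate_future(slave2, master2) << "\n"; // 0
  }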

So now we are going to give two *different* error messages to the user
essentially at random, depending on which alternate future is furthest
ahead. Is this really what you want? I would think it would be *better* for
data center operations to have at least a consistent error message for the two
situations.

Or did I misunderstand something? Can the Google patch detect alternate slave
futures in this case and distinguish it from master being behind, and if so,
how?


Other than the error message, (1)-(4) all pass for me on current, unmodified
10.0-base. So for MDEV-4820 I am left with one bug to fix, (5), and possibly
one feature request for a different error message. I cannot help thinking that
there is something I'm missing from all you've written already on the subject
of MDEV-4820, but I don't have anything concrete. So please let me know what
I'm missing.

Or is it just that my explanations are confusing, and it would have been
better if I'd just fixed (5) and then discussed (3) before answering? (But the
discussion is very useful for me to get my thoughts clear; the details around
this are unfortunately quite complex.)

> > With GTID,
> > @@GLOBAL.gtid_binlog_pos is also stored in the last binlog file.
> 
> Right, but as I understood it gtid_binlog_pos is necessary only to
> adjust gtid_current_pos if gtid_slave_pos is behind (btw, how do you
> check that in non-strict mode when seq_no of the latest transaction
> can be less than seq_no of old transaction?). If we know for sure that
> gtid_slave_pos reflects the latest transaction then again
> gtid_binlog_pos doesn't carry any new information and can be empty. Am
> I missing something?

Yes, I think so.

For one, we need to know the next sequence number to use within each
domain.

More subtly, we also need to know the last sequence number for every
(domain_id, server_id) pair. This is the information that allows the slave to
start at the correct place in the master's binlog even if sequence numbers are
not monotonic between server_ids.
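
A rough sketch of why the per-(domain_id, server_id) entry matters for slave
connect (again, the names are mine and this is a simplification, not the
server's actual code):

  #include <cstdint>
  #include <map>
  #include <utility>

  // (domain_id, server_id) -> last seq_no logged, as restored from the
  // binlog (Gtid_list events plus a scan of the last file).
  typedef std::map<std::pair<uint32_t, uint32_t>, uint64_t> BinlogState;

  // When a slave connects at GTID d-s-n, the master must decide whether
  // that event is already in its binlog. Comparing n against the last
  // seq_no in the *domain* would be wrong, since seq_nos from different
  // server_ids in one domain need not be monotonic. The per-server_id
  // entry makes the check well-defined:
  bool binlog_contains(const BinlogState &state, uint32_t domain_id,
                       uint32_t server_id, uint64_t seq_no)
  {
    BinlogState::const_iterator it = state.find(std::make_pair(domain_id, server_id));
    return it != state.end() && it->second >= seq_no;
  }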

Setting gtid_slave_pos does not restore any of this information. It just
triggers a special case that allows a --log-slave-updates=0 slave to be turned
into a master. The fact that it partially works for your case of removing
binlogs is mostly accidental; it is not the correct way to handle it.

You really need to understand this subtle part to understand the finer details
of slave connect.

It is true that _if_ we always required strict mode, we could use a simpler
algorithm. But we do not require that. And since we do not, and the more
complex algorithm works in all cases, it is better to have just one algorithm
that handles everything, rather than two separate algorithms with twice the
potential for bugs.

So if you really need to manually remove binlogs on a master, as I said
before, the correct way is to preserve the full information. This is not
currently implemented. I have suggested two possible ways it could be done
(always read master-bin.info in the non-crash case, or an explicit CHANGE
MASTER TO gtid_list=XXX). So far I have not made it a priority to support
manual deletion of binlogs on the master.

I hope this makes things clearer, else please help me understand what it is
that I am failing to explain properly.

> I wonder what kind of production environment tolerates lost
> transactions or alternate futures.
> It's really sad to hear that by intentional design MariaDB doesn't fit
> well into those environments that don't want to tolerate db
> inconsistencies...

I have no idea where you heard that.

Can you please be more concrete? E.g. give an example where the MariaDB
design makes it impossible to avoid lost transactions, alternate futures, or
other db inconsistencies?

> Just out of curiosity: could you tell me what legitimate sequence of
> events can lead to a hole in sequence numbers?

There are many ways. For example using one of the --replicate-ignore-*
options. For example if you have a master with two schemas, you could have two
slaves S1 and S2 each replicating one schema. You could even have an
aggregating slave A on the third level that uses multi-master to replicate
each schema from each second-level slave back to a single server. M->S1->A and
M->S2->A.
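
To illustrate how the holes arise (a toy sketch; the schema names and the
exact filtering are invented for the example):

  #include <cstdint>
  #include <iostream>
  #include <string>
  #include <vector>

  struct Event
  {
    uint64_t seq_no;    // GTID seq_no assigned by the master (domain 0)
    std::string schema; // schema the transaction touches
  };

  int main()
  {
    // On the master, seq_nos are contiguous across both schemas.
    std::vector<Event> master_binlog = {
        {1, "db_a"}, {2, "db_b"}, {3, "db_a"},
        {4, "db_b"}, {5, "db_a"}, {6, "db_b"}};

    // Slave S1 replicates only db_a (think --replicate-ignore-db=db_b).
    // It keeps the original GTIDs, so the skipped transactions leave
    // holes in the seq_no sequence in S1's own binlog.
    std::cout << "S1 binlog seq_nos:";
    for (const Event &e : master_binlog)
      if (e.schema == "db_a")
        std::cout << " " << e.seq_no; // prints: 1 3 5 (holes at 2, 4, 6)
    std::cout << "\n";
  }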

Of course, that third-level slave cannot use GTID positioning unless you
correctly configure different domain ids for the original first-level master
transactions depending on the schema used, _exactly_ because in this case GTID
cannot protect against lost transactions or other corruption. But thanks to
the design, GTID can still be enabled and used in other parts of the
replication hierarchy.

> Don't forget the special case when the GTID requested by slave is the
> last event in this domain in the previous binlog file. Then you don't
> look into that file and start serving directly from the next event
> which won't be equal to what slave requested.

Agree.
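
For concreteness, a sketch of that edge case, simplified to a single
(domain_id, server_id) stream (the function is illustrative, not the actual
implementation):

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // state_before[i] = last seq_no recorded in binlog file i's initial
  // Gtid_list, files ordered oldest to newest. Start serving from the
  // newest file whose initial Gtid_list does not go past `wanted`. When
  // equality holds, `wanted` was the last event of its domain in the
  // previous file: we begin at this file's start and never re-read the
  // event itself, so "the slave's GTID was found" must be concluded
  // from the Gtid_list, not from seeing the event in the stream.
  size_t pick_start_file(const std::vector<uint64_t> &state_before,
                         uint64_t wanted)
  {
    size_t start = 0;
    for (size_t i = 0; i < state_before.size(); i++)
      if (state_before[i] <= wanted)
        start = i;
    return start;
  }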

> Well, if I understood you correctly all test cases shouldn't work by
> design. Maybe only except the second case when server doesn't
> replicate at all.

On the contrary, from what I could determine all the test cases should work,
and I will fix the one bug where they do not (and change the error message, if
you really insist).

> So, in MySQL 5.1 with Google's Group IDs binlog didn't have real
> information.

Well, Google's Group ID must also store the last GTID logged persistently, so
that it can continue at the right point after a server restart or crash. Where
is this stored? MariaDB GTID chooses to store it in the binlog, to avoid the
overhead of an extra InnoDB row operation per transaction to store it in a
table or in the system tablespace header. One way or the other, Google's Group
ID must store this somewhere.

AFAIK, Google's Group ID is not on by default, and needs to be explicitly
enabled. Enabling it adds overhead to every binlog event. In contrast, MariaDB
GTID is on by default, and the implementation actually _decreases_ the size of
binlog events compared to 5.5. Making this work requires the extra information
in Gtid_list.
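
Roughly, the information a Gtid_list carries looks like this (a simplified
sketch, not the server's actual definitions):

  #include <cstdint>
  #include <vector>

  // One entry per (domain_id, server_id) pair seen so far, giving the
  // last seq_no logged before this binlog file.
  struct GtidListEntry
  {
    uint32_t domain_id;
    uint32_t server_id;
    uint64_t seq_no;
  };

  // Written at the start of each binlog file. Together with a scan of
  // the last file, this restores both @@GLOBAL.gtid_binlog_pos and the
  // per-server_id state after a restart, with no extra InnoDB row write
  // per transaction.
  struct GtidListEvent
  {
    std::vector<GtidListEntry> entries;
  };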

 - Kristian.

