maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12572
Re: ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync
Hi, Andrei!
On Mar 01, Andrei Elkin wrote:
> > I've reviewed almost everything, see comments below. But not the
> > Recovery_context methods. Please explain how it works and how all
> > these truncate_validated, truncate_reset_done, truncate_set_in_1st,
> > etc all work together.
>
> ...specifically to this point. Just in case I hope you did not miss to
> read recovery_design.txt from the MDEV, which does not go into coding
> details that you're effectively about in above.
I've read it now. Still I don't understand why you need an extra round
of binlog scanning (two rounds if only one file, three rounds if many).
Also, this recovery_design.txt is not part of the code, so whoever will
look at it later will be just as puzzled as I was.
...
> [ G1, G2, ..., G_k, g_{k+1}, ... g_n ]
>
> here the uppercase `G' stands for committed trx, the smallcase `g' for
> prepared,`_k' - sub-scripts in the recovery sequence. As the capital
> letter first rules in the single-engine case the first occurrence of a
> pattern `G_k,g_k+1' identifies the truncate index. The patch reflects
> such fact with raising `truncate_validated' flag.
>
> But it's more complicated in the multiple binlog files / engines case.
> The first `Gg' letter-case drop is not guaranteed to be the only drop
> so `truncate_reset_done' and `truncate_set_in_1st' are introduced to
> help with truncate index identification.
Why is it not guaranteed?
> When `truncate_validated' is set that indicates the truncate index is
> determined and may not change in the current (1st of 2nd) nor future
> rounds. `truncate_reset_done' says that an "inverse" `g_k,G_k+1' pair
> is found so that any earlier truncation candidate gets reset (to
> "zero"). If there will be later any candidate found in *this* (1st or
> 2nd) round in the sequence its index will be obviously greater.
>
> `truncate_set_in_1st' function is to remember that the truncate
> candidate was found in the 1st round (in the "hot" binlog file), but
> if the candidate has not been validated `!truncate_validated' it may
> be exacted in the 2nd round and then to an earlier transaction. So the
> flag helps to handle exception from truncate candidate monotony rule:
> e.g the hot binlog B2 contains `[g5,g6,...g_n]' and a ref to binlog
> checkpoint file B1 that contains `[G1,g2,G3,g4]'. The first round
> truncate candidate of g5 would be first exacted to `g2' before finally
> ascertained to `g4' in the 2nd round. (Notice `g2 -> g4' preserves the
> truncate index monotony).
>
> Notice that due to exacting like `g2 -> g4' in the 2nd round of the
> above example `g2' got "to be up-cased" into `G2' for committing
> (feasible with two trx:s on two different engine scenario - g2 with
> Innodb only got prepared, G4 - on Rocksdb, and got committed prior the
> crash). That's what the 3rd round is for.
>
> I hope this will be helpful.
Unfortunately, not very much. It describes how you juggle with variables
and scanning rounds. But not *what* you're trying to find. And it's kind
of difficult to reverse engineer your "how" back into "what" :(
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
References