maria-developers team mailing list archive

Re: ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync

To: Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxx>
Date: Mon, 15 Mar 2021 13:25:45 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87a6rmzvyd.fsf@quad>

Hi, Andrei!

On Mar 01, Andrei Elkin wrote:
> > I've reviewed almost everything, see comments below. But not the
> > Recovery_context methods. Please explain how it works and how all
> > these truncate_validated, truncate_reset_done, truncate_set_in_1st,
> > etc all work together.
> 
> ...specifically to this point. Just in case: I hope you did not miss
> recovery_design.txt from the MDEV, although it does not go into the
> coding details that you are effectively asking about above.

I've read it now. Still I don't understand why you need an extra round
of binlog scanning (two rounds if only one file, three rounds if many).

Also, this recovery_design.txt is not part of the code, so whoever looks
at the code later will be just as puzzled as I was.

...
>   [ G1, G2, ..., G_k, g_{k+1}, ... g_n ]
> 
> Here the uppercase `G' stands for a committed trx, the lowercase `g'
> for a prepared one, and `_k' is the subscript in the recovery sequence.
> Since committed transactions always come first in the single-engine
> case, the first occurrence of the pattern `G_k,g_{k+1}' identifies the
> truncate index. The patch reflects this fact by raising the
> `truncate_validated' flag.
> 
> But it's more complicated in the multiple binlog files / engines case.
> The first `Gg' letter-case drop is not guaranteed to be the only drop
> so `truncate_reset_done' and `truncate_set_in_1st' are introduced to
> help with truncate index identification.

Why is it not guaranteed?
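
For illustration, the single-engine rule quoted above can be sketched in
a few lines of Python. The string encoding ('G' committed, 'g' prepared)
and the name first_drop are hypothetical, not taken from the patch.

```python
def first_drop(seq):
    """Return the 0-based index of the first prepared trx ('g') that
    directly follows a committed trx ('G'), or None if there is none.

    seq is a string such as "GGGgg", in binlog order (hypothetical
    encoding, for illustration only).
    """
    for i in range(1, len(seq)):
        if seq[i - 1] == "G" and seq[i] == "g":
            return i
    return None


# Single engine: all committed trx:s precede all prepared ones,
# so the first drop is the only drop.
print(first_drop("GGGgg"))  # 3
```

On a sequence such as "GgGg" the function still returns the first drop
(index 1), although a second drop follows it. That is the situation the
quoted text says needs extra handling.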

> When `truncate_validated' is set, the truncate index is determined and
> may not change in the current (1st or 2nd) or in future rounds.
> `truncate_reset_done' says that an "inverse" `g_k,G_{k+1}' pair has
> been found, so any earlier truncation candidate gets reset (to
> "zero"). If any candidate is found later in *this* (1st or 2nd) round,
> its index will obviously be greater.
> 
> The role of `truncate_set_in_1st' is to remember that the truncate
> candidate was found in the 1st round (in the "hot" binlog file). If
> the candidate has not been validated (`!truncate_validated'), it may
> be refined in the 2nd round, and then to an earlier transaction. So
> the flag helps to handle the exception from the truncate candidate
> monotonicity rule: e.g. the hot binlog B2 contains `[g5,g6,...g_n]'
> and a reference to the binlog checkpoint file B1, which contains
> `[G1,g2,G3,g4]'. The first-round truncate candidate g5 would first be
> refined to `g2' before being finally settled as `g4' in the 2nd round.
> (Notice that `g2 -> g4' preserves the truncate index monotonicity.)
> 
> Notice that, due to a refinement like `g2 -> g4' in the 2nd round of
> the above example, `g2' has "to be up-cased" into `G2' for committing
> (feasible in a scenario with two trx:s on two different engines: g2,
> on InnoDB, only got prepared; G4, on RocksDB, got committed prior to
> the crash). That's what the 3rd round is for.
> 
> I hope this will be helpful.

Unfortunately, not very much. It describes how you juggle variables and
scanning rounds, but not *what* you're trying to find. And it's kind of
difficult to reverse engineer your "how" back into "what" :(
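
One possible statement of the "what", read off the B1/B2 example quoted
above: the truncate position is the first transaction after the last
committed one, and any prepared transaction before it has to be
committed. The Python sketch below only encodes that reading. The
function name, the list encoding and the 0-based indices are assumptions
for illustration, and it does not describe what the patch does.

```python
def split_recovery(states):
    """states: list of 'G' (committed) or 'g' (prepared), oldest first,
    concatenated over all scanned binlog files (hypothetical encoding).

    Returns (truncate_at, to_commit):
      truncate_at - index of the first trx after the last committed one;
                    equals len(states) when there is nothing to truncate
      to_commit   - indices of prepared trx:s located before truncate_at
    """
    last_committed = max(
        (i for i, s in enumerate(states) if s == "G"), default=-1
    )
    truncate_at = last_committed + 1
    to_commit = [i for i in range(truncate_at) if states[i] == "g"]
    return truncate_at, to_commit


b1 = ["G", "g", "G", "g"]  # checkpoint file B1: G1 g2 G3 g4
b2 = ["g", "g"]            # hot binlog B2:      g5 g6
print(split_recovery(b1 + b2))  # (3, [1]): truncate at g4, commit g2
```

Under this reading a single pass over the concatenated sequence is
enough, which is why the need for the extra rounds deserves an
explanation in the code itself.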

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
