← Back to team overview

maria-developers team mailing list archive

Re: ee538938345: MDEV-21117: refine the server binlog-based recovery for semisync

 

Hi, Andrei!

On Mar 22, Andrei Elkin wrote:
> 
> Here it goes.
> 
> As after master crash and failover to slave, the demoted-to-slave
> ex-master must be ready to face and accept its own (generated by)
> events, without generally necessary --replicate-same-server-id.
> 
> So the acceptance conditions are refined/relaxed for the semisync slave
> connected to master in the GTID mode which ensures (under gtid_strict_mode ON)
> that there can't be any duplicate events for execution
> on such an ex-master slave.
> Non-GTID "binlog-truncated" ex-master semisync slave has to be
> configured with --replicate-same-server-id.

Why would you generally care about same server id in the gtid mode?
If gtid is not present locally, then it should be applied, right? Why to
look at server id at all? It's a thing of the past, from pre-gtid times.

> >> >> +--let $query1 = INSERT INTO t VALUES (20)
> >> >> +--let $query2 = DELETE FROM t2 WHERE f = 666 /* no such record */
> >> >> +--source binlog_truncate_active_log.inc
> >> >> +
> >> >> +--echo # Case B.
> >> >> +# The inverted sequence ends up to truncate only $query2
> >> >> +--let $this_search_pattern = Successfully truncated.*to remove transactions starting from GTID 0-1-10
> >> >> +--let $query1 = DELETE FROM t2 WHERE f = 0
> >> >
> >> > why not `= 666 /* no such record */` ?
> >> > is `= 0` important here?
> >> 
> >> Yes, there's a fine detail here.
> >> 
> >> 'The inverted sequence ends up' to leave in binlog
> >> 
> >>   $query1 = DELETE FROM t2 WHERE f = 0
> >> 
> >> which is as ineffective as 666. The patch recognizes the ineffectiveness
> >> through a special value of engines involved counter in Gtid event.
> >> In principle the truncation could
> >> start from it rather than the following $query2.
> >> But that would mean more complicated coding.
> >> So the semantics of the zero is to be good, as we're so used to :-)!
> >
> > Sorry, I didn't understand anything. What does the value in the WHERE
> > clause has to do with engines and where the binlog is truncated?

ok. after discussing the issue on slack, the answer appears to be "the
value in WHERE has nothing to do with any of that, it doesn't matter
whether it's 0 or 666 as long as it doesn't match any rows"

then, I'd say, better change 0 to 667 in the test. We wouldn't be having
this discussion at all if you'd have some number that appeared arbitrary
(and "667" is fine). My confusion was exactly caused by the fact that it
looked like =0 matters, that it was important to hav 0 there.
 
> >> >> +--let $query2 = INSERT INTO t VALUES (20)
> >> >> +--source binlog_truncate_active_log.inc
> > ...
> >> >> diff --git a/sql/sql_class.h b/sql/sql_class.h
> >> >> index 140394fefc1..87fa88d4c89 100644
> >> >> --- a/sql/sql_class.h
> >> >> +++ b/sql/sql_class.h
> >> >> @@ -4632,7 +4632,7 @@ class THD :public Statement,
> >> >>    LF_PINS *tdc_hash_pins;
> >> >>    LF_PINS *xid_hash_pins;
> >> >>    bool fix_xid_hash_pins();
> >> >> -
> >> >> +  bool is_1pc_ro_trans;
> >> >
> >> > I don't like how it looks, it's in THD (for every single connection
> >> > and more) and it's only used in some exotic cases. It's set in two
> >> > places around ha_commit_one_phase(), when it would be better to set
> >> > it inside ha_commit_one_phase(), in one place only.
> >> >
> >> > But I cannot suggest how to get rid of it yet as I don't understand
> >> > what it's for, I've asked a question about it below, in
> >> > Gtid_log_event::Gtid_log_event
> >> 
> >> The flag is to help sorting out ineffective (no engine data change)
> >> STATEMENT binlog-format logged transaction. I thought it's practical
> >> enough to face and important to handle.
> >> Through an example,
> >> let the table `t' be initially empty, and
> >> let binlog contain
> >> 
> >> trx1: INSERT INTO t SET   a = 1;   /* slave acked, trx got committed in engine */
> >> trx2: INSERT INTO t SET   a = 2;   /* not acked, trx remains prepared in engine */
> >> trx3: DELETE FROM t WHERE a = 0;   /* ineffective, does not have Xid event */
> >> 
> >> prior a crash with their states as commented.
> >> The truncate trx should be trx2 as the trx3's status
> >> suggests it's harmless for involving into
> >> truncation.
> >> However we have yet to distinguish it from unsafe (to truncate)
> >> event group ("transaction") that also ends with COMMIT query not Xid event.
> >
> > Hmm. I see. But thd isn't a universal dumpster for passing
> > arguments to the function, don't use it like that.
> 
> Well, I had to resort to that. To explain,
> 
> > Please, pass this "read-only transaction with no xid" flag to the
> > Gtid_log_event constructor as an argument. Or recalculate it inside
> > the constructor, it's not such an expensive thing to do.
> 
> To pass a flag ensues adding an extra arg to
>   struct handlerton::commit
> which I don't feel easy about considering the reason.

yes, one more reason to remove binlog_hton. But for now, like with
binlog_commit_by_xid, I'd say, call binlog_commit directly and make
binlog_hton->commit a no-op. Like:

--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -1935,6 +1935,7 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS *trans,
 
   if (ha_info)
   {
+    binlog_commit(ht, thd, all);
     for (; ha_info; ha_info= ha_info_next)
     {
       int err;

> >> >> +        if (!truncate_validated)
> >> >> +        {
> >> >> +          DBUG_ASSERT(round <= 2);
> >> >> +          /*
> >> >> +            Either truncate was not set or was reset, else
> >> >> +            it gets incremented, otherwise it may only set to an earlier
> >> >> +            offset only at the turn out of the 1st round.
> >> >
> >> > sorry, I cannot parse that :(
> >> 
> >> The comments are for the assert in a branch that acts when the truncate position
> >> (gtid) is not yet finalized.
> >> Let me annotate under each assert's line.
> >> 
> >> >
> >> >> +          */
> >> >> +          DBUG_ASSERT(truncate_gtid.seq_no == 0 ||
> >> 
> >> 'truncate was not set or was reset' ^, that is truncate_gtid.
> >> The 'was reset' is actually not present in the assert - that'd be
> >> reflected with
> >>   truncate_reset_done == true
> >>  
> >> 
> >> >> +                      last_gtid_coord >= binlog_truncate_coord ||
> >> 
> >> 'it gets incremented' ^
> >
> > why does that mean "it gets incremented" ?
> 
> I think my English failed me.
> 
>    It is getting incremented

I understand that "it gets incremented" means "it = it + 1", no problem.
How this assert verifies that "it gets incremented" ?

> meant to say (in my last reply as well).
> The use case is that at this point there's a truncate candidate /* 1 */
> and it has not been settled yet /* 2 */, 
> 
>    !truncate_validated == true /* 2 */ && truncate_gtid.seq_no > 0 /* 1 */
> 
> and the current gtid is still "better" to take over as the candidate in
> this block. The candidate's offset must not decrease (unless...
> 
> >> >> +                      (round == 2 && truncate_set_in_1st));
> >> 
> >> And if it's neither of the above there must've been truncate candidate
> >> set in the 1st round.
> >  
> > ...
> ... the above ).
> 
> >> >>        switch (typ)
> >> >>        {
> >> >> +      case FORMAT_DESCRIPTION_EVENT:
> >> >> +        if (round > 1)
> >> >> +        {
> >> >> +          if (fdle != fdle_arg)
> >> >> +            delete fdle;
> >> >> +          fdle= (Format_description_log_event*) ev;
> >> >
> >> > why is it needed? all binlog files should have the same
> >> > Format_description_log_event anyway.
> >> 
> >> Actually contrary to what was stated (the patch removed that claim) in
> >> the parent revision of this commit, the FD:s of the
> >> recovery binlog files may have different checksums. I.e
> >> 
> >>    set @@global.binlog_checksum = value
> >> 
> >> can be issued to rotate binlog to a new hot file having different checksum value
> >> while the binlog *CheckPoint* file is behind. So this is possible to
> >> have at recovery:
> >> 
> >>   B_1(CP, checksum = NONE), B_2(hot, checksum = CRC32)
> >> 
> >> which also manifests a normal use case bug I have not reported any bug (though wanted to).
> >
> > okay, checksum can differ. But how does it affect
> > Format_description_log_event ?
> 
> FD holds knowledge of whether events in the file are with the checksum
> info. In my example above B_1 events would be (are) considered as checksummed
> by the base.

Okay. Please,
1. add a comment about it
2. add a test case for it
ideally, also, move it into a separate commit

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


References