maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #12593
Re: f1db957c5da: MDEV-21469: Implement crash-safe binary logging of the user XA
Hi, Andrei!
On Mar 16, Andrei Elkin wrote:
> > On Mar 14, Sujatha wrote:
> >> revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da)
> >> parent(s): dc92235f21e
> >> author: Sujatha <sujatha.sivakumar@xxxxxxxxxxx>
> >> committer: Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
> >> timestamp: 2020-12-21 16:10:46 +0200
> >> message:
> >>
> >> MDEV-21469: Implement crash-safe binary logging of the user XA
...
> >> diff --git a/sql/log.cc b/sql/log.cc
> >> index 731bb3e98f0..c69b8518cf4 100644
> >> --- a/sql/log.cc
> >> +++ b/sql/log.cc
> >> @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton __attribute__((unused)),
> >> return 0;
> >> }
> >>
> >> -
> >> +/*
> >> + The function invokes binlog_commit() and returns its result
> >> + when it has not yet called it already.
> >> + binlog_cache_mngr::completed_by_xid remembers the fact of
> >> + the 1st of maximum two subsequent calls.
> >> +*/
> >> static int binlog_commit_by_xid(handlerton *hton, XID *xid)
> >> {
> >> + int rc= 0;
> >> THD *thd= current_thd;
> >> -
> >> - (void) thd->binlog_setup_trx_data();
> >> + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data();
> >>
> >> DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT);
> >>
> >> - return binlog_commit(hton, thd, TRUE);
> >> -}
> >> + if (!cache_mngr->completed_by_xid)
> >
> > On what code path can you have completed_by_xid == false here?
>
> Let me answer broadly, the purpose of `completed_by_xid' is to
> make sure binlog_hton->commit_by_xid() does not invoke lower level
> binlog_commit() for the 2nd time at the following function.
I understand that. And I mean that if commit_by_xid is always true, then
binlog_hton->commit_by_xid can be a noop, because you always invoke
binlog_commit_by_xid (or binlog_rollback_by_xid) explicitly.
> I leave only two essential statements prepared with explanatory comments:
>
> ha_commit_or_rollback_by_xid()
> {
> // binlog_commit() has to be called as first hton ...
>
> binlog_hton->commit_by_xid(binlog_hton, xid);
>
> // ... So the above raises the flag (sets up a fence) to block binlog_hton
> // run for the 2nd time in the loop here:
>
> plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton,
> MYSQL_STORAGE_ENGINE_PLUGIN, &xaop);
>
> Now I am thinking a (much) better option exists to block that 2nd time invocation
> of binlog_commit() with
> merely testing the type of hton in
>
> static my_bool xacommit_handlerton(THD *unused1, plugin_ref plugin, void *arg)
> {
> handlerton *hton= plugin_hton(plugin);
> - if (hton->recover)
> + if (hton->recover && hton != binlog_hton)
> {
> hton->commit_by_xid();
>
> and the same to xarollback_handlerton.
> I am now committed to turn to it.
No, see above. Ideally, binlog won't have a hton and it won't be in the
list of engines at all. But if that's a too big step to make now, then
just make binlog_hton->commit_by_xid noop (or NULL, if that works)
and invoke binlog_commit_by_xid() explicitly by name.
> >> + {
> >> + rc= binlog_commit(hton, thd, TRUE);
> >> + cache_mngr->completed_by_xid= true;
> >> + }
> >>
> >> + return rc;
> >> +}
> >>
> >> @@ -10333,14 +10386,108 @@ start_binlog_background_thread()
...
> >> +
> >> +/**
> >> + Performs recovery based on transaction coordinator log for 2pc. At the
> >> + time of crash, if the binary log was in active state, then recovery for
> >> + "implicit" 'xid's and explicit 'XA' transactions is initiated,
> >> + otherwise merely the gtid binlog state is updated.
> >> + For 'xid' and 'XA' based recovery the following steps are performed.
> >> +
> >> + Identify the active binlog checkpoint file.
> >> + Scan the binary log from the beginning.
> >> + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state.
> >> + Prepare a list of 'xid's for recovery.
> >> + Prepare a list of explicit 'XA' transactions for recovery.
> >> + Recover the 'xid' transactions.
> >> + The explicit 'XA' transaction recovery is initiated once all the server
> >> + components are initialized. Please check 'execute_xa_for_recovery()'.
> >
> > why explicit XA recovery is delayed?
>
> It has to wait for at least until after
> init_update_queries();
> in init_server_components() without which
Better move init_update_queries() to happen before recovery.
As far as I can see, init_update_queries() only sets flags in the array
of commands, it does not depend on anything and can be done anytime.
Ideally all the recovery could happen in one place, not split over two
separate functions that are invoked at separate points in time during
the startup.
> >> @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
> >> + {
> >> + enable_apply_event= true;
> >> + // partially engine-prepared XA is first cleaned out prior replay
> >> + thd->lex->sql_command= SQLCOM_XA_ROLLBACK;
> >> + ha_commit_or_rollback_by_xid(&gev->xid, 0);
> >> + }
> >> + else
> >> + --recover_xa_count;
> >> + }
> >> + }
> >> + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA)
> >> + {
> >> + if (member->state == XA_COMPLETE &&
> >> + member->in_engine_prepare > 0)
> >> + enable_apply_event= true;
> >
> > why? you cannot replay a fully prepared and partially committed
> > transaction
>
> We might make use of slave_exec_mode = IDEMPOTENT but only ROW format
> events are subject to the mode.
Yes, so why you're trying to replay it instead of just completing the
commit?
Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx
References