maria-developers team mailing list archive

Re: Questions re MDEV-4736 and MDEV-4739 (was Re: Spider's installation sql file)

 

Hi Sergei,

yes, there are many reasons why a cohort may fail during the commit phase, and
Spider has plenty of its own. In this particular case (the test case provided
by Elena) it fails with the following error:
ERROR 42S02: Table 'mysql.spider_xa' doesn't exist

Anyway, it is not clear how to handle a cohort commit failure properly. Let's
say we have 4 cohorts participating in an XA transaction, and cohorts 2 and 3
fail.

Cohort 1 can't roll back (because it has already committed).
What should we do with cohort 4 (commit/rollback/nothing)?
Should we remove this transaction from xid_cache?
Should we indicate clearly which cohorts failed?
Should it be an error or a warning?
Should we hold the whole system (all cohorts + manager) until the failure is
resolved?
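
The scenario above can be sketched with a toy coordinator. This is a minimal
Python sketch under stated assumptions, not MariaDB or Spider code: the names
Cohort, CommitError and two_phase_commit are hypothetical, and the choice to
keep committing the remaining cohorts after a failure is only one of the
possible answers to the questions above.

```python
from dataclasses import dataclass


class CommitError(Exception):
    """Raised by a hypothetical cohort that cannot complete its commit."""


@dataclass
class Cohort:
    name: str
    fail_on_commit: bool = False  # simulate a failure in the commit phase
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: this toy cohort always votes yes.
        self.state = "prepared"
        return True

    def commit(self) -> None:
        # Phase 2: 2PC assumes this cannot fail after a successful prepare.
        if self.fail_on_commit:
            raise CommitError(f"{self.name}: commit failed")
        self.state = "committed"


def two_phase_commit(cohorts):
    """Run both phases and return the outcome recorded for each cohort."""
    votes = [c.prepare() for c in cohorts]
    if not all(votes):
        # A "no" vote in phase 1 is the easy case: nobody has committed yet.
        for c in cohorts:
            c.state = "rolled back"
        return {c.name: c.state for c in cohorts}

    outcome = {}
    for c in cohorts:
        try:
            c.commit()
            outcome[c.name] = "committed"
        except CommitError:
            # Assumption: record the failure and continue with the rest.
            outcome[c.name] = "failed"
    return outcome


cohorts = [
    Cohort("cohort1"),
    Cohort("cohort2", fail_on_commit=True),
    Cohort("cohort3", fail_on_commit=True),
    Cohort("cohort4"),
]
result = two_phase_commit(cohorts)
print(result)
```

In this sketch cohorts 1 and 4 end up committed while cohorts 2 and 3 stay
prepared on their side, which is exactly the inconsistent state in question.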

Thanks,
Sergey

On Fri, Oct 04, 2013 at 06:02:51PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
> 
> On Oct 04, Sergey Vojtovich wrote:
> > Hi Kentoku,
> > 
> > I just reviewed one of your revisions, specifically
> > bzr diff -c3829 lp:~kentokushiba/maria/10.0.4-spider-3.0/
> > 
> > I believe things are a bit more complex: the 2PC protocol doesn't seem to
> > permit cohorts to fail during the commit phase:
> > http://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase
> > 
> > <quot>
> > If the coordinator received an agreement message from all cohorts during the
> > commit-request phase:
> >   1. The coordinator sends a commit message to all the cohorts.
> >   2. Each cohort completes the operation, and releases all the locks and
> >      resources held during the transaction.
> >   3. Each cohort sends an acknowledgment to the coordinator.
> >   4. The coordinator completes the transaction when all acknowledgments have
> >      been received.
> > </quot>
> > 
> > I read the above as: the only problem the coordinator may experience is a
> > missing acknowledgement. What shall the coordinator do if some cohorts
> > acknowledged the commit, but some did not? Probably spider should detect it
> > earlier?
> > 
> > Sergei, what's your opinion?
> 
> Let me see, if I understood the problem correctly.
> The crash happens because spider uses my_error() in the 2pc commit step,
> and the error status is lost up the stack, so Diagnostic_area::ok()
> fires an assert on redefining the statement status. Is that right?
> 
> The server should know that the error has happened on commit and should
> not trigger an assert, it should report the error to the user.
> The error at the commit step should normally never happen, it means
> inconsistent data, because some participants might've already committed
> the transaction and they cannot roll it back anymore. Still, the commit
> method *might* return an error status and we shouldn't ignore it.
> Hardware failures are a good example of what can cause a commit error.
> 
> Anyway, Spider should be fixed to not error out in 2pc commits, because
> such a commit means inconsistent data, it's a bad error, it breaks ACID.
> An engine is expected to check all preconditions during prepare, and if
> prepare succeeds, it is basically a guarantee that the commit will
> succeed, it is not allowed to fail anymore.
> 
> Regards,
> Sergei
> 
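
The rule quoted above (check every precondition during prepare, so that commit
has nothing left to fail on) can be illustrated with a small sketch. This is a
hypothetical Python model, not the storage engine API: Engine, PrepareError
and run_transaction are made-up names, and the "required table exists" check
only mirrors the mysql.spider_xa error reported in this thread.

```python
class PrepareError(Exception):
    """Raised when a hypothetical engine cannot guarantee its commit."""


class Engine:
    def __init__(self, name, required_tables, existing_tables):
        self.name = name
        self.required = set(required_tables)
        self.existing = set(existing_tables)
        self.state = "active"

    def prepare(self):
        # Check everything the commit step will need, while rollback is
        # still possible for every participant.
        missing = self.required - self.existing
        if missing:
            raise PrepareError(f"{self.name}: missing {sorted(missing)}")
        self.state = "prepared"

    def commit(self):
        # After a successful prepare there is nothing left to validate.
        if self.state != "prepared":
            raise RuntimeError("commit called without a successful prepare")
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def run_transaction(engines):
    """Commit only if every engine prepared; otherwise roll all back."""
    try:
        for e in engines:
            e.prepare()
    except PrepareError:
        for e in engines:
            e.rollback()
        return "rolled back"
    for e in engines:
        e.commit()
    return "committed"


engines = [
    Engine("innodb", required_tables=[], existing_tables=[]),
    Engine("spider", required_tables=["mysql.spider_xa"], existing_tables=[]),
]
print(run_transaction(engines))
```

Here the missing table is caught in prepare, so the whole transaction rolls
back and no participant is left committed on its own.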

