
maria-developers team mailing list archive

Re: Questions re MDEV-4736 and MDEV-4739 (was Re: Spider's installation sql file)

 

Hi Sergei,

>> It is better to be able to commit through the Spider node. Currently it is
>> impossible, but I think it is possible if xid_cache_delete is skipped when
>> xa commit gets an error from a storage engine.
>> Could you please tell me your opinion?

> I don't understand how you can rely on in-memory xid_cache_delete. It's
> not persistent, if the Spider node is restarted, it will be lost anyway.

When the Spider node is restarted, Spider can register the xid into xid_cache
again, because hton->recover is called at server startup, and a registered xid
can then be committed with xa commit. But in the case where xa commit fails,
the xid is deleted from xid_cache. What problem would there be if
xid_cache_delete were skipped when xa commit gets an error from a storage
engine?
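
The question above can be modelled with a small Python sketch. It is only an illustration under stated assumptions: DataNode, Coordinator and keep_xid_on_error are made-up names, not MariaDB or Spider code, and a plain set stands in for xid_cache.

```python
class DataNode:
    """Toy data node that remembers prepared and committed xids."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.prepared = set()
        self.committed = set()

    def xa_prepare(self, xid):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.prepared.add(xid)

    def xa_commit(self, xid):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        if xid in self.prepared:
            self.prepared.discard(xid)
            self.committed.add(xid)
        # Committing an already committed xid is treated as a no-op here.


class Coordinator:
    """Toy Spider node; self.xid_cache stands in for the server's xid_cache."""

    def __init__(self, nodes, keep_xid_on_error):
        self.nodes = nodes
        self.keep_xid_on_error = keep_xid_on_error  # hypothetical switch
        self.xid_cache = set()

    def xa_prepare(self, xid):
        for node in self.nodes:
            node.xa_prepare(xid)
        self.xid_cache.add(xid)

    def xa_commit(self, xid):
        if xid not in self.xid_cache:
            raise LookupError(f"unknown xid {xid!r}")
        failed = []
        for node in self.nodes:
            try:
                node.xa_commit(xid)
            except ConnectionError:
                failed.append(node.name)
        if failed and self.keep_xid_on_error:
            return failed  # xid stays registered, so the commit can be retried
        self.xid_cache.discard(xid)
        return failed


def run(keep_xid_on_error):
    nodes = [DataNode("node1"), DataNode("node2"), DataNode("node3")]
    spider = Coordinator(nodes, keep_xid_on_error)
    spider.xa_prepare("xid1")
    nodes[1].up = False               # data node2 crashes after prepare
    first = spider.xa_commit("xid1")  # node2 cannot be reached
    nodes[1].up = True                # data node2 recovers, still prepared
    try:
        second = spider.xa_commit("xid1")
    except LookupError:
        second = None                 # xid was dropped: no retry through Spider
    return first, second, "xid1" in nodes[1].committed
```

In this model, run(False) leaves data node2 uncommitted because the second xa commit no longer finds the xid. run(True) lets the retry through the coordinator finish the commit on data node2.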

> I think Spider can, probably, perform an xa recovery of the data node
> automatically - when a node is reconnected after a crash, Spider node
> looks in the mysql.spider_xa table and commits/aborts transactions on
> the node accordingly. But it's a bit tricky, if you consider that the
> Spider node itself can crash. One needs to analyze carefully all cases
> where the data and the Spider node crash at any point during the
> commit sequence. I have not done that.

If the crashed Spider node can recover, xa recovery is not a problem. If the
crashed Spider node cannot recover (it is gone forever), the xids that were in
use have to be obtained from an application log or a similar source for
recovery. An automatic xa recovery feature is planned for the future. Thank
you for suggesting it to me!
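
The manual recovery discussed in this thread (run xa recover on the recovered data node, then decide from the recorded status) can be sketched in Python as below. This is an illustration only: plan_recovery and the status strings "COMMIT" and "ROLLBACK" are hypothetical, and the values actually stored in mysql.spider_xa may differ. The recorded mapping could come from mysql.spider_xa or, if the Spider node is gone, from an application log.

```python
def plan_recovery(prepared_xids, recorded):
    """Return (action, xid) pairs for xids still prepared on a data node.

    prepared_xids: xids reported by xa recover on the data node.
    recorded: mapping of xid -> status; the status names are hypothetical.
    """
    plan = []
    for xid in prepared_xids:
        status = recorded.get(xid)
        if status == "COMMIT":
            plan.append(("xa commit", xid))
        elif status == "ROLLBACK":
            plan.append(("xa rollback", xid))
        else:
            # No trustworthy record: leave the decision to an operator.
            plan.append(("ask operator", xid))
    return plan
```

The sketch deliberately refuses to guess when no status is recorded for an xid, since committing or rolling back without a record could leave the data nodes inconsistent.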

> With the "error during the commit", I checked what MariaDB does, it's
> actually better than I thought. After successful prepare it won't rollback
> the transaction in any engine. And with your node crash the transaction
> was, from user point of view, committed - it was neither rolled back,
> nor corrupted or partially applied. It was "virtually committed" and
> will be fully committed and available after the node recovery.
> So, it looks like it's ok to return an error in this specific case.

Thank you for reviewing!

Thanks,
Kentoku



2013/10/5 Sergei Golubchik <serg@xxxxxxxxxxx>

> Hi, kentoku!
>
> On Oct 05, kentoku wrote:
> > Hi Sergei,
> >
> > > Just one question, before I could answer.
> > > What does it mean "data node is committed manually after recovery"?
> > > What exactly should the user do?
> >
> > Thank you for asking about it!
> > The xa commit sequence with crash recovery is as follows. (In this
> > case, I am talking about 1 Spider node and 3 data nodes.) Sorry for the
> > long explanation; the answer to "What does it mean "data node is committed
> > manually after recovery"?" is step 3.
> >
> > 1. An application sends xa prepare to the Spider node.
> > application -> xa prepare -> Spider node -|-> xa prepare -> data node1
> >                                           |-> xa prepare -> data node2
> >                                           |-> xa prepare -> data node3
> > The Spider node returns success to the application.
> >
> > 2. The application sends xa commit to the Spider node after data node2
> > has crashed.
> > application -> xa commit -> Spider node -|-> xa commit -> data node1
> >                                          |-> xa commit xx data node2
> >                                          |-> xa commit -> data node3
> > The Spider node returns an error to the application.
> >
> > 3. Send xa recover and xa commit manually to data node2 after it recovers.
> >     The status of the xa transaction is recorded in the mysql.spider_xa
> > table, so you can tell from this table whether you should commit or roll
> > back the xa transaction.
> >     This is an operation for a human or a monitoring tool.
> >                                            -> xa commit -> data node2
> >
> > It is better to be able to commit through the Spider node. Currently it is
> > impossible, but I think it is possible if xid_cache_delete is skipped when
> > xa commit gets an error from a storage engine.
> > Could you please tell me your opinion?
>
> I don't understand how you can rely on in-memory xid_cache_delete. It's
> not persistent, if the Spider node is restarted, it will be lost anyway.
>
> I think Spider can, probably, perform an xa recovery of the data node
> automatically - when a node is reconnected after a crash, Spider node
> looks in the mysql.spider_xa table and commits/aborts transactions on
> the node accordingly. But it's a bit tricky, if you consider that the
> Spider node itself can crash. One needs to analyze carefully all cases
> where the data and the Spider node crash at any point during the
> commit sequence. I have not done that.
>
> With the "error during the commit", I checked what MariaDB does, it's
> actually better than I thought. After successful prepare it won't rollback
> the transaction in any engine. And with your node crash the transaction
> was, from user point of view, committed - it was neither rolled back,
> nor corrupted or partially applied. It was "virtually committed" and
> will be fully committed and available after the node recovery.
> So, it looks like it's ok to return an error in this specific case.
>
> Regards,
> Sergei
>
>
