maria-developers team mailing list archive

Re: Architecture review of MWL#132 Transaction coordinator plugin

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Sergei Golubchik <serg@xxxxxxxxxxxx>
Date: Sun, 5 Sep 2010 15:40:06 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87tyn0qlvu.fsf@knielsen-hq.org>
Resent-date: Sun, 5 Sep 2010 15:40:53 +0200
Resent-from: sergii@xxxxxxxxx
Resent-message-id: <20100905134053.GA31593@janus.mylan>
Resent-to: maria-developers@xxxxxxxxxxxxxxxxxxx
User-agent: Mutt/1.5.16 (2007-06-09)

Hi, Kristian!

Now, WL#132 - Transaction coordinator plugin

> ============= High-Level Specification
...
> In current MariaDB, we have two different TC implementations (as well
> as a "dummy" empty implementation; I do not know whether it is used).

The code in mysqld.cc is

  tc_log= (total_ha_2pc > 1 ? (opt_bin_log  ?
                               (TC_LOG *) &mysql_bin_log :
                               (TC_LOG *) &tc_log_mmap) :
           (TC_LOG *) &tc_log_dummy);

So tc_log_dummy is selected when there is at most one XA-capable engine.
But MySQL does not use 2PC for a transaction unless it has at least two
XA-capable participants. In other words, tc_log_dummy is never used.
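To make that argument concrete, here is a minimal C++ model of the
selection quoted above. Only the names total_ha_2pc and opt_bin_log come
from mysqld.cc; the enum and the functions are hypothetical stand-ins,
and the model assumes a transaction can never have more XA-capable
participants than there are XA-capable engines.

```cpp
#include <cassert>

// Hypothetical model of the tc_log choice in mysqld.cc.
enum class TcKind { Binlog, Mmap, Dummy };

TcKind choose_tc(unsigned total_ha_2pc, bool opt_bin_log) {
  // Same shape as the quoted ternary expression.
  if (total_ha_2pc > 1)
    return opt_bin_log ? TcKind::Binlog : TcKind::Mmap;
  return TcKind::Dummy;
}

// 2PC (and thus any TC call) needs at least two XA-capable participants.
bool uses_2pc(unsigned xa_participants) { return xa_participants >= 2; }

// Can any transaction reach the dummy TC? Assumption: a transaction has
// at most total_ha_2pc XA-capable participants, so try every count.
bool dummy_tc_reachable(unsigned total_ha_2pc, bool opt_bin_log) {
  if (choose_tc(total_ha_2pc, opt_bin_log) != TcKind::Dummy)
    return false;
  for (unsigned n = 0; n <= total_ha_2pc; n++)
    if (uses_2pc(n))
      return true;
  return false;
}
```

The dummy is only chosen when total_ha_2pc is 0 or 1, and no count in
that range satisfies uses_2pc(), so dummy_tc_reachable() is false for
every input.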
 
> Binary log
> ----------
> 
> The binary log also implements a "fake" storage engine, mainly to hook
> into the commit (and prepare) phase of transaction processing. This is
> mainly used for statements in non-transactional engines, which are
> "committed" and written to the binary log outside of the TC and
> log_xid() framework.

No, this is used to make the number of XA-capable transaction
participants more than one and to force MySQL to use 2PC.
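A minimal C++ model of that point, assuming a transaction that writes
only to InnoDB. The Txn struct and register_participants() are
hypothetical; the model simply adds the binlog as one more XA-capable
participant when binary logging is on, which is what pushes the count
to two.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a transaction's XA-capable participants.
struct Txn {
  std::vector<std::string> xa_participants;
};

// Assumption: with binary logging enabled, the binlog's "fake" engine
// joins the transaction alongside the real engine.
void register_participants(Txn &t, bool touched_innodb, bool opt_bin_log) {
  if (touched_innodb)
    t.xa_participants.push_back("InnoDB");
  if (opt_bin_log)
    t.xa_participants.push_back("binlog");
}

// 2PC only kicks in with two or more XA-capable participants.
bool uses_2pc(const Txn &t) { return t.xa_participants.size() >= 2; }
```

With the binlog off, an InnoDB-only transaction has one participant and
commits without 2PC; with it on, the same transaction has two and goes
through prepare, the TC and commit.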
 
> TC interface subclasses
> -----------------------
> 
> The MWL#116 has two different algorithms for handling commit order and
> invoking prepare_ordered() and commit_ordered() handler methods:
> 
>  - One used with TC_MMAP, which needs no correspondence between
>  engines and TC. This uses the existing log_xid() interface.
> 
>  - One used with the binary log TC, which ensures the same commit order in
>  engines and binary log, and which uses a new single-threaded
>  group_log_xid() TC interface to efficiently do group commit.
> 
> In the prototype patch for MWL#116, these two methods are mixed with
> each other in the function ha_commit_trans(), and the logic is quite
> complex. Using the log_and_order() TC generalisation provides a nice
> cleanup of this.
> 
> We implement two subclasses of the TC interface:
> 
>  - One class TC_LOG_unordered for the method used with TC_MMAP. This
>  implements the old log_xid() interface.
> 
>  - One class TC_LOG_group_commit for the method used for the binary
>  log. This implements the new group_log_xid() interface.
> 
> Each subclass implements the corresponding algorithm for invoking
> prepare_ordered() and commit_ordered(), using the same mechanisms as
> in MWL#116, but implemented in a cleaner way. The ha_commit_trans()
> function then has no details about prepare_ordered() or
> commit_ordered(), it just calls into tc_log->log_and_order(), which
> handles the necessary details.
> 
> Thus a simple TC plugin similar to the binary log or TC_MMAP can
> implement one of the simple interfaces log_xid() or group_log_xid(),
> without having to worry about prepare_ordered() and commit_ordered().
> But a plugin like Galera that needs to do more can implement the more
> general interface.
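A minimal C++ sketch of the class layout quoted above. TC_LOG,
TC_LOG_unordered, TC_LOG_group_commit, log_and_order(), log_xid() and
group_log_xid() are names from the worklog; the signatures, the THD
stand-in and the two toy plugins are assumptions, and the
prepare_ordered()/commit_ordered() handling the real subclasses would do
is left out.

```cpp
#include <cassert>
#include <vector>

struct THD { int id; };  // stand-in for the server's connection object

class TC_LOG {
 public:
  virtual ~TC_LOG() = default;
  // General interface: non-zero means commit, zero means roll back.
  virtual int log_and_order(THD *thd, bool need_prepare_ordered,
                            bool need_commit_ordered) = 0;
};

// For TC_MMAP-like plugins: implement only log_xid().
class TC_LOG_unordered : public TC_LOG {
 public:
  int log_and_order(THD *thd, bool, bool) override {
    // Real class: also run prepare_ordered()/commit_ordered() in order.
    return log_xid(thd);
  }
 protected:
  virtual int log_xid(THD *thd) = 0;
};

// For binlog-like plugins: implement only group_log_xid().
class TC_LOG_group_commit : public TC_LOG {
 public:
  int log_and_order(THD *thd, bool, bool) override {
    // Simplified to a group of one; the real design queues transactions
    // and lets one thread log the whole group.
    std::vector<THD *> group{thd};
    return group_log_xid(group);
  }
 protected:
  virtual int group_log_xid(std::vector<THD *> &group) = 0;
};

// Toy leaf plugins that just record what they "logged".
class Toy_mmap_tc : public TC_LOG_unordered {
 public:
  std::vector<int> logged;
 protected:
  int log_xid(THD *thd) override { logged.push_back(thd->id); return 1; }
};

class Toy_binlog_tc : public TC_LOG_group_commit {
 public:
  std::vector<int> logged;
 protected:
  int group_log_xid(std::vector<THD *> &group) override {
    for (THD *t : group) logged.push_back(t->id);
    return 1;
  }
};
```

The server side only ever calls tc_log->log_and_order(); which simple
hook a plugin fills in is decided by the base class it derives from.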

I still see no real value in keeping or supporting the log_xid() interface.

I think we can implement only one interface, group_log_xid(), and
that's enough.
 
> ============= Low-Level Design
...
> log_and_order()
>     Requests a decision to commit (non-zero return) or rollback (zero
>     return) of the transaction. At this point, the transaction has
>     been successfully prepared in all engines.
> 
>     The method must call run_prepare_ordered() in such a way that calls
>     in different threads happen in the order that the transactions are
>     committed. This call must be protected by the global
>     LOCK_prepare_ordered mutex.
> 
>     The method must then call run_commit_ordered(), protected by
>     LOCK_commit_ordered, again so that different threads are called in
>     the order that transactions are committed.
> 
>     The idea with prepare_ordered() is to call it as early as possible
>     after commit order has been decided, for example to release locks
>     early. In particular, a transaction can still be rolled back after
>     prepare_ordered() (for example in case of a crash). In contrast,
>     commit_ordered() may only be called after the transaction is
>     durably committed in the TC.
> 
>     If need_prepare_ordered or need_commit_ordered is passed as FALSE,
>     then the corresponding call need not be done. It is safe to do it
>     anyway, however omitting it avoids the need to take a global
>     mutex.

Why would this ever be needed?
(I mean need_prepare_ordered or need_commit_ordered being FALSE)
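For reference, a minimal C++ sketch of the quoted log_and_order()
contract, including what the two flags would skip. Only the lock names
come from the worklog. The ticket counter, the condition variable used
to make commit_ordered() follow the same order, and the toy durable log
are assumptions chosen for brevity, not the MWL#116 queue mechanism.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

std::mutex LOCK_prepare_ordered;
std::mutex LOCK_commit_ordered;
std::condition_variable COND_commit_turn;
unsigned long next_ticket = 0;  // protected by LOCK_prepare_ordered
unsigned long commit_turn = 0;  // protected by LOCK_commit_ordered

std::mutex LOCK_log;            // protects durable_log
std::vector<int> durable_log;
std::vector<int> prepare_calls; // touched only under LOCK_prepare_ordered
std::vector<int> commit_calls;  // touched only under LOCK_commit_ordered

void run_prepare_ordered(int trx) { prepare_calls.push_back(trx); }
void run_commit_ordered(int trx) { commit_calls.push_back(trx); }

// Stand-in for writing (and syncing) the commit record in the TC.
bool durable_write(int trx) {
  std::lock_guard<std::mutex> g(LOCK_log);
  durable_log.push_back(trx);
  return true;
}

// Returns non-zero to commit, zero to roll back.
int log_and_order(int trx, bool need_prepare_ordered,
                  bool need_commit_ordered) {
  unsigned long ticket = 0;
  if (need_prepare_ordered || need_commit_ordered) {
    // The order of taking this mutex defines the commit order.
    std::lock_guard<std::mutex> g(LOCK_prepare_ordered);
    if (need_prepare_ordered)
      run_prepare_ordered(trx);  // early, e.g. to release locks
    if (need_commit_ordered)
      ticket = next_ticket++;
  }
  // The transaction can still be rolled back until this succeeds.
  bool ok = durable_write(trx);
  if (need_commit_ordered) {
    std::unique_lock<std::mutex> lk(LOCK_commit_ordered);
    COND_commit_turn.wait(lk, [&] { return commit_turn == ticket; });
    if (ok)
      run_commit_ordered(trx);  // only after the durable commit
    ++commit_turn;              // pass the turn on even on failure
    COND_commit_turn.notify_all();
  }
  return ok ? 1 : 0;
}
```

In this sketch, only a transaction that passes FALSE for both flags
skips the global mutexes, since the ticket is handed out under
LOCK_prepare_ordered. When it is called from several threads with both
flags TRUE, prepare_calls and commit_calls end up in the same order.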
 
... 
> A TC based on this interface overrides group_log_xid() and
> xid_log_after() instead of log_and_order(), and again does not need to
> deal with any {prepare,commit}_ordered().

Why do you need xid_log_after() here?

General comment:

Wouldn't it be simpler to provide only the group_log_xid() interface,
with no log_and_order() or log_xid()? The TC plugin gets the list in
group_log_xid(); it can reorder the list any way it wants, call
prepare_ordered() and commit_ordered() as needed, and so on.
In this interpretation, group_log_xid() can meet all the use cases,
and there's no need to create a multitude of methods that one must
become familiar with before implementing a TC plugin.
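A minimal C++ sketch of that single-interface idea, assuming the server
hands the plugin a list of prepared transactions. The Trx struct, the
TC_plugin base class and the example plugin are hypothetical;
iterate_the_list_and_call_prepare_ordered() borrows the helper name from
the P.S., and iterate_the_list_and_call_commit_ordered() is a made-up
counterpart. Both helpers only set flags in place of the real engine
calls.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Trx {
  int id;
  bool prepared_ordered = false;   // stands in for prepare_ordered()
  bool committed_ordered = false;  // stands in for commit_ordered()
};

// Helpers a plugin may call; they walk the list in its current order.
void iterate_the_list_and_call_prepare_ordered(std::vector<Trx *> &list) {
  for (Trx *t : list) t->prepared_ordered = true;
}
void iterate_the_list_and_call_commit_ordered(std::vector<Trx *> &list) {
  for (Trx *t : list) t->committed_ordered = true;
}

// The only entry point a TC plugin would implement.
class TC_plugin {
 public:
  virtual ~TC_plugin() = default;
  virtual void group_log_xid(std::vector<Trx *> &list) = 0;
};

// Example plugin: picks its own commit order (ascending id).
class Sorted_tc : public TC_plugin {
 public:
  std::vector<int> log;
  void group_log_xid(std::vector<Trx *> &list) override {
    std::sort(list.begin(), list.end(),
              [](const Trx *a, const Trx *b) { return a->id < b->id; });
    iterate_the_list_and_call_prepare_ordered(list);
    for (Trx *t : list) log.push_back(t->id);  // "durable" group write
    iterate_the_list_and_call_commit_ordered(list);
  }
};
```

Ordering, durability and the ordered callbacks all live inside one
method here, which is the point of the proposal: a plugin author has a
single hook to learn.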
 
Regards,
Sergei

P.S. A minor detail: there could be helper functions, such as
iterate_the_list_and_call_prepare_ordered(), that the plugin can use.

Follow ups

Re: Architecture review of MWL#132 Transaction coordinator plugin
From: Kristian Nielsen, 2010-09-07