maria-developers team mailing list archive

Re: commit performance when the binlog is enabled


On Mon, Dec 28, 2009 at 9:20 AM, Sergei Golubchik <sergii@xxxxxxxxx> wrote:
> Hi, MARK!
> On Dec 25, MARK CALLAGHAN wrote:
>> InnoDB fixed group commit in the InnoDB plugin. This performs as
>> expected when the binlog is disabled. This does not perform as I
>> expect when the binlog is enabled.
>> The problems for InnoDB are:
>> 1) commit is serialized on the binlog write/fsync
>> 2) row locks are not released until the commit step of XA prepare/commit
>> 3) per-table auto inc locks not released until the commit step of XA
>> I think that 2) and 3) can be fixed without significant changes.
> It's not that easy, I think.
> What InnoDB needs locks for ?
> Not for protecting uncommitted changes - it uses versioning for it.
> For serializability (when innodb_locks_unsafe_for_binlog=true or on
> SERIALIZABLE level) and for explicit SELECT ... IN SHARE MODE or FOR
> UPDATE. Explicit locks are typically used when one reads the data and
> later modifies them in the same transaction based on the read values,
> right ?
> After xa_prepare no data can be modified anymore, it's safe to release
> these explicit locks.
> If InnoDB locks would be protecting uncommitted data from being seen by
> another transaction, they would have to stay until commit - but InnoDB
> doesn't use locks for this. Safe too.
> But locks that help to maintain serializability still have to be
> released on commit, I'm afraid. Otherwise you'll have
>   trn1> start transaction; insert t1 select * from t2;
>   trn1> commit;
>   trn1>> ... xa_prepare() ...
>   trn2> start transaction; insert t2 values (1); commit;
>   trn2>> xa_prepare(); binlog.write(); xa_commit();
>   trn1> ... binlog.write(); xa_commit();
> and you have incorrect transaction order in binlog.

There are several issues here:
* For SBR (statement-based replication), trn1 cannot release row locks
until it is guaranteed to write to the binlog ahead of any dependent
transactions. Today this is guaranteed by locking prepare_commit_mutex
at the end of innobase_xa_prepare and not unlocking it until row locks
are released during the call to innobase_commit.

* At least for the plugin, the order in which InnoDB prepare is done
might not match the order in which transactions are written to the
binlog. InnoDB locks prepare_commit_mutex in innobase_xa_prepare after
doing a prepare (the call to trx_prepare_for_mysql). It is unlocked
after the commit record is written to the InnoDB transaction buffer
and before that buffer is flushed to disk. What does match today is
the order of transactions in the binlog and the commit records in the
InnoDB transaction log.

* Traditional implementations of group commit require releasing locks
earlier in the commit cycle. Group commit works by pausing commit
processing in the hope that other commits will arrive so they can
share one fsync. It is a bad idea to hold locks during this pause.
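
To make the fsync cost concrete, here is a toy model in Python. It is
not MySQL or InnoDB code: ToyBinlog, commit_serialized and
commit_grouped are made-up names, and the fixed group_size stands in
for "whatever commits arrived during the pause". It only counts fsyncs
for the two commit styles described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBinlog:
    """Toy binlog that records write order and counts fsync calls."""
    events: list = field(default_factory=list)
    fsyncs: int = 0

    def write(self, trx_id):
        self.events.append(trx_id)

    def fsync(self):
        self.fsyncs += 1


def commit_serialized(binlog, trx_ids):
    # Models a mutex held from the end of prepare until commit:
    # each transaction writes and fsyncs alone, so no fsync is shared.
    for trx_id in trx_ids:
        binlog.write(trx_id)
        binlog.fsync()


def commit_grouped(binlog, trx_ids, group_size):
    # Models group commit: transactions that arrive during the pause
    # are written together and share a single fsync.
    # group_size is an assumption of this sketch, not a server setting.
    for start in range(0, len(trx_ids), group_size):
        for trx_id in trx_ids[start:start + group_size]:
            binlog.write(trx_id)
        binlog.fsync()
```

With 8 transactions, commit_serialized does 8 fsyncs while
commit_grouped with group_size=4 does 2, and both write the
transactions to the toy binlog in the same order.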

I don't know whether InnoDB requires:
1) that transactions in the binlog and commit records in the InnoDB
transaction log are recorded in the same order
2) all of 1) above, and that the binlog is at most one trx ahead of the
InnoDB transaction log

prepare_commit_mutex provides 2) today and that makes group commit for
the binlog unlikely or impossible. I am trying to determine for myself
whether 2) is required, and to get an answer from the InnoDB team.

If 1) is required instead of 2) then group commit on the binlog is
possible for InnoDB. Group commit with SBR is possible as long as the
per-transaction lock release order determines the order in which the
binlog is written.
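
The difference between 1) and 2) can be written down as two small
checks over a timeline of log writes. This is only a sketch in Python
with made-up names (orders_match, max_binlog_lead) and a made-up event
format; it does not reflect how InnoDB checks anything internally.

```python
def orders_match(events):
    """Requirement 1): both logs record transactions in the same order.

    events is a time-ordered list of (log, trx_id) pairs, where log is
    "binlog" or "innodb" (the InnoDB commit record). Hypothetical format.
    """
    binlog = [trx for log, trx in events if log == "binlog"]
    innodb = [trx for log, trx in events if log == "innodb"]
    return binlog == innodb


def max_binlog_lead(events):
    """Largest number of transactions that were in the binlog but did
    not yet have an InnoDB commit record. Requirement 2) needs this to
    be at most 1, in addition to orders_match() being true."""
    pending = set()
    lead = 0
    for log, trx in events:
        if log == "binlog":
            pending.add(trx)
        else:
            pending.discard(trx)
        lead = max(lead, len(pending))
    return lead


# One binlog write per commit, as with prepare_commit_mutex today.
serialized = [("binlog", 1), ("innodb", 1), ("binlog", 2), ("innodb", 2)]

# Two transactions written to the binlog as a group before either
# commit record is written.
grouped = [("binlog", 1), ("binlog", 2), ("innodb", 1), ("innodb", 2)]
```

Both timelines satisfy 1). Only the serialized one satisfies 2): its
lead never exceeds 1, while the grouped timeline reaches a lead of 2.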

Mark Callaghan
