maria-developers team mailing list archive

Re: commit performance when the binlog is enabled

 

Hi Mark,

On Dec 26, 2009, at 3:40 AM, MARK CALLAGHAN wrote:

InnoDB fixed group commit in the InnoDB plugin. This performs as
expected when the binlog is disabled. This does not perform as I
expect when the binlog is enabled.

Is this a problem for PBXT?

PBXT is also affected by the lack of group commit on the binlog.

As Sergei mentioned, most of the other problems come from the need to support statement-based replication, which PBXT does not support.

The problems for InnoDB are:
1) commit is serialized on the binlog write/fsync
2) row locks are not released until the commit step of XA prepare/commit
3) per-table auto-inc locks are not released until the commit step of XA prepare/commit

I think that 2) and 3) can be fixed without significant changes. They
cause a lot of convoys today for high-throughput OLTP -- too many
connections needlessly wait on row locks and the per-table auto-inc
lock. Doing the binlog fsync one connection at a time also causes a
lot of convoys. This makes MySQL much slower than it should be for
some workloads even with battery backed RAID write caches.
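To put rough numbers on the row-lock convoy, here is a back-of-the-envelope sketch in Python. It assumes that transactions contending for one hot row run strictly one after another, so the lock hold time bounds the commit rate. The function name and every latency value are hypothetical and chosen only for illustration; whether releasing the locks earlier is safe is exactly what is being discussed in this thread.

```python
def max_hot_row_commits_per_sec(work_us, prepare_fsync_us, binlog_fsync_us,
                                release_before_binlog):
    """Upper bound on commits/sec for transactions serialized on one row lock.

    All times are in microseconds and are made-up inputs, not measurements.
    """
    # The lock is held while the statement runs and while the prepare
    # record is forced to disk.
    hold_us = work_us + prepare_fsync_us
    if not release_before_binlog:
        # Behaviour described above: the lock is kept until the commit
        # step, so the binlog write/fsync is also paid while holding it.
        hold_us += binlog_fsync_us
    return 1_000_000 // hold_us


# Hypothetical latencies: 200 us of work, 1 ms per fsync.
held_until_commit = max_hot_row_commits_per_sec(200, 1000, 1000, False)
released_at_prepare = max_hot_row_commits_per_sec(200, 1000, 1000, True)
print(held_until_commit, released_at_prepare)
```

With these made-up inputs the bound moves from 454 to 833 commits per second on the hot row. The absolute numbers mean nothing; the point is only that the binlog fsync sits inside the lock hold time.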

Problem 1) occurs because:
* there is no group commit for the binlog fsync

Yes, and this will remain so, as long as the transactions are not interleaved in the binlog. With RBR this should be possible.

* InnoDB locks prepare_commit_mutex in the prepare step

What is the purpose of this lock?

Even if there were group commit for the binlog fsync, it would be
useless for InnoDB because prepare_commit_mutex is locked in the
prepare step and not unlocked until the commit step and the binlog
write/fsync is done between these two steps.
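The effect can be illustrated with a small deterministic model in Python. This is not MySQL code: the function name, arrival times and fsync duration are all assumptions made for the sketch. It counts how many binlog fsyncs are needed when a mutex spans the binlog write (one transaction per fsync) versus when every transaction already waiting can join the next fsync.

```python
def count_binlog_fsyncs(arrivals_us, fsync_us, mutex_spans_binlog):
    """Count binlog fsyncs needed to commit all transactions.

    arrivals_us: times (integer microseconds) at which each transaction
                 is ready to write to the binlog (hypothetical input).
    fsync_us:    duration of one fsync (hypothetical input).
    """
    if mutex_spans_binlog:
        # Only one transaction can sit between prepare and commit,
        # so every fsync flushes exactly one transaction.
        return len(arrivals_us)

    pending = sorted(arrivals_us)
    fsyncs = 0
    now = 0
    i = 0
    while i < len(pending):
        # The next fsync starts once the disk is free and at least
        # one transaction is waiting.
        now = max(now, pending[i])
        # Everything that has arrived by then joins the same group.
        while i < len(pending) and pending[i] <= now:
            i += 1
        now += fsync_us
        fsyncs += 1
    return fsyncs


# Ten transactions, one every millisecond, with a 5 ms fsync.
arrivals = [n * 1000 for n in range(10)]
serialized = count_binlog_fsyncs(arrivals, 5000, mutex_spans_binlog=True)
grouped = count_binlog_fsyncs(arrivals, 5000, mutex_spans_binlog=False)
print(serialized, grouped)
```

Under these assumptions the serialized case needs 10 fsyncs and the grouped case needs 3, which is the amortization that a mutex held across the binlog write would prevent.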

There is a MySQL worklog for this (4007) that:
* doesn't intend to add group commit for the binlog fsync
* doesn't mention the problem of prepare_commit_mutex

I have started to work on this, but don't have any code to share yet.

Pseudo-code for commit with the InnoDB plugin when the binlog is enabled:

ha_commit_trans()
   * ht->prepare() == innobase_xa_prepare()
         o trx_prepare_for_mysql(trx)
               + force to disk the trx log buffer for all changes from this trx
               + fsync done here, group prepare may amortize that
         o lock prepare_commit_mutex
   * tc_log->log_xid(thd, xid)
         o writes SQL to binlog, XID to binlog, optionally fsync binlog
   * ha_commit_one_phase()
         o ht->commit() == innobase_commit()
               + innobase_commit_low()
                     # write commit record to trx log buffer, release locks from this trx
                     # for auto-commit statements, the per-table auto-inc lock is released here
               + unlock prepare_commit_mutex
               + trx_commit_complete_for_mysql()
                     # force to disk the trx log buffer including commit record for this trx
                     # fsync done here, group commit may amortize that
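The same sequence can be written down as data to make the fsync accounting explicit. This is only a sketch in Python that mirrors the outline above. The step descriptions are paraphrased from it, and the `sync_binlog` argument is a boolean stand-in for "optionally fsync binlog", not the server variable itself.

```python
def commit_steps(sync_binlog=True):
    """Return (function, action, does_fsync) tuples in execution order."""
    return [
        ("innobase_xa_prepare", "flush trx log buffer (prepare)", True),
        ("innobase_xa_prepare", "lock prepare_commit_mutex", False),
        ("log_xid", "write SQL and XID to binlog", sync_binlog),
        ("innobase_commit_low", "write commit record, release locks", False),
        ("innobase_commit", "unlock prepare_commit_mutex", False),
        ("trx_commit_complete_for_mysql", "flush trx log buffer (commit)", True),
    ]


def total_fsyncs(steps):
    """Number of fsyncs one transaction pays in this model."""
    return sum(1 for _, _, does_fsync in steps if does_fsync)


def fsyncs_under_mutex(steps):
    """Number of fsyncs performed while prepare_commit_mutex is held."""
    held = False
    count = 0
    for _, action, does_fsync in steps:
        if action == "lock prepare_commit_mutex":
            held = True
        elif action == "unlock prepare_commit_mutex":
            held = False
        elif held and does_fsync:
            count += 1
    return count


steps = commit_steps(sync_binlog=True)
print(total_fsyncs(steps), fsyncs_under_mutex(steps))
```

In this model a transaction pays three fsyncs when the binlog is synced, and the binlog fsync is the one that happens while prepare_commit_mutex is held. The two InnoDB log flushes fall outside the mutex, which is why they can still be grouped.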

--
Mark Callaghan
mdcallag@xxxxxxxxx



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com





