maria-developers team mailing list archive

commit performance when the binlog is enabled


Group commit has been fixed in the InnoDB plugin. It performs as
expected when the binlog is disabled, but not as I expect when the
binlog is enabled.

Is this a problem for PBXT?

The problems for InnoDB are:
1) commit is serialized on the binlog write/fsync
2) row locks are not released until the commit step of XA prepare/commit
3) per-table auto-inc locks are not released until the commit step of
   XA prepare/commit

I think that 2) and 3) can be fixed without significant changes. They
cause a lot of convoys today for high-throughput OLTP -- too many
connections needlessly wait on row locks and the per-table auto-inc
lock. Doing the binlog fsync one connection at a time also causes a
lot of convoys. This makes MySQL much slower than it should be for
some workloads even with battery backed RAID write caches.
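
To put a rough number on the row lock convoys: if one hot row is updated
by many connections, its update rate is bounded by how long each
transaction holds the row lock. The sketch below is a toy model only.
The function name and the millisecond latencies are made up for
illustration, and the "released early" case is hypothetical. It does
not describe how a fix would be implemented.

```python
def max_hot_row_updates_per_sec(work_ms, prepare_fsync_ms, binlog_fsync_ms,
                                locks_held_through_binlog):
    """Upper bound on updates/sec for a single contended row.

    Toy model: the row lock is held for the statement work plus the
    prepare fsync, and (today) also across the binlog write/fsync.
    All latencies are assumed values, not measurements.
    """
    hold_ms = work_ms + prepare_fsync_ms
    if locks_held_through_binlog:
        hold_ms += binlog_fsync_ms
    return 1000.0 / hold_ms


# Assumed latencies: 1 ms of work, 1 ms prepare fsync, 2 ms binlog fsync.
today = max_hot_row_updates_per_sec(1, 1, 2, locks_held_through_binlog=True)
early = max_hot_row_updates_per_sec(1, 1, 2, locks_held_through_binlog=False)
print(today, early)
```

With these assumed numbers the bound doubles when the lock is not held
across the binlog fsync. The real ratio depends on the storage and the
workload.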

Problem 1) occurs because:
* there is no group commit for the binlog fsync
* InnoDB locks prepare_commit_mutex in the prepare step

Even if there were group commit for the binlog fsync, it would be
useless for InnoDB because prepare_commit_mutex is locked in the
prepare step and not unlocked until the commit step and the binlog
write/fsync is done between these two steps.
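
A minimal sketch of that argument, as a toy model rather than server
code. It assumes all transactions are already prepared and waiting, and
it counts how many binlog fsyncs are needed to commit them. The function
name and parameters are hypothetical.

```python
def binlog_fsyncs(num_trx, group_commit, prepare_commit_mutex):
    """Count binlog fsyncs needed to commit num_trx waiting transactions.

    Toy model: if prepare_commit_mutex is held from prepare to commit,
    only one transaction at a time can be in the binlog step, so an
    fsync can never cover more than one transaction.
    """
    waiting = num_trx
    fsyncs = 0
    while waiting > 0:
        # How many transactions can reach the binlog step together.
        in_binlog_step = 1 if prepare_commit_mutex else waiting
        # Without group commit every fsync covers exactly one transaction.
        group = in_binlog_step if group_commit else 1
        fsyncs += 1
        waiting -= group
    return fsyncs


for gc in (False, True):
    for mutex in (False, True):
        print(gc, mutex, binlog_fsyncs(8, gc, mutex))
```

In this model, group commit only reduces the fsync count when the mutex
is not held across the binlog step.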

There is a MySQL worklog for this (4007) that:
* doesn't intend to add group commit for the binlog fsync
* doesn't mention the problem of prepare_commit_mutex

I have started to work on this, but don't have any code to share yet.

Pseudo-code for commit with the InnoDB plugin when the binlog is enabled:

    * ht->prepare() == innobase_xa_prepare()
          o trx_prepare_for_mysql(trx)
                + force to disk the trx log buffer for all changes from this trx
                + fsync done here, group prepare may amortize that
          o lock prepare_commit_mutex
    * tc_log->log_xid(thd, xid)
          o writes SQL to binlog, XID to binlog, optionally fsync binlog
    * ha_commit_one_phase()
          o ht->commit() == innobase_commit()
                + innobase_commit_low()
                      # write commit record to trx log buffer, release
                        locks from this trx
                      # for auto-commit statements, the per-table
                        auto-inc lock is released here
                + unlock prepare_commit_mutex
                + trx_commit_complete_for_mysql()
                      # force to disk the trx log buffer including
                        commit record for this trx
                      # fsync done here, group commit may amortize that
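
The same sequence can be sketched with threads to show where the mutex
sits. This is a simplified model, not the server code: the fsyncs are
not modelled, run_commits and the two order lists are hypothetical
names, and the comments map each step to the pseudo-code above. As I
understand it, the point of holding the mutex across the binlog step is
that transactions commit in InnoDB in the same order they appear in the
binlog, and the sketch checks exactly that.

```python
import threading


def run_commits(num_trx):
    """Run num_trx concurrent commits; return (binlog order, engine order)."""
    prepare_commit_mutex = threading.Lock()
    binlog_order = []
    engine_order = []

    def commit(trx_id):
        # ht->prepare(): the prepare record is forced to disk first
        # (not modelled), then the mutex is taken.
        prepare_commit_mutex.acquire()
        try:
            # tc_log->log_xid(): binlog write/fsync, done under the mutex.
            binlog_order.append(trx_id)
            # innobase_commit_low(): commit record written, locks released.
            engine_order.append(trx_id)
        finally:
            prepare_commit_mutex.release()
        # trx_commit_complete_for_mysql(): the log flush for the commit
        # record happens here, outside the mutex.

    threads = [threading.Thread(target=commit, args=(i,))
               for i in range(num_trx)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return binlog_order, engine_order


binlog, engine = run_commits(16)
print(binlog == engine)
```

The two orders always match because both appends happen inside one
critical section. That same critical section is what serializes the
binlog write/fsync.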

Mark Callaghan
