maria-developers team mailing list archive
Message #01803
commit performance when the binlog is enabled
Group commit was fixed in the InnoDB plugin. It performs as expected
when the binlog is disabled, but it does not perform as I expect when
the binlog is enabled.
Is this a problem for PBXT?
The problems for InnoDB are:
1) commit is serialized on the binlog write/fsync
2) row locks are not released until the commit step of XA prepare/commit
3) per-table auto-inc locks are not released until the commit step of XA prepare/commit
I think that 2) and 3) can be fixed without significant changes. They
cause a lot of convoys today for high-throughput OLTP -- too many
connections needlessly wait on row locks and the per-table auto-inc
lock. Doing the binlog fsync one connection at a time also causes a
lot of convoys. This makes MySQL much slower than it should be for
some workloads even with battery backed RAID write caches.
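To put rough numbers on the cost of doing the binlog fsync one connection at a time: if fsyncs are fully serialized, commit throughput is bounded by one commit per fsync. The Python sketch below is only back-of-envelope arithmetic. The function name, the latencies and the group size are hypothetical inputs chosen for illustration, not measurements.

```python
def max_commit_rate(fsync_ms, group_size=1):
    """Upper bound on commits/second when each fsync takes fsync_ms
    milliseconds and covers group_size commits (1 == fully serialized)."""
    return group_size * 1000.0 / fsync_ms

# Hypothetical latencies: 10 ms for a plain disk, 0.5 ms for a
# battery backed write cache.
serialized_disk = max_commit_rate(10)
serialized_cache = max_commit_rate(0.5)
grouped_disk = max_commit_rate(10, group_size=8)

print(serialized_disk, serialized_cache, grouped_disk)
```

Under these assumed inputs the bound scales linearly with the number of commits sharing one fsync, which is the benefit the serialized binlog fsync gives up.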
Problem 1) occurs because:
* there is no group commit for the binlog fsync
* InnoDB locks prepare_commit_mutex in the prepare step
Even if there were group commit for the binlog fsync, it would be
useless for InnoDB because prepare_commit_mutex is locked in the
prepare step and not unlocked until the commit step and the binlog
write/fsync is done between these two steps.
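A toy model may make this interaction easier to see. It assumes a hypothetical binlog that does support group fsync (the current one does not) and a batch of transactions that all reach the commit path at the same time. The function and parameter names are made up for this sketch and this is not InnoDB code.

```python
from collections import deque

def binlog_fsync_groups(n_trx, hold_prepare_commit_mutex):
    """Return the size of each binlog fsync group for n_trx transactions.

    Assumption: every transaction that is in the binlog stage at the same
    time shares a single fsync (a hypothetical group fsync).
    """
    ready = deque(range(n_trx))
    group_sizes = []
    while ready:
        if hold_prepare_commit_mutex:
            # The mutex is locked in prepare and unlocked in commit, so
            # only its owner can be in the binlog stage.
            group = [ready.popleft()]
        else:
            # Nothing serializes the binlog stage, so all ready
            # transactions share one fsync.
            group = list(ready)
            ready.clear()
        group_sizes.append(len(group))
    return group_sizes

with_mutex = binlog_fsync_groups(8, hold_prepare_commit_mutex=True)
without_mutex = binlog_fsync_groups(8, hold_prepare_commit_mutex=False)
print(len(with_mutex), "fsyncs with the mutex held across the binlog write")
print(len(without_mutex), "fsyncs without it")
```

In this model every group has size one while the mutex is held across the binlog write, so a group fsync never has anything to group.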
There is a MySQL worklog for this (4007) that:
* doesn't intend to add group commit for the binlog fsync
* doesn't mention the problem of prepare_commit_mutex
I have started to work on this, but don't have any code to share yet.
Pseudo-code for commit with the InnoDB plugin when the binlog is enabled:
ha_commit_trans()
  * ht->prepare() == innobase_xa_prepare()
      o trx_prepare_for_mysql(trx)
          + force to disk the trx log buffer for all changes from this trx
          + fsync done here, group prepare may amortize that
      o lock prepare_commit_mutex
  * tc_log->log_xid(thd, xid)
      o writes SQL to binlog, XID to binlog, optionally fsync binlog
  * ha_commit_one_phase()
      o ht->commit() == innobase_commit()
          + innobase_commit_low()
              # write commit record to trx log buffer, release locks from this trx
              # for auto-commit statements, the per-table auto-inc lock is released here
          + unlock prepare_commit_mutex
          + trx_commit_complete_for_mysql()
              # force to disk the trx log buffer including commit record for this trx
              # fsync done here, group commit may amortize that
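The same sequence can be traced in a few lines of Python to show which steps run while prepare_commit_mutex is held. This is a single-threaded sketch: the Python functions only borrow the names from the pseudo-code above, a threading.Lock stands in for the real mutex, and steps_under_mutex() is a helper invented for this example.

```python
import threading

prepare_commit_mutex = threading.Lock()  # stand-in for the InnoDB mutex
trace = []                               # (trx, step) in execution order

def innobase_xa_prepare(trx):
    trace.append((trx, "flush trx log for prepare"))
    prepare_commit_mutex.acquire()
    trace.append((trx, "lock prepare_commit_mutex"))

def log_xid(trx):
    trace.append((trx, "binlog write/fsync"))

def innobase_commit(trx):
    trace.append((trx, "write commit record, release row and auto-inc locks"))
    prepare_commit_mutex.release()
    trace.append((trx, "unlock prepare_commit_mutex"))
    trace.append((trx, "flush trx log for commit"))

def ha_commit_trans(trx):
    innobase_xa_prepare(trx)
    log_xid(trx)
    innobase_commit(trx)

def steps_under_mutex(events):
    """Hypothetical helper: list the steps recorded between lock and unlock."""
    held = False
    steps = []
    for _, step in events:
        if step == "unlock prepare_commit_mutex":
            held = False
        if held:
            steps.append(step)
        if step == "lock prepare_commit_mutex":
            held = True
    return steps

ha_commit_trans("trx1")
under_mutex = steps_under_mutex(trace)
for step in under_mutex:
    print(step)
```

The binlog write/fsync and the release of row and auto-inc locks both fall inside the locked region, while the two trx log flushes fall outside it. That is why group prepare and group commit can amortize the log flushes but nothing can amortize the binlog fsync.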
--
Mark Callaghan
mdcallag@xxxxxxxxx