maria-developers team mailing list archive

Thread
Date
WL#163 New (by Knielsen): Implement release of row locks in InnoDB during prepare() phase

To: maria-developers@xxxxxxxxxxxxxxxxxxx
From: worklog-noreply@xxxxxxxxxxxx
Date: Wed, 27 Oct 2010 12:38:17 +0000 (UTC)
-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Implement release of row locks in InnoDB during prepare() phase
CREATION DATE..: Wed, 27 Oct 2010, 12:38
SUPERVISOR.....: Sergei
IMPLEMENTOR....: 
COPIES TO......: 
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 163 (http://askmonty.org/worklog/?tid=163)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:



DESCRIPTION:

This is a feature implemented in the Facebook patch[1]. It is intended to
improve performance in the presence of hotspot rows in high-concurrency
applications.

When binlog is enabled and a transaction is committed in InnoDB, there are
three fsync()s done (in prepare(), in binlog write, and in commit()), and
InnoDB row locks are only released after the second of them (or third, not
100% sure).

If many transactions are updating the same rows, this becomes a point of
contention, and may also limit the opportunity for the group commit
optimisation. The idea of the Facebook patch is to release the locks earlier,
in the prepare phase, to reduce the impact of this problem.

In order to preserve correctness (eg. of statement-based replication), it is
necessary that locks are released for transactions in the order that these
transactions commit. So when we release locks during transaction A's prepare
phase, we must ensure that A will commit earlier than any other transaction
that might be waiting for those locks.

There are two kinds of row locks in InnoDB. There are explicit locks, like the
gap locks in higher isolation levels and SELECT FOR UPDATE locks. And implicit
locks, which is when one transaction modifies a row; such modification will
update the row with the id of the modifying transaction, and another
transaction will treat such row as effectively locked if the modifying
transaction has not yet committed (this is how I understand it, though I'm not
100% sure of the details).

The question is to what extend it is safe/correct to release the locks early.

I believe the release of implicit locks is safe during prepare(). Such locks
serve only the purpose of making sure that other transactions will not commit
before us, eg. because they will touch rows that were invisible to our
transaction, and thus must also be invisible on a replication slave (gap
locks). If we already decided to commit ourself before such other transaction,
then that purpose is already fulfilled.

I talked to Mark Callaghan about the Facebook patch for this problem. He told
me that they now also release explicit locks early, and furthermore also make
changes done by the releasing transaction visible to other transactions. In
effect, the transaction is committed (to memory) during prepare. This has some
more severe implications:

 - Since the changes done by transaction A are visible to transaction B, A can
   no longer safely rollback. In the facebook patch, they kill the server if A
   needs to roll back after prepare().

 - In the case of a server crash, the transaction A may not be recovered (if
   it was never written to the binary log), yet may have been seen by the
   application. This means that a change was observed in the database that in
   effect never took place.

Still, such even more aggressive early release might be useful to enable with
an option that is off by default.

I think some more work is needed to fully access the implications and safety
of such early lock release.

References:

[1] https://launchpad.net/mysqlatfacebook


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v4.0.0)