← Back to team overview

maria-developers team mailing list archive

Re: Memory barrier problem in InnoDB/XtraDB mutex implementation causing hangs

 

Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

> Hm, I don't really have a reference to point to. Well, I believe the posix
> thread definition says that a number of operations ensure full memory ordering
> across them.

I guess I was thinking of this:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

It is somewhat wague: "The following functions synchronize memory with respect
to other threads: ...".

I thought some more about this. Consider a simple example:

    THREAD A          THREAD B
    lock(M)
    foo()
    unlock(M)
                      lock(M)
                      bar()
                      unlock(M)

Here, I would expect that bar() in B can see all changes made by foo() in A,
because B locked the mutex after A released it. This requires that stores in
foo() are completed before loads in bar() begin, effectively a full barrier.

However, this can be implemented in different ways. It doesn't have to be done
with an explicit full barrier CPU instruction, as long as the correct
semantics is preserved.

So let's consider unlock() with release semantics, and lock() with aquire
semantics:

  unlock(M):
    __sync_lock_release(M->lock_word)

  lock(M):
    while (__sync_lock_test_and_set(M->lock_word, 1))
      /* wait for lock */ ;

The release barrier ensures that stores in foo() will complete before the
store of lock_word. The acquire barrier in the wait loop ensures that loads in
bar() cannot start until a load in the while loop has seen the store to
lock_word.

Thus, loads in bar() become ordered after stores in foo(), effectively a full
memory barrier even though no explicit full barrier instruction was needed.

So you may be right that the implementation needs no explicit full memory
barrier in theory...

On the other hand, I have implemented code that _does_ assume a full barrier
for each operation individually (see
wait_for_commit::wakeup_subsequent_commits2() in 10.0 for an example). Others
could have done the same. Reviewing all InnoDB code to make sure it is not the
case, it does not really seem feasible...

The burden will be on the one who changes InnoDB locking primitives to prove
that such change is safe with respect to the expectations of the code that
uses those primitives. This is made all the more challenging by the semantics
being somewhat unclear, and the main platform x86 mostly doing "full barrier"
by default on atomic operations like test-and-set.

 - Kristian.


Follow ups

References