← Back to team overview

maria-developers team mailing list archive

Re: [Commits] Rev 4367: MDEV-7026: Race in InnoDB/XtraDB mutex implementation can stall or hang the server. in http://bazaar.launchpad.net/~maria-captains/maria/5.5


Hi Kristian,

thanks for taking over this bug! I believe this is the patch that was pushed
into 5.5. Unfortunately I think there is still one potential for a hang left
(details below). :(

Since InnoDB and XtraDB changes are identical I'll comment only XtraDB changes.
So you removed os_isync and implemented os_mb instead. That's good!
You also removed a bunch of os_wmb/os_rmb calls. That's good too.
And you kept RELEASE barrier for mutex_exit() on PPC64. Very good!

I will also comment things that I don't really like, but they're just for your
information (no action required).

On Wed, Nov 19, 2014 at 01:56:46PM +0100, knielsen@xxxxxxxxxxxxxxx wrote:
> At http://bazaar.launchpad.net/~maria-captains/maria/5.5
> ------------------------------------------------------------
> revno: 4367
> revision-id: knielsen@xxxxxxxxxxxxxxx-20141119125646-njq02h7irmu9s71b
> parent: sergii@xxxxxxxxx-20141118231952-fk3o69075dn3ravv
> committer: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
> branch nick: mariadb-5.5
> timestamp: Wed 2014-11-19 13:56:46 +0100
> message:
>   MDEV-7026: Race in InnoDB/XtraDB mutex implementation can stall or hang the server.
>   The bug was that full memory barrier was missing in the code that ensures that
>   a waiter on an InnoDB mutex will not go to sleep unless it is guaranteed to be
>   woken up again by another thread currently holding the mutex. This made
>   possible a race where a thread could get stuck waiting for a mutex that is in
>   fact no longer locked. If that thread was also holding other critical locks,
>   this could stall the entire server. There is an error monitor thread than can
>   break the stall, it runs about once per second. But if the error monitor
>   thread itself got stuck or was not running, then the entire server could hang
>   infinitely.
>   This was introduced on i386/amd64 platforms in 5.5.40 and 10.0.13 by an
>   incorrect patch that tried to fix the similar problem for PowerPC.
JFYI: Strictly speaking that "incorrect patch" attempted to fix different
problem (read carefully):
it added RELEASE barrier between "stores inside mutex" and "lock_word= 0"
while this patch adds full barrier between "lock_word= 0" and "if (waiters)".

>   This commit reverts the incorrect PowerPC patch, and instead implements a fix
>   for PowerPC that does not change i386/amd64 behaviour, making PowerPC work
>   similarly to i386/amd64.

> === modified file 'storage/xtradb/include/os0sync.h'
> --- a/storage/xtradb/include/os0sync.h	2014-09-08 15:10:48 +0000
> +++ b/storage/xtradb/include/os0sync.h	2014-11-19 12:56:46 +0000
> @@ -314,11 +331,28 @@ amount of increment. */
>  /**********************************************************//**
>  Returns the old value of *ptr, atomically sets *ptr to new_val */
> -# define os_atomic_test_and_set_byte(ptr, new_val) \
> +#ifdef __powerpc__
> +/*
> +  os_atomic_test_and_set_byte_release() should imply a release barrier before
> +  setting, and a full barrier after. But __sync_lock_test_and_set() is only
> +  documented as an aquire barrier. So on PowerPC we need to add the full
> +  barrier explicitly.  */
> +# define os_atomic_test_and_set_byte_release(ptr, new_val) \
> +        do { __sync_lock_release(ptr); \
> +                __sync_synchronize(); } while (0)
> +#else
> +/*
> +  On x86, __sync_lock_test_and_set() happens to be full barrier, due to
> +  LOCK prefix.
> +*/
> +# define os_atomic_test_and_set_byte_release(ptr, new_val) \
> +        __sync_lock_test_and_set(ptr, (byte) new_val)
> +#endif
JFYI: "os_atomic_test_and_set_byte_release" sounds odd to me. Look at gcc
manual: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
  This builtin is not a full barrier, but rather an acquire barrier. This means
  that references after the builtin cannot move to (or be speculated to) before
  the builtin, but previous memory stores may not be globally visible yet, and
  previous memory loads may not yet be satisfied.

You name RELEASE something that is acqually ACQUIRE. So why didn't you like
"os_atomic_lock_release_byte" which is RELASE btw:
  This builtin is not a full barrier, but rather a release barrier. This means
  that all previous memory stores are globally visible, and all previous memory
  loads have been satisfied, but following memory reads are not prevented from
  being speculated to before the barrier.

> +/*
> +  os_atomic_test_and_set_byte_acquire() is a full memory barrier on x86. But
> +  in general, just an aquire barrier should be sufficient. */
> +# define os_atomic_test_and_set_byte_acquire(ptr, new_val) \
>          __sync_lock_test_and_set(ptr, (byte) new_val)
!!! The problem is here!!! Strictly speaking not here, but in mutex_spin_wait():
        mutex_set_waiters(mutex, 1);

        /* Try to reserve still a few times */
        for (i = 0; i < 4; i++) {
                if (mutex_test_and_set(mutex) == 0) {
We'll get stuck if "waiters= 1" will become visible after
"if (mutex_test_and_set(mutex))". I don't see anything preventing this reorder.


> === modified file 'storage/xtradb/include/sync0sync.ic'
> --- a/storage/xtradb/include/sync0sync.ic	2014-08-29 12:02:46 +0000
> +++ b/storage/xtradb/include/sync0sync.ic	2014-11-19 12:56:46 +0000
> @@ -80,11 +80,11 @@ mutex_test_and_set(
>          mutex_t*        mutex)  /*!< in: mutex */
>  {
>  #if defined(HAVE_ATOMIC_BUILTINS)
> -        return(os_atomic_test_and_set_byte(&mutex->lock_word, 1));
> +        return(os_atomic_test_and_set_byte_acquire(&mutex->lock_word, 1));
>  #else
>          ibool   ret;
> -        ret = os_fast_mutex_trylock(&(mutex->os_fast_mutex));
> +        ret = os_fast_mutex_trylock_full_barrier(&(mutex->os_fast_mutex));
>          if (ret == 0) {
>                  /* We check that os_fast_mutex_trylock does not leak
JFYI: This is probably the most funny thing. Here we're in the "#else" branch of
"#if defined(HAVE_ATOMIC_BUILTINS)". E.g. if we compile with gcc and enter this
branch it means: sync builtins are not avialble (gcc atomic builtins are very
likely not available too).

Now you say "os_fast_mutex_trylock_full_barrier" which does os_mb. But since
neither sync bultins nor atomic builtins are available, os_mb is basically

This may be different on other platforms, but to my knowledge memory barriers
and atomic operations normally come together.

Another thing: did you try compiling with this branch enabled? IIRC it didn't
compile even berfore we started fixing it.



Follow ups