← Back to team overview

maria-developers team mailing list archive

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5

 

Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:

> revno: 4346
> revision-id: svoj@xxxxxxxxxxx-20141118080742-sdf93kgsgxulenvb
> parent: knielsen@xxxxxxxxxxxxxxx-20141112101013-t63ayylsk08ncgs3
> committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> branch nick: 5.5
> timestamp: Tue 2014-11-18 12:07:42 +0400
> message:
>   MDEV-7026 - Occasional hang during startup on Power8

Right, so you requested second review ASAP, I'm trying to understand what's
going on here.

If my understanding is correct, this patch goes on top of current 5.5.40, not
on top of the patch I reviewed previously.

So there are two cases, depending on whether INNODB_RW_LOCKS_USE_ATOMICS is
set or not. It is my understanding that this is "usually" set, eg. on our
common x86 and powerpc linux binaries.

The critical thing for the patch at hand is the memory barrier between
mutex_set_waiters() and mutex_test_and_set() in the waiter, and
mutex_reset_lock_word() and mutex_get_waiters() in the thread releasing the
lock.

A full memory barrier is needed in the waiter, to be sure that
mutex_set_waiters() becomes visible to the releasing thread before
mutex_test_and_set() runs. And a full barrier is needed in the releasing
thread, to be sure that mutex_reset_lock_word() becomes visible before
mutex_get_waiters() runs.

So let's see how this is ensured. First the case of x86 with
INNODB_RW_LOCKS_USE_ATOMICS set, HAVE_ATOMIC_BUILTINS, pre-5.5.40:

 - mutex_set_waiters() uses os_compare_and_swap_ulint() which is
   __sync_bool_compare_and_swap() which is full barrier.

 - mutex_reset_lock_word() uses os_atomic_test_and_set_byte() which is
   __sync_lock_test_and_set(). This is documented only as a release barrier,
   but is in fact a full barrier on x86 with current GCC.

So in this case full barrier is there as needed.

If HAVE_ATOMIC_BUILTINS is not set, then mutex_reset_lock_word() uses
os_fast_mutex_unlock() which is pthread_mutex_unlock(), which is also full
barrier on x86.

If INNODB_RW_LOCKS_USE_ATOMICS is not set, then mutex_set_waiters() has no
barriers. But then mutex_test_and_set() instead has a full barrier on
x86. Either in os_atomic_test_and_set_byte() which is
__sync_lock_test_and_set(), if HAVE_ATOMIC_BUILTINS is set, or in
os_fast_mutex_trylock() otherwise. Both of these are full barriers on x86.

So this is why things work ok on x86, apparently. Very bad code that seriously
needs cleanup and proper documentation of memory barrier semantics. But it is
what we have to work with in 5.5.

So to make things work on PowerPC without risk of breaking x86, we could do
this:

 - Make sure os_atomic_test_and_set_byte() is full barrier on PowerPC:

#ifdef __powerpc__
/*
  os_atomic_test_and_set_byte() should imply a full barrier. But
  __sync_lock_test_and_set() is only documented as an aquire barrier. So on
  PowerPC we need to add the full barrier explicitly.
*/
# define os_atomic_test_and_set_byte(ptr, new_val) \
        do { __sync_lock_test_and_set(ptr, (byte) new_val); \
             __atomic_thread_fence(__ATOMIC_SEQ_CST); } while(0)
#else
/*
  On x86, __sync_lock_test_and_set() happens to be full barrier, due to
  LOCK prefix.
*/
# define os_atomic_test_and_set_byte(ptr, new_val) \
        __sync_lock_test_and_set(ptr, (byte) new_val)
#endif


 - Make sure that mutex_reset_lock_word() is a full barrier when
   HAVE_ATOMIC_BUILTINS is not set, by adding a memory barrier:

        os_fast_mutex_unlock(&(mutex->os_fast_mutex));
#ifdef __powerpc__
        /*
          On PowerPC, pthread_mutex does not imply full barrier, so we need to
          add it explicitly.
        */
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif


 - For the case of neither HAVE_ATOMIC_BUILTINS nor
   INNODB_RW_LOCKS_USE_ATOMICS set, similarly add a full barrier in
   mutex_test_and_set() for PowerPC:

#ifdef __powerpc__
        /*
          On PowerPC, pthread_mutex does not imply full barrier, so we need to
          add it explicitly.
        */
        __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
	ret = os_fast_mutex_trylock(&(mutex->os_fast_mutex));


Then in 10.1. if someone is brave enough, we can clean this up properly.
Clearly define what the memory barrier semantics is, and implement it using
the correct primitives that work as needed on all platforms.

So now that I understand (hopefully...) how the code used to work, and how it
can be fixed for PowerPC without changing existing code, let me look at the
new patch ...


> === modified file 'storage/innobase/include/os0sync.h'
> --- a/storage/innobase/include/os0sync.h      2014-09-08 15:10:48 +0000
> +++ b/storage/innobase/include/os0sync.h      2014-11-18 08:07:42 +0000
> @@ -310,8 +310,16 @@ Returns the old value of *ptr, atomicall
>  # define os_atomic_test_and_set_byte(ptr, new_val) \
>       __sync_lock_test_and_set(ptr, (byte) new_val)
>  
> +#ifdef __powerpc__
>  # define os_atomic_lock_release_byte(ptr) \
>       __sync_lock_release(ptr)
> +#else
> +/* In theory __sync_lock_release should be used to release the lock.
> +Unfortunately, it does not work properly alone. The workaround is
> +that more conservative __sync_lock_test_and_set is used instead. */
> +# define os_atomic_lock_release_byte(ptr) \
> +     __sync_lock_test_and_set(ptr, 0)
> +#endif

This is very confusing. You define a method with full barrier semantics, but
call it something suggesting release semantics.

Why not revert the original change so that mutex_reset_lock_word() uses
os_atomic_test_and_set_byte()? And then fix the implementation of
os_atomic_test_and_set_byte() to be full barrier on PowerPC, as I suggested
above? The term "atomic" generally implies full barrier in the MySQL source
code...

>  #elif defined(HAVE_IB_SOLARIS_ATOMICS)
>  
> @@ -428,7 +436,7 @@ clobbered */
>  # define os_rmb      __atomic_thread_fence(__ATOMIC_ACQUIRE)
>  # define os_wmb      __atomic_thread_fence(__ATOMIC_RELEASE)
>  #ifdef __powerpc__
> -# define os_isync  __asm __volatile ("isync":::"memory")
> +# define os_isync __atomic_thread_fence(__ATOMIC_SEQ_CST)
>  #else
>  #define os_isync do { } while(0)
>  #endif

As Jonas suggested, don't call a full memory barrier "isync". And do not
introduce something that is a barrier on one platform but a no-operation on
another platform. But ...

> === modified file 'storage/innobase/sync/sync0sync.c'
> --- a/storage/innobase/sync/sync0sync.c       2014-11-03 13:43:44 +0000
> +++ b/storage/innobase/sync/sync0sync.c       2014-11-18 08:07:42 +0000
> @@ -474,10 +474,9 @@ mutex_set_waiters(
>  
>       ptr = &(mutex->waiters);
>  
> -        os_wmb;
> -
>       *ptr = n;               /* Here we assume that the write of a single
>                               word in memory is atomic */
> +        os_isync;
>  }

With this, the semantics of how things work becomes even more complicated. Now
on PowerPC the barrier is in one function, but on x86 the barrier is in
another function.

Why not instead do as I suggested above? Put in full barriers for PowerPC in
the same places that they are in x86?

----

Maybe I should try to say if I think the patch fixes the actual bug, in case
you don't want to go with my suggestions and instead keep your approach? Since
it goes back to using __sync_lock_test_and_set() on x86, and since it has a
full barrier in mutex_exit_func() on PowerPC, I think perhaps it does. But it
is really hard to understand exactly what is going on, as you can see above I
had to do a lot of reasoning to understand even why the old code worked...

My advice is still to revert the original incorrect PowerPC patch and do
something that is at least consistent with the existing code...

(And then feel free to do a proper cleanup in 10.1).

Hope this helps,

 - Kristian.


Follow ups