← Back to team overview

maria-developers team mailing list archive

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5


Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:

> revno: 4346
> revision-id: svoj@xxxxxxxxxxx-20141112114907-thnro6e3kg2ofdsw
> committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> timestamp: Wed 2014-11-12 15:49:07 +0400
> message:
>   MDEV-7026 - Occasional hang during startup on Power8

> === modified file 'storage/xtradb/sync/sync0sync.c'
> --- a/storage/xtradb/sync/sync0sync.c	2014-11-03 13:43:44 +0000
> +++ b/storage/xtradb/sync/sync0sync.c	2014-11-12 11:49:07 +0000
> @@ -467,26 +467,15 @@ mutex_set_waiters(
>  	mutex_t*	mutex,	/*!< in: mutex */
>  	ulint		n)	/*!< in: value to set */
>  {
> -	ut_ad(mutex);
> -
> -	if (n) {
> -		os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
> -	} else {
> -		os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
> -	}
> -#else
>  	volatile ulint*	ptr;		/* declared volatile to ensure that
>  					the value is stored to memory */
>  	ut_ad(mutex);
>  	ptr = &(mutex->waiters);
> -        os_wmb;
> -
>  	*ptr = n;		/* Here we assume that the write of a single
>  				word in memory is atomic */
> -#endif
> +        os_sync;
>  }

This code is still substantially different from what it was in 5.5.39 and
10.0.12, right? I think we need to instead revert the code to what it was
before introducing the problem.

The InnoDB mutex code clearly does not work by careful design -
mutex_reset_lock_word() needs a full barrier but uses what is documented by
GCC as only an acquire barrier. Rather, it works by having been tweaked over
the years to just happen to work (eg. GCC in fact uses a full barrier on x86).

So changing this to be "correct" is very dangerous in a stable release. It is
not possible to ensure correctness by testing, either.

But it is not necessary to change it. Instead, leave the code as it was
before, and use #ifdef __powerpc__ to ensure that it will work also on

For example, something like this (just a sketch, I didn't check exactly how
the code should look):

#ifdef __powerpc__
/* InnoDB mutex implementation needs full memory barrier for this operation */
# define os_atomic_test_and_set_byte(ptr, new_val) \
	do { sync(); __sync_lock_test_and_set(ptr, (byte) new_val); } while (0)
# define os_atomic_test_and_set_byte(ptr, new_val) \
	__sync_lock_test_and_set(ptr, (byte) new_val)

Then we leave stable releases 5.5 and 10.0 unchanged, reducing the risk of
introducing another serious breakage. And we can still fix PowerPC to work

And then perhaps in 10.1 we can do refactoring of the use of memory barriers
in InnoDB, if we really want. (Just in case it's not obvious, yes, I _really_
have the InnoDB mutex code).

Remember, the 5.5 and 10.0 minor releases appear as security updates in
distros, so it is very important to minimise the risk of introducing serious

>   Since isync is not a memory barrier and lwsync doesn't guarantee StoreLoad
>   order, the only option we have is full memory barrier.

It's not a problem to have a full barrier here, I think. Because at this
point, the waiter has already gone through a lot of spin loops, and now it is
going to take a global mutex on the sync wait array, scan the array, and go to
sleep with a context switch. One less full barrier is not going to help a lot.
To improve this contended case of the InnoDB mutexes, more radical changes are

>   InnoDB/XtraDB was getting stuck during server startup. In particular (but not
>   limited to) while creating doublewrite buffer.

The stuck during startup is kind of the minor part of the bug. The major part
is stalling the server during operation. (The hang during startup was just how
the problem was detected in Buildbot).

Better clarify in the commit message that this fixes a bug that could cause
the entire server to stall (or even hang completely if error monitor thread
itself got stuck). This will help people who experienced the problem and look
at the changelog.

>   It happens because mutex_exit() didn't wake up waiters of affected mutex. This
>   is kind of acceptable because error monitor thread is supposed to wake up such
>   waiters. But since error monitor thread hasn't been started that early, nobody
>   did inform waiting threads that mutex was released.

I do not think the error monitor is "supposed to wake up such waiters". In
fact, it seems clear that the error monitor wakeup is _not_ used, at least not
in commonly used binaries. Because it just never worked - it itself needs to
take mutexes, which will prevent it from doing the wakeup. So if it was
needed, someone would have noticed this problem. The symptom is _very_ hard to
miss, as the whole server gets stuck hard, unable to do anything.

It really is not acceptable to have a mutex implementation that can miss
wakeups. This can stall the entire server. Even if the error monitor resolves
the stall after 1 second, that is still unacceptable for many applications.

Hope this helps,

 - Kristian.

Follow ups