maria-developers team mailing list archive

Thread
Date
Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Sergey Vojtovich <svoj@xxxxxxxxxxx>
Date: Mon, 17 Nov 2014 13:07:47 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87h9y2f5da.fsf@frigg.knielsen-hq.org>
User-agent: Mutt/1.5.21 (2010-09-15)
Hi Kristian,

thanks for you feedback. Answers inline.

On Fri, Nov 14, 2014 at 11:23:13AM +0100, Kristian Nielsen wrote:
> Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:
> 
> > revno: 4346
> > revision-id: svoj@xxxxxxxxxxx-20141112114907-thnro6e3kg2ofdsw
> > committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > timestamp: Wed 2014-11-12 15:49:07 +0400
> > message:
> >   MDEV-7026 - Occasional hang during startup on Power8
> 
> 
> > === modified file 'storage/xtradb/sync/sync0sync.c'
> > --- a/storage/xtradb/sync/sync0sync.c	2014-11-03 13:43:44 +0000
> > +++ b/storage/xtradb/sync/sync0sync.c	2014-11-12 11:49:07 +0000
> > @@ -467,26 +467,15 @@ mutex_set_waiters(
> >  	mutex_t*	mutex,	/*!< in: mutex */
> >  	ulint		n)	/*!< in: value to set */
> >  {
> > -#ifdef INNODB_RW_LOCKS_USE_ATOMICS
> > -	ut_ad(mutex);
> > -
> > -	if (n) {
> > -		os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
> > -	} else {
> > -		os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
> > -	}
> > -#else
> >  	volatile ulint*	ptr;		/* declared volatile to ensure that
> >  					the value is stored to memory */
> >  	ut_ad(mutex);
> >  
> >  	ptr = &(mutex->waiters);
> >  
> > -        os_wmb;
> > -
> >  	*ptr = n;		/* Here we assume that the write of a single
> >  				word in memory is atomic */
> > -#endif
> > +        os_sync;
> >  }
> 
> This code is still substantially different from what it was in 5.5.39 and
> 10.0.12, right? I think we need to instead revert the code to what it was
> before introducing the problem.
> 
> The InnoDB mutex code clearly does not work by careful design -
> mutex_reset_lock_word() needs a full barrier but uses what is documented by
> GCC as only an acquire barrier. Rather, it works by having been tweaked over
> the years to just happen to work (eg. GCC in fact uses a full barrier on x86).
> 
> So changing this to be "correct" is very dangerous in a stable release. It is
> not possible to ensure correctness by testing, either.
> 
> But it is not necessary to change it. Instead, leave the code as it was
> before, and use #ifdef __powerpc__ to ensure that it will work also on
> PowerPC.
> 
> For example, something like this (just a sketch, I didn't check exactly how
> the code should look):
> 
> #ifdef __powerpc__
> /* InnoDB mutex implementation needs full memory barrier for this operation */
> # define os_atomic_test_and_set_byte(ptr, new_val) \
> 	do { sync(); __sync_lock_test_and_set(ptr, (byte) new_val); } while (0)
> #else
> # define os_atomic_test_and_set_byte(ptr, new_val) \
> 	__sync_lock_test_and_set(ptr, (byte) new_val)
> #endif
> 
> Then we leave stable releases 5.5 and 10.0 unchanged, reducing the risk of
> introducing another serious breakage. And we can still fix PowerPC to work
> correctly.
> 
> And then perhaps in 10.1 we can do refactoring of the use of memory barriers
> in InnoDB, if we really want. (Just in case it's not obvious, yes, I _really_
> have the InnoDB mutex code).
> 
> Remember, the 5.5 and 10.0 minor releases appear as security updates in
> distros, so it is very important to minimise the risk of introducing serious
> regressions.
I think your idea of fixing this is acceptable. I'll send another patch soon.

> 
> 
> >   Since isync is not a memory barrier and lwsync doesn't guarantee StoreLoad
> >   order, the only option we have is full memory barrier.
> 
> It's not a problem to have a full barrier here, I think. Because at this
> point, the waiter has already gone through a lot of spin loops, and now it is
> going to take a global mutex on the sync wait array, scan the array, and go to
> sleep with a context switch. One less full barrier is not going to help a lot.
> To improve this contended case of the InnoDB mutexes, more radical changes are
> needed.
Look at the cset comment: every mutex_exit() has to issue full memory barrier
unconditionally!

But I benchmarked this patch and it worked slightly faster on Power8. I suppose
because stale threads don't have to wait a second till wakeup from error monitor
thread.

> 
> 
> >   InnoDB/XtraDB was getting stuck during server startup. In particular (but not
> >   limited to) while creating doublewrite buffer.
> 
> The stuck during startup is kind of the minor part of the bug. The major part
> is stalling the server during operation. (The hang during startup was just how
> the problem was detected in Buildbot).
> 
> Better clarify in the commit message that this fixes a bug that could cause
> the entire server to stall (or even hang completely if error monitor thread
> itself got stuck). This will help people who experienced the problem and look
> at the changelog.
A few quotes from a different thread:
...
Kristian> 2. In a user hang I investigated in detail, a thread was waiting on
Kristian> log_sys->mutex, but no other thread was holding that mutex (consistent with
Kristian> lost wakeup). The server error monitor thread was itself stuck on a mutex,
Kristian> which would prevent sync_arr_wake_threads_if_sema_free() from hiding the
Kristian> problem.
me> Strange... Monty should have fixed this. Error monitor thread should call
me> log_get_lsn_nowait(), which basically does trylock. Do you happen to have call
me> trace?
Kristian> This investigation was some time ago (maybe 1-2 months). It seems likely that
Kristian> this was a version before Monty's log_get_lsn_nowait() (eg. 10.0.14 has it,
Kristian> but not 10.0.13).
...
me> According to history ACQUIRE -> RELEASE fix appeared in 10.0.13 and fix for
me> log_get_lsn() appeared in 10.0.14. Both fixes appeared similtaneously in 5.5.40.
me>
me> So runtime hangs should be solved both in 5.5 and 10.0. This leaves hangs during
me> startup, which are unfortunate but not as critical as runtime hangs.
me>
me> Are there any other known hangs?
Kristian> If you are going to argue for deliberately keeping broken mutex_enter() /
Kristian> mutex_exit() in MariaDB, I will stay out of the discussion. I was only trying
Kristian> to get Buildbot green so that I can push my completely unrelated fixes for
Kristian> 10.0.
...

Stating that this patch fixes run-time hangs that I'm not aware of is kind of
strange.

So I repeat my question: Are there any other known hangs?

> 
> 
> >   It happens because mutex_exit() didn't wake up waiters of affected mutex. This
> >   is kind of acceptable because error monitor thread is supposed to wake up such
> >   waiters. But since error monitor thread hasn't been started that early, nobody
> >   did inform waiting threads that mutex was released.
> 
> I do not think the error monitor is "supposed to wake up such waiters". In
> fact, it seems clear that the error monitor wakeup is _not_ used, at least not
> in commonly used binaries. Because it just never worked - it itself needs to
> take mutexes, which will prevent it from doing the wakeup. So if it was
> needed, someone would have noticed this problem. The symptom is _very_ hard to
> miss, as the whole server gets stuck hard, unable to do anything.
> 
> It really is not acceptable to have a mutex implementation that can miss
> wakeups. This can stall the entire server. Even if the error monitor resolves
> the stall after 1 second, that is still unacceptable for many applications.
Could you suggest better wording for cset comment?

Thanks,
Sergey
Follow ups

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5
From: Kristian Nielsen, 2014-11-17
References

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5
From: Kristian Nielsen, 2014-11-14