maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #07902
Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5
Hi Kristian,
thanks for you feedback. Answers inline.
On Fri, Nov 14, 2014 at 11:23:13AM +0100, Kristian Nielsen wrote:
> Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:
>
> > revno: 4346
> > revision-id: svoj@xxxxxxxxxxx-20141112114907-thnro6e3kg2ofdsw
> > committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > timestamp: Wed 2014-11-12 15:49:07 +0400
> > message:
> > MDEV-7026 - Occasional hang during startup on Power8
>
>
> > === modified file 'storage/xtradb/sync/sync0sync.c'
> > --- a/storage/xtradb/sync/sync0sync.c 2014-11-03 13:43:44 +0000
> > +++ b/storage/xtradb/sync/sync0sync.c 2014-11-12 11:49:07 +0000
> > @@ -467,26 +467,15 @@ mutex_set_waiters(
> > mutex_t* mutex, /*!< in: mutex */
> > ulint n) /*!< in: value to set */
> > {
> > -#ifdef INNODB_RW_LOCKS_USE_ATOMICS
> > - ut_ad(mutex);
> > -
> > - if (n) {
> > - os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
> > - } else {
> > - os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
> > - }
> > -#else
> > volatile ulint* ptr; /* declared volatile to ensure that
> > the value is stored to memory */
> > ut_ad(mutex);
> >
> > ptr = &(mutex->waiters);
> >
> > - os_wmb;
> > -
> > *ptr = n; /* Here we assume that the write of a single
> > word in memory is atomic */
> > -#endif
> > + os_sync;
> > }
>
> This code is still substantially different from what it was in 5.5.39 and
> 10.0.12, right? I think we need to instead revert the code to what it was
> before introducing the problem.
>
> The InnoDB mutex code clearly does not work by careful design -
> mutex_reset_lock_word() needs a full barrier but uses what is documented by
> GCC as only an acquire barrier. Rather, it works by having been tweaked over
> the years to just happen to work (eg. GCC in fact uses a full barrier on x86).
>
> So changing this to be "correct" is very dangerous in a stable release. It is
> not possible to ensure correctness by testing, either.
>
> But it is not necessary to change it. Instead, leave the code as it was
> before, and use #ifdef __powerpc__ to ensure that it will work also on
> PowerPC.
>
> For example, something like this (just a sketch, I didn't check exactly how
> the code should look):
>
> #ifdef __powerpc__
> /* InnoDB mutex implementation needs full memory barrier for this operation */
> # define os_atomic_test_and_set_byte(ptr, new_val) \
> do { sync(); __sync_lock_test_and_set(ptr, (byte) new_val); } while (0)
> #else
> # define os_atomic_test_and_set_byte(ptr, new_val) \
> __sync_lock_test_and_set(ptr, (byte) new_val)
> #endif
>
> Then we leave stable releases 5.5 and 10.0 unchanged, reducing the risk of
> introducing another serious breakage. And we can still fix PowerPC to work
> correctly.
>
> And then perhaps in 10.1 we can do refactoring of the use of memory barriers
> in InnoDB, if we really want. (Just in case it's not obvious, yes, I _really_
> have the InnoDB mutex code).
>
> Remember, the 5.5 and 10.0 minor releases appear as security updates in
> distros, so it is very important to minimise the risk of introducing serious
> regressions.
I think your idea of fixing this is acceptable. I'll send another patch soon.
>
>
> > Since isync is not a memory barrier and lwsync doesn't guarantee StoreLoad
> > order, the only option we have is full memory barrier.
>
> It's not a problem to have a full barrier here, I think. Because at this
> point, the waiter has already gone through a lot of spin loops, and now it is
> going to take a global mutex on the sync wait array, scan the array, and go to
> sleep with a context switch. One less full barrier is not going to help a lot.
> To improve this contended case of the InnoDB mutexes, more radical changes are
> needed.
Look at the cset comment: every mutex_exit() has to issue full memory barrier
unconditionally!
But I benchmarked this patch and it worked slightly faster on Power8. I suppose
because stale threads don't have to wait a second till wakeup from error monitor
thread.
>
>
> > InnoDB/XtraDB was getting stuck during server startup. In particular (but not
> > limited to) while creating doublewrite buffer.
>
> The stuck during startup is kind of the minor part of the bug. The major part
> is stalling the server during operation. (The hang during startup was just how
> the problem was detected in Buildbot).
>
> Better clarify in the commit message that this fixes a bug that could cause
> the entire server to stall (or even hang completely if error monitor thread
> itself got stuck). This will help people who experienced the problem and look
> at the changelog.
A few quotes from a different thread:
...
Kristian> 2. In a user hang I investigated in detail, a thread was waiting on
Kristian> log_sys->mutex, but no other thread was holding that mutex (consistent with
Kristian> lost wakeup). The server error monitor thread was itself stuck on a mutex,
Kristian> which would prevent sync_arr_wake_threads_if_sema_free() from hiding the
Kristian> problem.
me> Strange... Monty should have fixed this. Error monitor thread should call
me> log_get_lsn_nowait(), which basically does trylock. Do you happen to have call
me> trace?
Kristian> This investigation was some time ago (maybe 1-2 months). It seems likely that
Kristian> this was a version before Monty's log_get_lsn_nowait() (eg. 10.0.14 has it,
Kristian> but not 10.0.13).
...
me> According to history ACQUIRE -> RELEASE fix appeared in 10.0.13 and fix for
me> log_get_lsn() appeared in 10.0.14. Both fixes appeared similtaneously in 5.5.40.
me>
me> So runtime hangs should be solved both in 5.5 and 10.0. This leaves hangs during
me> startup, which are unfortunate but not as critical as runtime hangs.
me>
me> Are there any other known hangs?
Kristian> If you are going to argue for deliberately keeping broken mutex_enter() /
Kristian> mutex_exit() in MariaDB, I will stay out of the discussion. I was only trying
Kristian> to get Buildbot green so that I can push my completely unrelated fixes for
Kristian> 10.0.
...
Stating that this patch fixes run-time hangs that I'm not aware of is kind of
strange.
So I repeat my question: Are there any other known hangs?
>
>
> > It happens because mutex_exit() didn't wake up waiters of affected mutex. This
> > is kind of acceptable because error monitor thread is supposed to wake up such
> > waiters. But since error monitor thread hasn't been started that early, nobody
> > did inform waiting threads that mutex was released.
>
> I do not think the error monitor is "supposed to wake up such waiters". In
> fact, it seems clear that the error monitor wakeup is _not_ used, at least not
> in commonly used binaries. Because it just never worked - it itself needs to
> take mutexes, which will prevent it from doing the wakeup. So if it was
> needed, someone would have noticed this problem. The symptom is _very_ hard to
> miss, as the whole server gets stuck hard, unable to do anything.
>
> It really is not acceptable to have a mutex implementation that can miss
> wakeups. This can stall the entire server. Even if the error monitor resolves
> the stall after 1 second, that is still unacceptable for many applications.
Could you suggest better wording for cset comment?
Thanks,
Sergey
Follow ups
References