
maria-developers team mailing list archive

Re: Memory barrier problem in InnoDB/XtraDB mutex implementation causing hangs

 

Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:

>>   # First spin wait, hoping to get the mutex without yielding.
>>   os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
> Where did you see "os_compare_and_swap_ulint(&mutex->waiters, 0, 1)"?
> FWICS waiters of InnoDB mutex are non-atomic loads/stores.

mutex_enter_func() calls mutex_spin_wait() which does
mutex_set_waiters(mutex, 1):

    mutex_set_waiters(mutex_t *mutex, ulint n) {
    #ifdef INNODB_RW_LOCKS_USE_ATOMICS
            if (n) {
                    os_compare_and_swap_ulint(&mutex->waiters, 0, 1);
            } else {
                    os_compare_and_swap_ulint(&mutex->waiters, 1, 0);
            }

So that is where I saw it. I do not understand what you mean by
"waiters of InnoDB mutex are non-atomic loads/stores"?

>> os_compare_and_swap_ulint() is defined as either
>> __sync_bool_compare_and_swap() or atomic_cas_ulong(), both of which imply a
>> full barrier. So ENTER looks ok.
> See above: waiters loads/stores are not atomic, so no memory barriers at all.

Well, the code is hopelessly complicated with various #ifdef cases. I do not
claim to have verified that all of the cases have the correct barrier for ENTER.

However, I found that in my particular build, mutex_set_waiters() used the
INNODB_RW_LOCKS_USE_ATOMICS case, which uses os_compare_and_swap_ulint().
And that is defined in os0sync.h as either __sync_bool_compare_and_swap(),
which is documented as having a full barrier; or atomic_cas_ulong(), which I
think comes from MySQL's own atomics, and which on x86 was implemented with
the LOCK prefix last time I looked, so also a full barrier.
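
For reference, the definitions I am looking at in os0sync.h are roughly as
follows (sketched from memory, the guard macro names and exact structure may
not be exact):

    #ifdef HAVE_GCC_ATOMIC_BUILTINS
    /* GCC builtin; documented to imply a full memory barrier. */
    # define os_compare_and_swap_ulint(ptr, old_val, new_val) \
            __sync_bool_compare_and_swap(ptr, old_val, new_val)
    #else
    /* atomic_cas_ulong() returns the previous value, so compare it against
    old_val to get the boolean result. */
    # define os_compare_and_swap_ulint(ptr, old_val, new_val) \
            (atomic_cas_ulong(ptr, old_val, new_val) == (old_val))
    #endif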

Why do you think there is no full barrier? Did I miss something?

>> This is consistent with the observations in MDEV-7026. A thread is waiting for
>> wakeup on a mutex that is already marked free.
> Ok, so you say this hang is now repeatable with x86 because we changed
> ACQUIRE -> RELEASE. I assume you were actually able to reproduce it on x86.

I have not tried reproducing on x86.

Note that on x86 we changed it from a full barrier (XCHG instruction) to acquire
(which does nothing, if my test program is to be believed). The documentation
only guarantees ACQUIRE for __sync_lock_test_and_set(), true, but in reality it
seems to have been a full barrier on x86, which would explain why it worked
before.
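
To illustrate (this is my own test snippet, not the InnoDB code), the two
kinds of primitives compile very differently on x86:

    typedef unsigned char byte;

    byte
    test_and_set(volatile byte *lock_word)
    {
            /* Compiles to an XCHG with a memory operand, which is implicitly
            LOCKed, i.e. a full barrier in practice, even though GCC only
            documents acquire semantics for this builtin. */
            return __sync_lock_test_and_set(lock_word, 1);
    }

    void
    release_store(volatile byte *lock_word)
    {
            /* Compiles to a plain MOV store of 0, with no fence at all, so a
            later load (e.g. of mutex->waiters) can be satisfied before this
            store becomes visible to other CPUs. */
            __sync_lock_release(lock_word);
    }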

I am speculating that some of the hangs that started to appear recently could
be due to this change. At least one case, that I researched in considerable
depth, can be explained adequately in this way. It seems likely that this is
the explanation, but I cannot be sure.

> Note that we also added mysterious os_isync to mutex_exit_func, which is
> supposed to be among other things ACQUIRE. But on x86 it is no-op with a
> comment:
> /* Performance regression was observed at some conditions for Intel
> architecture. Disable memory barrier for Intel architecture for now. */

I saw the added isync. I think this is also wrong. isync is about speculative
execution; one can use it after a conditional instruction to prevent later
instructions from being executed ahead of the completion of a load. But in the
place I saw, it was used incorrectly, without the conditional instruction. I
did not dig deeper into this, though, as even if used correctly it would still
not be the full barrier that seems to be required here.
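
For reference, the usual way to get acquire semantics out of isync on PowerPC
is to make it depend on the load, through a compare and a (never taken)
conditional branch. A sketch of the idiom (my own illustration, not the
InnoDB code):

    static inline unsigned long
    load_acquire(const volatile unsigned long *ptr)
    {
            unsigned long val;

            __asm__ __volatile__(
                    "ld     %0, 0(%1)\n\t"  /* the load to be ordered */
                    "cmpd   %0, %0\n\t"     /* compare depending on the load */
                    "bne-   1f\n\t"         /* conditional branch (never taken) */
                    "1: isync"              /* isync after the dependent branch */
                    : "=r" (val)
                    : "b" (ptr)
                    : "cr0", "memory");

            return val;
    }

Without the compare-and-branch, isync does not order anything with respect to
the preceding load, which is why the added isync looks wrong to me.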

> Does it explain server lockups at startup or during normal operations? Could you
> elaborate the problem you're talking about?

So my guess is that on x86, the full barriers are in place, and the mutex code
works without losing wakeups. However, at some point there was some problem,
maybe with some specific compiler or architecture, and the
sync_arr_wake_threads_if_sema_free() hack was implemented. The reason I think
things work correctly on x86 is that sync_arr_wake_threads_if_sema_free() is
fundamentally broken as a workaround: it is not active during startup (and
maybe not during shutdown?), and it does not work if the server error monitor
thread itself happens to get stuck on a mutex. So missing barriers on x86
would have been discovered quickly in commonly used binaries, while
sync_arr_wake_threads_if_sema_free() has only been able to hide problems on
rarer architectures. That is my guess.

As to the concrete problems:

1. In MDEV-7026, there is a lockup during startup. I believe this is visible
because the server error monitor thread is not running at that point.

2. In a user hang I investigated in detail, a thread was waiting on
log_sys->mutex, but no other thread was holding that mutex (consistent with a
lost wakeup; see the interleaving sketched below). The server error monitor
thread was itself stuck on a mutex, which would prevent
sync_arr_wake_threads_if_sema_free() from hiding the problem.
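
The interleaving I suspect in both cases is the classic one between ENTER and
EXIT (simplified pseudo-steps, not the literal code):

    Waiting thread (mutex_spin_wait)      Releasing thread (mutex_exit_func)
    --------------------------------      -----------------------------------
    store waiters = 1                     store lock_word = 0
    <full barrier needed here>            <full barrier needed here>
    re-check lock_word: sees it still     load waiters: sees 0, because the
    set, because the store of 0 is        waiter's store of 1 is not yet
    not yet visible; go wait on the       visible; so no wakeup is sent
    sync array

Each side stores one flag and then loads the other. Without a full (StoreLoad)
barrier on both sides, both loads can observe the old values, the wakeup is
skipped, and the waiter sleeps forever on a mutex that is already free.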

> with Power8. I'd vote for putting full memory barrier into mutex_exit_func()
> and probably remove that ugly hack with sync_arr_wake_threads_if_sema_free. :(

Yes, I agree, that is what I meant.

I agree that it was probably also broken on PowerPC before the change. But it
looked to me like we had full barriers on x86 before the change (even if that
was not guaranteed by the documentation).
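
Concretely, what I have in mind for the EXIT path is something like the
following (a sketch only, not a tested patch; the helpers are the existing
InnoDB ones, the explicit barrier is the addition):

    UNIV_INLINE
    void
    mutex_exit_func(
            mutex_t*        mutex)
    {
            mutex_reset_lock_word(mutex);   /* release the lock word (store) */

            __sync_synchronize();           /* full barrier: make the store
                                            above visible before reading
                                            waiters, so a waiter that set
                                            waiters = 1 and then re-checked
                                            the lock word cannot be missed */

            if (mutex_get_waiters(mutex) != 0) {

                    mutex_signal_object(mutex);     /* wake any waiting thread */
            }
    }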

> Though a proper solution should never issue full memory barriers for mutex
> lock/unlock. I suggested different idea, but that's for 10.1:

This sounds very strange. Are you sure?
Normally, mutex lock/unlock is assumed to be a full barrier. If my reading of
the code is correct, this is also how it was implemented on x86. It seems
_very_ likely that part of the InnoDB code relies on such full barriers. It
does not seem safe to me to change it.

POSIX thread operations, for example, are defined to imply a full barrier.

Or do you mean that there should be _one_ full barrier implied in
mutex_enter(), and one in mutex_exit()? But there are currently more than one,
and this should be optimised? That would make sense. Though my personal
opinion is that the InnoDB mutex implementation has always been a piece of
garbage, and rather than mess more with it, it should be replaced with
something simpler and better (isn't this what the InnoDB team is already
working on?).

 - Kristian.

