
maria-developers team mailing list archive

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5


Sergey Vojtovich <svoj@xxxxxxxxxxx> writes:

> Look at the cset comment: every mutex_exit() has to issue full memory barrier
> unconditionally!

Oh, you're right. I mixed up the code paths between mutex_exit() and the other
side (in mutex_spin_wait()).
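To make the point concrete, here is a minimal C11 sketch of the store/load pattern involved. It is not the actual InnoDB/XtraDB code: the type sketch_mutex_t and the functions sketch_mutex_exit() and sketch_mutex_prepare_wait() are hypothetical. The releasing thread clears the lock word and then reads a waiters flag, while a waiter sets the flag and then rechecks the lock word. A release store alone does not stop the following load from being performed before it (x86 also permits this store-load reordering), so without a full fence on both sides each thread can miss the other's store, and the waiter sleeps through the wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical simplified mutex: a lock word plus a "waiters" flag. */
typedef struct {
    atomic_int lock_word;   /* 1 = held, 0 = free */
    atomic_int waiters;     /* 1 = some thread may be about to sleep */
} sketch_mutex_t;

/* Release side (cf. mutex_exit()). The release store alone would let the
   load of `waiters` be performed before the store to `lock_word` becomes
   visible, so a full (seq_cst) fence is issued unconditionally.
   Returns true if the caller must signal a waiter. */
static bool sketch_mutex_exit(sketch_mutex_t *m)
{
    atomic_store_explicit(&m->lock_word, 0, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&m->waiters, memory_order_relaxed) != 0;
}

/* Waiter side (cf. mutex_spin_wait()): announce the intent to wait, issue a
   full fence, then recheck the lock word. Returns true if the lock is still
   held and it is safe to sleep, false if the caller should retry. */
static bool sketch_mutex_prepare_wait(sketch_mutex_t *m)
{
    atomic_store_explicit(&m->waiters, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&m->lock_word, memory_order_relaxed) != 0;
}
```

Because both sides use seq_cst fences, at least one of the two loads must observe the other thread's store: either the waiter sees the lock as free and retries, or the releaser sees the waiters flag and signals. Dropping either fence reopens the window for the missed wakeup.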

> me> Strange... Monty should have fixed this. The error monitor thread should call
> me> log_get_lsn_nowait(), which basically does a trylock. Do you happen to have a
> me> call trace?

> me> According to the history, the ACQUIRE -> RELEASE fix appeared in 10.0.13 and the
> me> fix for log_get_lsn() appeared in 10.0.14. Both fixes appeared simultaneously in 5.5.40.
> me>

> Stating that this patch fixes run-time hangs that I'm not aware of is kind of
> strange.
>
> So I repeat my question: Are there any other known hangs?

Well, there are no hangs that we "know" are caused by this bug. There are hangs
that we suspect could be caused by this bug.

Monty's patch, as you say, is not in 10.0.13. And it is insufficient: Jan
apparently changed another mutex lock to a trylock, which is not in 10.0.14
or 5.5.40, IIUC. There might be other ways for the error monitor thread to get
stuck, and even if there are not, there can still be a server stall of 1 second.

I think you already know all of this, so I'm not sure what answer you are
looking for from me, sorry...

> Could you suggest better wording for cset comment?

Here is a suggestion:

"In MariaDB 5.5.40 and 10.0.13, the InnoDB/XtraDB low-level mutex
implementation was inadvertently broken, so that a waiter may miss the wakeup
when another thread releases the mutex. This affects at least the x86 and amd64
architectures. It could cause threads to occasionally stall for about 1
second, or in some cases even hang the whole server indefinitely."

Hope this helps,

 - Kristian.

