maria-developers team mailing list archive

Re: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5

 

Hi Kristian,

On Mon, Nov 17, 2014 at 10:58:52AM +0100, Kristian Nielsen wrote:
...skip...

> > me> Strange... Monty should have fixed this. Error monitor thread should call
> > me> log_get_lsn_nowait(), which basically does trylock. Do you happen to have call
> > me> trace?
> 
> > me> According to history ACQUIRE -> RELEASE fix appeared in 10.0.13 and fix for
> > me> log_get_lsn() appeared in 10.0.14. Both fixes appeared simultaneously in 5.5.40.
> > me>
> 
> > Stating that this patch fixes run-time hangs that I'm not aware of is kind of
> > strange.
> >
> > So I repeat my question: Are there any other known hangs?
> 
> Well, there are no hangs that we "know" are caused by this bug. There are hangs
> that we suspect could be caused by this bug.
> 
> Monty's patch, as you say, is not in 10.0.13. And it is insufficient, Jan
> apparently changed another mutex lock to be trylock, which is not in 10.0.14
> or 5.5.40, IIUC. There might be other ways for the error monitor thread to get
> stuck, and even if not there can still be a server stall for 1 second.
> 
> I think you already know all of this, so I'm not sure what answer you are
> looking for from me, sorry...
OK, that's exactly what I was looking for. Now I see that the error monitor
thread acquires other mutexes as well, and each of them is a potential source
of other hangs. Thanks for the clarification!
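
As a rough illustration of the trylock idea discussed above, here is a minimal
C sketch. It assumes a plain pthread mutex guarding a counter; log_sketch_t,
sketch_get_lsn() and sketch_get_lsn_nowait() are hypothetical names made up
for this example and are not the actual InnoDB log_get_lsn_nowait() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the log system; not InnoDB's real structure. */
typedef struct {
    pthread_mutex_t mutex;
    uint64_t        lsn;
} log_sketch_t;

static log_sketch_t log_sketch = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Blocking read: waits for the mutex for as long as it takes. */
uint64_t sketch_get_lsn(void)
{
    pthread_mutex_lock(&log_sketch.mutex);
    uint64_t lsn = log_sketch.lsn;
    pthread_mutex_unlock(&log_sketch.mutex);
    return lsn;
}

/* Non-blocking read: gives up at once if the mutex is busy, so a
 * monitoring thread that calls it cannot get stuck on this mutex. */
bool sketch_get_lsn_nowait(uint64_t *out)
{
    assert(out != NULL);
    if (pthread_mutex_trylock(&log_sketch.mutex) != 0) {
        return false;              /* busy: caller skips this round */
    }
    *out = log_sketch.lsn;
    pthread_mutex_unlock(&log_sketch.mutex);
    return true;
}
```

A monitor loop built on the nowait variant would simply skip the check when it
returns false and retry on its next pass. Every other mutex the monitor takes
with a blocking lock remains a place where it can still stall.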

> 
> > Could you suggest better wording for cset comment?
> 
> Here is a suggestion:
> 
> "In MariaDB 5.5.40 and 10.0.13, the InnoDB/XtraDB low-level mutex
> implementation was inadvertently broken, so that a waiter may miss the wakeup
> when another thread releases the mutex. This affects at least x86 and amd64
> architectures. This could result in threads occasionally stalling for about 1
> second, or in some cases even hanging the whole server indefinitely."
Ok, this sounds good.

Thanks,
Sergey