maria-developers team mailing list archive

Re: Memory barrier problem in InnoDB/XtraDB mutex implementation causing hangs



On Thu, Nov 06, 2014 at 09:23:42PM +0100, Kristian Nielsen wrote:

> > Strange... Monty should have fixed this. Error monitor thread should call
> > log_get_lsn_nowait(), which basically does trylock. Do you happen to have call
> > trace?
> This investigation was some time ago (maybe 1-2 months). It seems likely that
> this was a version before Monty's log_get_lsn_nowait() (eg. 10.0.14 has it,
> but not 10.0.13).
According to the history, the ACQUIRE -> RELEASE fix appeared in 10.0.13 and the
fix for log_get_lsn() appeared in 10.0.14. Both fixes appeared simultaneously in
5.5.40.

So runtime hangs should be solved both in 5.5 and 10.0. This leaves hangs during
startup, which are unfortunate but not as critical as runtime hangs.

Are there any other known hangs?

> > lock/unlock? Probably that was quite outdated writing. See e.g.:
> Outdated, perhaps, but acquire/release is AFAIK a relatively new concept.
> Mutexes, and InnoDB source code predates that? I'm not sure that C++ coming up
> with some strange new semantics for their language has much bearing on legacy
> code?
> But I agree it would be nice to have some references about "old-style" mutexes
> implying full memory barrier, so that it's not just me...
Yes, acquire/release/etc. is a relatively new concept. For x86 this probably
makes little sense. But at least on Power8:
- pthread_mutex_lock() issues "isync" (conforms to acquire semantics)
- pthread_mutex_unlock() issues "lwsync" (conforms to release semantics)
- sync builtins issue "sync" (conforms to seq_cst semantics)
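
To make the acquire/release pairing concrete, here is a minimal sketch of a
lock written with C++11 atomics. SpinLock and count_with_spinlock() are
hypothetical names used only for illustration; this is not the InnoDB mutex
code and not how glibc implements pthread_mutex_lock().

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock, for illustration only.
class SpinLock {
  std::atomic<bool> locked_{false};

public:
  void lock() {
    // Acquire: accesses after lock() cannot be reordered before it.
    while (locked_.exchange(true, std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }

  void unlock() {
    // Release: accesses before unlock() cannot be reordered after it.
    locked_.store(false, std::memory_order_release);
  }
};

// Increment a plain (non-atomic) counter from several threads under the lock.
long count_with_spinlock(int threads, int iters) {
  SpinLock lock;
  long counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        lock.lock();
        ++counter;  // protected by the lock
        lock.unlock();
      }
    });
  }
  for (auto& th : pool) th.join();
  return counter;
}
```

Acquire on lock and release on unlock are enough to protect the critical
section itself. What they do not give is a full barrier for accesses outside
the critical section.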

> > http://en.cppreference.com/w/cpp/atomic/memory_order
> >
> > ...Atomic load with memory_order_acquire or stronger is an acquire operation.
> > The lock() operation on a Mutex is also an acquire operation...
> >
> > ...Atomic store with memory_order_release or stronger is a release operation.
> > The unlock() operation on a Mutex is also a release operation...
> Interesting... so C++ defines a "Mutex" with different semantics than what is
> usually understood with eg. pthread_mutex...
> > Full memory barriers are way too heavy even for mutexes. All we need to do is
> > to ensure that:
> > - all loads following lock are performed after lock (acquire)
> > - all stores preceding unlock are completed before unlock (release)
> Are you sure?
> Note that if this is true, then it means that there is _no_ way to get a
> StoreLoad barrier using only normal mutex operations. That seems odd...
> I know I have seen, and written, code that depends on lock/unlock being full
> barrier. How can we be sure that such code doesn't also exist in InnoDB?
> Though I agree that full barrier is a lot more expensive than just LoadLoad or
> StoreStore, and best avoided if possible (I even blogged about that not so
> long ago).
That's how I read it. So there is no guarantee that global_var1 will be stored
before global_var2 is loaded:
global_var1= 1;
local_var= global_var2;
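
As a sketch of what a StoreLoad barrier buys, the store/load pair can be run in
two threads with the variable roles swapped. The function names and the local
atomics are assumptions made for the example. With the seq_cst fence between
the store and the load, the C++11 memory model rules out both threads reading
0; with only acquire/release ordering that outcome would be allowed.

```cpp
#include <atomic>
#include <thread>

// Returns true if both threads read 0 in one run. The seq_cst fences are
// expected to make this impossible.
bool both_read_zero_once() {
  std::atomic<int> global_var1{0};
  std::atomic<int> global_var2{0};
  int local_a = -1;
  int local_b = -1;

  std::thread a([&] {
    global_var1.store(1, std::memory_order_relaxed);
    // Full fence: orders the store above before the load below (StoreLoad).
    std::atomic_thread_fence(std::memory_order_seq_cst);
    local_a = global_var2.load(std::memory_order_relaxed);
  });
  std::thread b([&] {
    global_var2.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    local_b = global_var1.load(std::memory_order_relaxed);
  });
  a.join();
  b.join();
  return local_a == 0 && local_b == 0;
}

// Repeat the experiment and count how often both loads saw 0.
int count_both_zero(int runs) {
  int hits = 0;
  for (int i = 0; i < runs; ++i) {
    if (both_read_zero_once()) ++hits;
  }
  return hits;
}
```

Replacing the fences with std::memory_order_release stores and
std::memory_order_acquire loads would no longer forbid the "both read 0"
outcome, which is the case described above.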

Even more interesting: it has a concept of an "affected memory location" bound
to the memory barrier:
memory_order_acquire: A load operation with this memory order performs the
acquire operation on the affected memory location: prior writes made to other
memory locations by the thread that did the release become visible in this
thread.

memory_order_release: A store operation with this memory order performs the
release operation: prior writes to other memory locations become visible to
the threads that do a consume or an acquire on the same location.

I read it as: a "release" on one memory location won't necessarily make stores
visible to an "acquire" on a different location.
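
A minimal sketch of the case that is guaranteed, where release and acquire
operate on the same atomic location. The names publish_and_read(), payload and
ready are hypothetical and chosen for the example.

```cpp
#include <atomic>
#include <thread>

// Publish a plain value through a release store and read it back after an
// acquire load on the SAME atomic variable.
int publish_and_read() {
  int payload = 0;                  // ordinary, non-atomic data
  std::atomic<bool> ready{false};   // the "affected memory location"
  int seen = -1;

  std::thread producer([&] {
    payload = 42;
    // Release: the write to payload is ordered before this store.
    ready.store(true, std::memory_order_release);
  });
  std::thread consumer([&] {
    // Acquire on the same location: once true is observed, the producer's
    // earlier write to payload is visible here.
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    seen = payload;
  });
  producer.join();
  consumer.join();
  return seen;
}
```

If the consumer instead did its acquire load on some other atomic variable,
nothing would synchronize the two threads and the read of payload would be a
data race.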

