
maria-developers team mailing list archive

Re: Analysing degraded performance at high concurrency in sysbench OLTP


Kristian -

Did you test InnoDB or XtraDB?

> Digging further using --call-graph, this turns out to be mostly futex waits
> (and futex wakeups) from inside InnoDB locking primitives. Calls like
> sync_array_get_and_reserve_cell() and sync_array_wait_event() stand out in
> particular.

Interestingly, I don't recall this being a top issue in our benchmarks
(although I was not the one running them, so I could be forgetting
some details), and we did test high-concurrency setups. It is possible
we worked around it by tuning innodb_sync_array_size and the
spinning-related options.
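For reference, that kind of workaround would look roughly like this in my.cnf (the option names are real, the values purely illustrative, not recommendations):

```ini
[mysqld]
# Split the single sync array into several, reducing contention on the
# mutex protecting each one (available in 5.6+).
innodb_sync_array_size = 16
# Spin longer before falling back to a sync-array wait.
innodb_sync_spin_loops = 60
innodb_spin_wait_delay = 12
```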

> So this is related to the non-scalable implementation in InnoDB of locking
> primitives, which is a known problem. I think Mark Callaghan has written about
> it a couple of times. Last time I looked at the code, every single mutex wait
> has to take a global mutex protecting some global arrays and stuff.

The affected waits are those that go to wait on events in the sync
array(s). No global mutex is used if locking completes through
spinning alone, without ever reserving a sync array cell.
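The distinction above — a pure-spin fast path that touches no shared wait state, and a slow path that does — can be sketched roughly like this (names are hypothetical; a plain condvar stands in for InnoDB's sync array cell):

```c
#include <pthread.h>
#include <stdatomic.h>

/* Sketch: spin a bounded number of times; only on failure fall back to
 * a blocking wait. The condvar plays the role of the sync array cell,
 * which is where the global-mutex cost discussed in the thread lives. */
#define SPIN_ROUNDS 30

typedef struct {
    atomic_int      locked;      /* 0 = free, 1 = held */
    pthread_mutex_t slow_mutex;  /* guards the slow-path wait */
    pthread_cond_t  slow_cond;
} spin_lock_t;

void spin_lock_init(spin_lock_t *l) {
    atomic_init(&l->locked, 0);
    pthread_mutex_init(&l->slow_mutex, NULL);
    pthread_cond_init(&l->slow_cond, NULL);
}

void spin_lock_acquire(spin_lock_t *l) {
    int expected;
    for (int i = 0; i < SPIN_ROUNDS; i++) {     /* fast path: spin only */
        expected = 0;
        if (atomic_compare_exchange_weak(&l->locked, &expected, 1))
            return;                              /* no shared wait state touched */
    }
    pthread_mutex_lock(&l->slow_mutex);          /* slow path: block */
    expected = 0;
    while (!atomic_compare_exchange_weak(&l->locked, &expected, 1)) {
        pthread_cond_wait(&l->slow_cond, &l->slow_mutex);
        expected = 0;
    }
    pthread_mutex_unlock(&l->slow_mutex);
}

void spin_lock_release(spin_lock_t *l) {
    atomic_store(&l->locked, 0);
    pthread_mutex_lock(&l->slow_mutex);
    pthread_cond_signal(&l->slow_cond);          /* wake at most one waiter */
    pthread_mutex_unlock(&l->slow_mutex);
}
```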

> I even
> remember seeing code that at mutex release would pthread_cond_broadcast()
> _every_ waiter, all of them waking up, only to all (except one) go do another
> wait. This is a killer for scalability.

We have implemented priority mutexes/rwlocks in XtraDB for a different
issue, but they indirectly help here: high-priority waiters wait on
their own designated event, and when the mutex/rwlock is released,
only the high-priority waiters are signalled. There are far fewer
high-priority waiter threads than regular ones.
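A rough sketch of that scheme — two wait channels, with release waking only the high-priority side when it has waiters (a big simplification of the real XtraDB code; all names here are made up):

```c
#include <pthread.h>
#include <stdbool.h>

/* Sketch: high-priority waiters park on a designated condvar, so a
 * release can signal just one of them instead of broadcasting to every
 * waiter (the thundering herd the original mail describes). */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  high_cv;    /* designated event for priority waiters */
    pthread_cond_t  low_cv;
    int             high_waiters;
    int             low_waiters;
    bool            held;
} prio_lock_t;

void prio_lock_init(prio_lock_t *l) {
    pthread_mutex_init(&l->m, NULL);
    pthread_cond_init(&l->high_cv, NULL);
    pthread_cond_init(&l->low_cv, NULL);
    l->high_waiters = l->low_waiters = 0;
    l->held = false;
}

void prio_lock_acquire(prio_lock_t *l, bool high_priority) {
    pthread_mutex_lock(&l->m);
    while (l->held) {
        if (high_priority) {
            l->high_waiters++;
            pthread_cond_wait(&l->high_cv, &l->m);
            l->high_waiters--;
        } else {
            l->low_waiters++;
            pthread_cond_wait(&l->low_cv, &l->m);
            l->low_waiters--;
        }
    }
    l->held = true;
    pthread_mutex_unlock(&l->m);
}

void prio_lock_release(prio_lock_t *l) {
    pthread_mutex_lock(&l->m);
    l->held = false;
    if (l->high_waiters > 0)
        pthread_cond_signal(&l->high_cv);   /* wake one priority waiter only */
    else if (l->low_waiters > 0)
        pthread_cond_signal(&l->low_cv);
    pthread_mutex_unlock(&l->m);
}
```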

> Now investigating call-graphs show that the sync_array operations are much
> less visible. Instead mutex_create_func(), called from
> dict_mem_table_create(), is the one that turns up prominently in the profile.
> I am not familiar with what this part of the InnoDB code is doing, but what I
> saw from a quick look is that it creates a mutex - and there is another global
> mutex needed for this, which again limits scalability.
> It is a bit surprising to see mutex creation being the most significant
> bottleneck in the benchmark. I would have assumed that most mutexes could be
> created up-front and re-used? It is possible that this is a warm-up thing,
> maybe the code is filling up the buffer pool or some table-cache like thing
> inside InnoDB? Because I see TPS being rather low for the first 150 seconds of
> the run (around 3000), and then increasing suddenly to around 8000-9000 for
> the rest. This might be worth investigating further.

dict_mem_table_create() creating mutexes and rwlocks all the time is a
known issue: http://bugs.mysql.com/bug.php?id=71708. It has been there
forever, was made worse in Oracle 5.6.16, and is fully fixed in
Percona Server 5.6.16. Oracle should have a partial fix in 5.6.19 and
a full one in 5.7.
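The "create up-front and re-use" idea Kristian guessed at could look something like this: recycle initialized mutexes through a free list instead of paying init/destroy on every table-object creation (purely a sketch of the direction, not InnoDB's actual fix; all names are hypothetical):

```c
#include <pthread.h>
#include <stdlib.h>

/* Sketch: released mutexes go on a free list and are handed back out,
 * so a hot path like dict_mem_table_create() avoids repeated
 * pthread_mutex_init()/destroy() and any global bookkeeping they need. */
typedef struct mutex_node {
    pthread_mutex_t    mutex;   /* must stay the first member, see put() */
    struct mutex_node *next;
} mutex_node_t;

static mutex_node_t    *free_list;
static pthread_mutex_t  pool_lock = PTHREAD_MUTEX_INITIALIZER;

pthread_mutex_t *pool_get_mutex(void) {
    pthread_mutex_lock(&pool_lock);
    mutex_node_t *n = free_list;
    if (n)
        free_list = n->next;            /* reuse an existing mutex */
    pthread_mutex_unlock(&pool_lock);
    if (!n) {                           /* pool empty: create one */
        n = malloc(sizeof(*n));
        pthread_mutex_init(&n->mutex, NULL);
    }
    return &n->mutex;
}

void pool_put_mutex(pthread_mutex_t *m) {
    /* The node starts at the mutex address (mutex is the first member). */
    mutex_node_t *n = (mutex_node_t *)m;
    pthread_mutex_lock(&pool_lock);
    n->next = free_list;                /* recycle, don't destroy */
    free_list = n;
    pthread_mutex_unlock(&pool_lock);
}
```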

> I wonder if the InnoDB team @ Oracle is doing something for this in 5.7? Does
> anyone know? I vaguely recall reading something about it, but I am not sure.

5.7 allows different mutex implementations to co-exist, and there is a
new implementation that uses futexes. The sync array implementation is
still there too. The code pushed so far seems to focus on getting the
framework right and adding implementations more than on performance.
I'd expect that to change in the later pushes.
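For a feel of what a futex-backed mutex looks like, here is a minimal Linux-only sketch in the classic Drepper style (not Oracle's actual 5.7 code): the whole wait/wake protocol lives in one shared word, so no sync array or global mutex guarding it is involved.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* States: 0 = free, 1 = locked, 2 = locked with (possible) waiters. */
static long futex(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void futex_mutex_lock(atomic_int *m) {
    int c = 0;
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;                          /* uncontended fast path, no syscall */
    if (c != 2)
        c = atomic_exchange(m, 2);       /* announce contention */
    while (c != 0) {
        futex(m, FUTEX_WAIT, 2);         /* sleep while the word is still 2 */
        c = atomic_exchange(m, 2);       /* retry, keeping the contended mark */
    }
}

void futex_mutex_unlock(atomic_int *m) {
    if (atomic_exchange(m, 0) == 2)      /* were there waiters? */
        futex(m, FUTEX_WAKE, 1);         /* wake exactly one, no broadcast */
}
```

Note the unlock wakes a single waiter, which is exactly the property missing from the broadcast-everyone behaviour criticised earlier in the thread.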

> It would seem a waste to duplicate their efforts.

There are Percona's efforts too ;)
