maria-developers team mailing list archive
Message #06239
Re: MDEV-4956 - Reduce usage of LOCK_open: TABLE_SHARE::tdc.used_tables
Hi Sergei,
On Sat, Sep 14, 2013 at 04:44:28PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
>
> On Sep 13, Sergey Vojtovich wrote:
> > Hi Sergei,
> >
> > comments inline and a question: 10.0 throughput is twice lower than 5.6
> > in a specific case. It is known to be caused by tc_acquire_table() and
> > tc_release_table(). Do we want to fix it? If yes - how?
>
> How is it caused by tc_acquire_table/tc_release_table?
Threads spend a lot of time waiting for LOCK_open in these functions,
because the code protected by LOCK_open takes a long time to execute.
> In what specific case?
The case is: many threads access one table (read-only OLTP).
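
To illustrate the pattern described above, here is a deliberately simplified
model. It is not the server source: Table, Share, acquire_table() and
release_table() are hypothetical stand-ins, and std::mutex stands in for the
real mutex type. It only shows that every acquire and release updates the
per-share lists while holding one global mutex.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical, simplified model; not the MariaDB source.
struct Table { int id; };

struct Share {
    std::list<Table*> free_tables;  // idle instances
    std::list<Table*> used_tables;  // instances handed out to threads
};

std::mutex LOCK_open;  // one global mutex shared by all threads and all shares

// Take an idle table from the share, or return nullptr if none is free.
Table* acquire_table(Share& share) {
    std::lock_guard<std::mutex> guard(LOCK_open);
    if (share.free_tables.empty()) return nullptr;
    Table* t = share.free_tables.front();
    share.free_tables.pop_front();
    share.used_tables.push_back(t);  // list bookkeeping under the global lock
    return t;
}

// Return a table to the share's free list.
void release_table(Share& share, Table* t) {
    std::lock_guard<std::mutex> guard(LOCK_open);
    share.used_tables.remove(t);     // linear scan here; a real intrusive list unlinks in O(1)
    share.free_tables.push_back(t);
}
```

In this model, 64 threads hitting one Share all serialize on LOCK_open, and
the longer the critical section, the longer each of them waits.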
>
> > > > > Why per-share lists are updated under the global mutex?
> > > > Alas, it doesn't solve CPU cache coherence problem.
> > > It doesn't solve CPU cache coherence problem, yes.
> > > And it doesn't help if you have only one hot table.
> > > But it certainly helps if many threads access many tables.
> > Ok, let's agree to agree: it will help in certain cases. Most probably it
> > won't improve situation much if all threads access single table.
>
> Of course.
>
> > We could try to ensure that per-share mutex is on the same cache line as
> > free_tables and used_tables list heads. In this case I guess
> > mysql_mutex_lock(&share->tdc.LOCK_table_share) will load list heads into
> > CPU cache along with mutex structure. OTOH we still have to read per-TABLE
> > prev/next pointers. And in 5.6 per-partition mutex should less frequently
> > jump out of CPU cache than our per-share mutex. Worth trying?
>
> Did you benchmark that these cache misses are a problem?
> What is the main problem that impacts the performance?
We (Axel and I) ran many different benchmarks before concluding that
cache misses are the main problem. Please let me know if you're interested
in specific results: we can either find them in the benchmark archives or
rerun the benchmarks.
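
For reference, the layout idea quoted above (per-share mutex on the same
cache line as the free_tables and used_tables list heads) could be sketched
roughly as follows. Everything here is an assumption for illustration:
ShareHot, TableNode and the helper functions are hypothetical names, the
64-byte line size is assumed, and std::mutex stands in for the real mutex
type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr std::size_t CACHE_LINE = 64;  // assumed line size; common on x86-64

struct TableNode;  // hypothetical stand-in for a per-table list node

// Hypothetical "hot" part of a share: the mutex and both list heads,
// aligned so the struct starts on a cache-line boundary.
struct alignas(CACHE_LINE) ShareHot {
    std::mutex lock;
    TableNode* free_head = nullptr;
    TableNode* used_head = nullptr;
};

// True if both addresses fall into the same CACHE_LINE-sized block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / CACHE_LINE ==
           reinterpret_cast<std::uintptr_t>(b) / CACHE_LINE;
}

// True if the lock and both list heads of this object start in one cache line.
inline bool hot_fields_colocated(const ShareHot& s) {
    return same_cache_line(&s.lock, &s.free_head) &&
           same_cache_line(&s.lock, &s.used_head);
}
```

Whether hot_fields_colocated() returns true depends on the size of the mutex
type on the given platform, so it would have to be checked per build. It also
says nothing about the per-TABLE prev/next pointers, which live elsewhere.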
One interesting result I just found is the following:
10.0.4, read-only OLTP, 64 threads, tps ~10000
+----------------------------------------+------------+-----------------+
| event_name                             | count_star | sum_timer_wait  |
+----------------------------------------+------------+-----------------+
| wait/synch/mutex/sql/LOCK_open         |    2784632 | 161835901661916 |
| wait/synch/mutex/mysys/THR_LOCK::mutex |    2784556 |  28804019775192 |
...skip...
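Note that LOCK_open and THR_LOCK::mutex show almost the same number of waits
(count_star), but total wait time differs ~5.6x. sum_timer_wait is reported
in picoseconds, so LOCK_open accounts for ~161s versus ~29s for
THR_LOCK::mutex.
Removing used_tables from tc_acquire_table/tc_release_table makes the
LOCK_open sum_timer_wait go down from 161s to 100s.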
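
The arithmetic behind these numbers can be checked with a small sketch. The
two rows are copied from the table above; WaitRow and the helper functions
are hypothetical names introduced here, and the only outside assumption is
that Performance Schema timer columns are in picoseconds.

```cpp
#include <cassert>
#include <cstdint>

// Performance Schema reports timer columns in picoseconds.
constexpr double PS_PER_SECOND = 1e12;
constexpr double PS_PER_MICROSECOND = 1e6;

struct WaitRow {
    std::uint64_t count_star;      // number of waits recorded
    std::uint64_t sum_timer_wait;  // total wait time, picoseconds
};

// Values copied from the benchmark table.
constexpr WaitRow LOCK_OPEN_ROW{2784632ULL, 161835901661916ULL};
constexpr WaitRow THR_LOCK_ROW{2784556ULL, 28804019775192ULL};

// Total time spent waiting, in seconds.
constexpr double total_seconds(const WaitRow& r) {
    return static_cast<double>(r.sum_timer_wait) / PS_PER_SECOND;
}

// Average time per wait, in microseconds.
constexpr double avg_wait_us(const WaitRow& r) {
    return static_cast<double>(r.sum_timer_wait) /
           static_cast<double>(r.count_star) / PS_PER_MICROSECOND;
}

// How many times longer the total wait of a is compared to b.
constexpr double wait_ratio(const WaitRow& a, const WaitRow& b) {
    return static_cast<double>(a.sum_timer_wait) /
           static_cast<double>(b.sum_timer_wait);
}
```

With nearly identical wait counts, the difference comes almost entirely from
the average wait: roughly 58 microseconds per LOCK_open wait against roughly
10 microseconds per THR_LOCK::mutex wait.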
Regards,
Sergey