maria-developers team mailing list archive

Re: MDEV-4956 - Reduce usage of LOCK_open: TABLE_SHARE::tdc.used_tables

Hi Sergei,

I just found another interesting test result. I added a dummy lock/unlock of the
LOCK_table_share mutex to tc_acquire_table() and tc_release_table() (just before
LOCK_open is locked), to measure the pure wait time of a mutex that protects no work.
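The experiment can be pictured with a minimal sketch. All names here (DummyShare, dummy_acquire, dummy_release, lock_open_global) are hypothetical stand-ins, and plain std::mutex is used instead of the server's Performance-Schema-instrumented mutexes, so the sketch shows only the lock ordering (empty per-share critical section, then the real work under the global mutex), not the instrumentation itself.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for the objects named in the mail; the real
// TABLE_SHARE and tc_acquire_table()/tc_release_table() do much more.
struct DummyShare {
  std::mutex lock_table_share;  // plays the role of tdc.LOCK_table_share
  int used_count = 0;           // stands in for state that LOCK_open protects
};

std::mutex lock_open_global;    // plays the role of the global LOCK_open

// The per-share mutex is locked and immediately unlocked before the global
// mutex is taken. It protects nothing; in the server its waits would be
// recorded under their own instrument, giving a baseline for pure mutex wait
// time under the same access pattern as LOCK_open.
void dummy_acquire(DummyShare &share) {
  share.lock_table_share.lock();
  share.lock_table_share.unlock();
  std::lock_guard<std::mutex> guard(lock_open_global);
  ++share.used_count;  // the real bookkeeping still runs under the global lock
}

void dummy_release(DummyShare &share) {
  share.lock_table_share.lock();
  share.lock_table_share.unlock();
  std::lock_guard<std::mutex> guard(lock_open_global);
  assert(share.used_count > 0);
  --share.used_count;
}
```

Because both mutexes are taken the same number of times by the same threads, any large gap in their wait times points at the work done while holding the global mutex rather than at mutex overhead.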

Test execution time: 45s
LOCK_open wait time: 34s
LOCK_table_share wait time: 0.8s

+--------------------------------------------------------+------------+----------------+
| event_name                                             | count_star | sum_timer_wait |
+--------------------------------------------------------+------------+----------------+
| wait/synch/mutex/sql/LOCK_open                         |     585690 | 34298972259258 |
| wait/synch/mutex/mysys/THR_LOCK::mutex                 |     585604 |  4560420039042 |
| wait/synch/mutex/sql/TABLE_SHARE::tdc.LOCK_table_share |     585710 |   794564626359 |
| wait/synch/rwlock/sql/LOCK_tdc                         |     290940 |   237751940139 |
| wait/synch/mutex/sql/THD::LOCK_thd_data                |    1838668 |   219829105251 |
| wait/synch/rwlock/innodb/hash table locks              |     683395 |   159792339294 |
| wait/synch/rwlock/innodb/btr_search_latch              |     290892 |   138915354207 |
| wait/synch/mutex/innodb/trx_sys_mutex                  |      62940 |    78334973451 |
| wait/synch/rwlock/innodb/index_tree_rw_lock            |     167822 |    49323455349 |
| wait/synch/rwlock/sql/MDL_lock::rwlock                 |      41970 |    31436713938 |
+--------------------------------------------------------+------------+----------------+
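Performance Schema reports sum_timer_wait in picoseconds, so the per-event cost follows from dividing by count_star. The helper names below are hypothetical; the numbers are taken from the table above.

```cpp
#include <cassert>
#include <cstdint>

// Mean wait per event in microseconds (sum_timer_wait is in picoseconds).
double avg_wait_us(std::uint64_t count_star, std::uint64_t sum_timer_wait_ps) {
  assert(count_star > 0);
  return static_cast<double>(sum_timer_wait_ps) / count_star / 1e6;  // ps -> us
}

// Total wait in seconds.
double total_wait_s(std::uint64_t sum_timer_wait_ps) {
  return static_cast<double>(sum_timer_wait_ps) / 1e12;  // ps -> s
}
```

With the rows above, LOCK_open averages about 58.6 us per wait, while the dummy LOCK_table_share averages about 1.36 us, roughly 43 times less for almost exactly the same number of events.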

Regards,
Sergey

On Mon, Sep 16, 2013 at 04:46:41PM +0400, Sergey Vojtovich wrote:
> Hi Sergei,
> 
> On Sat, Sep 14, 2013 at 04:44:28PM +0200, Sergei Golubchik wrote:
> > Hi, Sergey!
> > 
> > On Sep 13, Sergey Vojtovich wrote:
> > > Hi Sergei,
> > > 
> > > comments inline and a question: in one specific case 10.0 throughput is
> > > half that of 5.6. This is known to be caused by tc_acquire_table() and
> > > tc_release_table(). Do we want to fix it? If yes, how?
> > 
> > How is it caused by tc_acquire_table/tc_release_table?
> Threads spend a lot of time waiting for LOCK_open in these functions,
> because the code protected by LOCK_open takes a long time to execute.
> 
> > In what specific case?
> The case is: many threads access one table (read-only OLTP).
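A heavily simplified model of the critical section under discussion: each share keeps intrusive free_tables and used_tables lists (the prev/next links live inside each TABLE), and acquiring or releasing a TABLE moves it between the lists while holding one global mutex. All type and function names are hypothetical; this is a sketch of the pattern, not the server code.

```cpp
#include <cassert>
#include <mutex>

// Intrusive list node: the links live inside each TABLE object.
struct TcTable {
  TcTable *prev = nullptr;
  TcTable *next = nullptr;
};

struct TcList {
  TcTable *head = nullptr;
  void push_front(TcTable *t) {
    t->prev = nullptr;
    t->next = head;
    if (head) head->prev = t;
    head = t;
  }
  void remove(TcTable *t) {
    if (t->prev) t->prev->next = t->next; else head = t->next;
    if (t->next) t->next->prev = t->prev;
    t->prev = t->next = nullptr;
  }
};

struct TcShare {
  TcList free_tables;
  TcList used_tables;
};

std::mutex lock_open_tc;  // one global mutex, in the role of LOCK_open

// Moves a free TABLE to the used list. Every thread serializes on the global
// mutex, and each move writes list heads and neighbouring TABLEs' links that
// another CPU may have modified last, the kind of cache-line traffic the
// thread suspects.
TcTable *tc_acquire_sketch(TcShare &share) {
  std::lock_guard<std::mutex> guard(lock_open_tc);
  TcTable *t = share.free_tables.head;
  if (!t) return nullptr;  // a caller would open a new TABLE instead
  share.free_tables.remove(t);
  share.used_tables.push_front(t);
  return t;
}

void tc_release_sketch(TcShare &share, TcTable *t) {
  assert(t != nullptr);
  std::lock_guard<std::mutex> guard(lock_open_tc);
  share.used_tables.remove(t);
  share.free_tables.push_front(t);
}
```

Replacing lock_open_tc with a mutex stored in each TcShare would let threads working on different tables proceed in parallel, but with a single hot table all threads would still queue on that one mutex.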
> 
> > 
> > > > > > Why are per-share lists updated under the global mutex?
> > > > > Alas, it doesn't solve the CPU cache coherence problem.
> > > > It doesn't solve the CPU cache coherence problem, yes.
> > > > And it doesn't help if you have only one hot table.
> > > > But it certainly helps if many threads access many tables.
> > > OK, let's agree to agree: it will help in certain cases. Most probably it
> > > won't improve the situation much if all threads access a single table.
> > 
> > Of course.
> > 
> > > We could try to ensure that the per-share mutex is on the same cache line as
> > > the free_tables and used_tables list heads. In that case I guess
> > > mysql_mutex_lock(&share->tdc.LOCK_table_share) will load the list heads into
> > > the CPU cache along with the mutex structure. OTOH we still have to read the
> > > per-TABLE prev/next pointers. And in 5.6 the per-partition mutex should be
> > > evicted from the CPU cache less often than our per-share mutex. Worth trying?
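One way to express the proposed layout, as a sketch under stated assumptions: a 64-byte cache line (typical on x86-64), list heads placed first and the mutex right after them, and the whole struct aligned to a line boundary. ShareCacheSketch and the helper functions are hypothetical names; the real TABLE_SHARE::tdc layout is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Assumption: 64-byte cache lines. C++17 also offers
// std::hardware_destructive_interference_size, but not every standard
// library provides it, so the constant is hard-coded here.
constexpr std::size_t kCacheLine = 64;

struct TableLinkSketch;  // per-TABLE prev/next links live in other objects

// List heads first, mutex right after, struct starting on a line boundary:
// locking the mutex touches the same line that holds both list heads.
struct alignas(kCacheLine) ShareCacheSketch {
  TableLinkSketch *free_tables_head = nullptr;
  TableLinkSketch *used_tables_head = nullptr;
  std::mutex lock_table_share;
};

static_assert(alignof(ShareCacheSketch) == kCacheLine,
              "struct must start on a cache-line boundary");

// Index of the cache line an address falls on.
inline std::uintptr_t cache_line_of(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// True when the mutex's first byte and both list heads share one line.
inline bool heads_share_line_with_mutex(const ShareCacheSketch &s) {
  const std::uintptr_t line = cache_line_of(&s.lock_table_share);
  return cache_line_of(&s.free_tables_head) == line &&
         cache_line_of(&s.used_tables_head) == line;
}
```

Putting the heads before the mutex keeps them on the first line even where the mutex object itself is large. As the mail notes, this does nothing for the prev/next pointers inside each TABLE, which still live on other cache lines.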
> > 
> > Did you benchmark whether these cache misses are actually a problem?
> > What is the main problem that impacts performance?
> We (Axel and I) ran many different benchmarks before concluding that cache
> misses are the main problem. Please let me know if you're interested in
> specific results: we will either find them in the benchmark archives or
> benchmark again.
> 
> One interesting result I just found is the following:
> 10.0.4, read-only OLTP, 64 threads, tps ~10000
> +---------------------------------------------+------------+-----------------+
> | event_name                                  | count_star | sum_timer_wait  |
> +---------------------------------------------+------------+-----------------+
> | wait/synch/mutex/sql/LOCK_open              |    2784632 | 161835901661916 |
> | wait/synch/mutex/mysys/THR_LOCK::mutex      |    2784556 |  28804019775192 |
> ...skip...
> 
> Note that LOCK_open and THR_LOCK::mutex are acquired almost the same number of
> times, but their total wait times differ ~5.6x (about 162s vs. 29s).
> 
> Removing the used_tables handling from tc_acquire_table()/tc_release_table()
> brings LOCK_open's sum_timer_wait down from 161s to 100s.
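The figures above can be checked with a little arithmetic. The constants are copied from the 10.0.4 table (sum_timer_wait in picoseconds); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLockOpenCount = 2784632;
constexpr std::uint64_t kLockOpenWaitPs = 161835901661916ULL;
constexpr std::uint64_t kThrLockCount = 2784556;
constexpr std::uint64_t kThrLockWaitPs = 28804019775192ULL;

// Ratio of total wait times.
double wait_ratio() {
  return static_cast<double>(kLockOpenWaitPs) / kThrLockWaitPs;
}

// Ratio of mean wait per event; nearly identical counts make it match the above.
double avg_wait_ratio() {
  const double lock_open_avg = static_cast<double>(kLockOpenWaitPs) / kLockOpenCount;
  const double thr_lock_avg = static_cast<double>(kThrLockWaitPs) / kThrLockCount;
  return lock_open_avg / thr_lock_avg;
}

// Fraction of wait time removed when it drops from before_s to after_s.
double relative_reduction(double before_s, double after_s) {
  assert(before_s > 0);
  return (before_s - after_s) / before_s;
}
```

Both ratios come out at about 5.6, and dropping from 161s to 100s removes roughly 38% of LOCK_open wait time.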
> 
> Regards,
> Sergey
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp
