maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #06585
Re: MDEV-4956 - Reduce usage of LOCK_open: TABLE_SHARE::tdc.used_tables
Hi Sergei,
below are results of "operf -e LLC_MISSES:10000:0x41 --pid `pidof mysqld`".
Tested with performance schema and general log off. tc_acquire_table() seem
to be inlined, so it appears as tdc_acquire_share().
Also note that my_malloc_size_cb_func() looks like another bottleneck.
Misses summary for tc_acquire_table()/tc_release_table()
--------------------------------------------------------
4.5408 + 3.0090= 7.5498% (10.0)
2.9916 + 2.6908= 5.6824% (10.0 + MDEV4956)
2.3502 + 1.7159= 4.0661% (10.0 + MDEV4956 + no unused_tables)
10.0 (rev.3912)
---------------
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
40292 34.8688 no-vmlinux /no-vmlinux
21638 18.7256 libpthread-2.15.so pthread_mutex_lock
6718 5.8138 libpthread-2.15.so pthread_mutex_unlock
5247 4.5408 mysqld tc_release_table(TABLE*)
3477 3.0090 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
3410 2.9510 mysqld TABLE::init(THD*, TABLE_LIST*)
3316 2.8697 mysqld my_malloc_size_cb_func
3126 2.7053 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)
2360 2.0424 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
2254 1.9506 libpthread-2.15.so pthread_rwlock_unlock
2152 1.8623 libpthread-2.15.so __lll_lock_wait
1867 1.6157 mysqld heap_info
1525 1.3197 mysqld tdc_refresh_version()
1441 1.2470 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*)
1374 1.1891 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*)
1218 1.0541 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int)
1212 1.0489 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int)
1062 0.9191 mysqld _my_thread_var
945 0.8178 libc-2.15.so __memset_sse2
875 0.7572 mysqld thr_unlock
724 0.6266 mysqld lock_tables(THD*, TABLE_LIST*, unsigned int, unsigned int)
673 0.5824 mysqld thr_multi_lock
10.0 + MDEV4956
---------------
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
43070 35.7909 no-vmlinux /no-vmlinux
22661 18.8311 libpthread-2.15.so pthread_mutex_lock
7960 6.6147 libpthread-2.15.so pthread_mutex_unlock
3600 2.9916 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
3498 2.9068 mysqld my_malloc_size_cb_func
3238 2.6908 mysqld tc_release_table(TABLE*)
3189 2.6500 mysqld TABLE::init(THD*, TABLE_LIST*)
2611 2.1697 libpthread-2.15.so __lll_lock_wait
2414 2.0060 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
2360 1.9611 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)
2250 1.8697 mysqld heap_info
1889 1.5697 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*)
1557 1.2939 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*)
1378 1.1451 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int)
1319 1.0961 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int)
1293 1.0745 mysqld thr_multi_lock
1267 1.0529 libpthread-2.15.so pthread_rwlock_unlock
1211 1.0063 mysqld _my_thread_var
1136 0.9440 libc-2.15.so __memset_sse2
887 0.7371 mysqld thr_unlock
736 0.6116 mysqld free_root
698 0.5800 mysqld tdc_refresh_version()
10.0 + MDEV4956 + no unused_tables
----------------------------------
CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % image name symbol name
41828 35.5653 no-vmlinux /no-vmlinux
22377 19.0266 libpthread-2.15.so pthread_mutex_lock
8219 6.9884 libpthread-2.15.so pthread_mutex_unlock
3578 3.0423 mysqld my_malloc_size_cb_func
3244 2.7583 mysqld TABLE::init(THD*, TABLE_LIST*)
2764 2.3502 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
2630 2.2362 libpthread-2.15.so __lll_lock_wait
2570 2.1852 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int)
2407 2.0466 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)
2233 1.8987 mysqld heap_info
2027 1.7235 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*)
2018 1.7159 mysqld tc_release_table(TABLE*)
1537 1.3069 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*)
1404 1.1938 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int)
1368 1.1632 mysqld _my_thread_var
1343 1.1419 mysqld thr_multi_lock
1284 1.0918 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int)
1213 1.0314 libc-2.15.so __memset_sse2
1162 0.9880 libpthread-2.15.so pthread_rwlock_unlock
799 0.6794 mysqld thr_unlock
753 0.6403 mysqld tdc_refresh_version()
717 0.6096 mysqld THD::set_open_tables(TABLE*)
Regards,
Sergey
On Mon, Sep 16, 2013 at 08:50:49PM +0400, Sergey Vojtovich wrote:
> Hi Sergei,
>
> just found another interesting test result. I added dummy LOCK_table_share mutex
> lock and unlock to tc_acquire_table() and tc_release_table() (before locking
> LOCK_open), just to measure pure mutex wait time.
>
> Test execution time: 45s
> LOCK_open wait time: 34s
> LOCK_table_share wait time: 0.8s
>
> +--------------------------------------------------------+------------+----------------+
> | event_name | count_star | sum_timer_wait |
> +--------------------------------------------------------+------------+----------------+
> | wait/synch/mutex/sql/LOCK_open | 585690 | 34298972259258 |
> | wait/synch/mutex/mysys/THR_LOCK::mutex | 585604 | 4560420039042 |
> | wait/synch/mutex/sql/TABLE_SHARE::tdc.LOCK_table_share | 585710 | 794564626359 |
> | wait/synch/rwlock/sql/LOCK_tdc | 290940 | 237751940139 |
> | wait/synch/mutex/sql/THD::LOCK_thd_data | 1838668 | 219829105251 |
> | wait/synch/rwlock/innodb/hash table locks | 683395 | 159792339294 |
> | wait/synch/rwlock/innodb/btr_search_latch | 290892 | 138915354207 |
> | wait/synch/mutex/innodb/trx_sys_mutex | 62940 | 78334973451 |
> | wait/synch/rwlock/innodb/index_tree_rw_lock | 167822 | 49323455349 |
> | wait/synch/rwlock/sql/MDL_lock::rwlock | 41970 | 31436713938 |
> +--------------------------------------------------------+------------+----------------+
>
> Regards,
> Sergey
>
> On Mon, Sep 16, 2013 at 04:46:41PM +0400, Sergey Vojtovich wrote:
> > Hi Sergei,
> >
> > On Sat, Sep 14, 2013 at 04:44:28PM +0200, Sergei Golubchik wrote:
> > > Hi, Sergey!
> > >
> > > On Sep 13, Sergey Vojtovich wrote:
> > > > Hi Sergei,
> > > >
> > > > comments inline and a question: 10.0 throughput is twice lower than 5.6
> > > > in a specific case. It is known to be caused by tc_acquire_table() and
> > > > tc_release_table(). Do we want to fix it? If yes - how?
> > >
> > > How is it caused by tc_acquire_table/tc_release_table?
> > Threads spend a lot of time waiting for LOCK_open in these functions.
> > Because protected by LOCK_open code takes a lot of time to execute.
> >
> > > In what specific case?
> > The case is: many threads access one table (read-only OLTP).
> >
> > >
> > > > > > > Why per-share lists are updated under the global mutex?
> > > > > > Alas, it doesn't solve CPU cache coherence problem.
> > > > > It doesn't solve CPU cache coherence problem, yes.
> > > > > And it doesn't help if you have only one hot table.
> > > > > But it certainly helps if many threads access many tables.
> > > > Ok, let's agree to agree: it will help in certain cases. Most probably it
> > > > won't improve situation much if all threads access single table.
> > >
> > > Of course.
> > >
> > > > We could try to ensure that per-share mutex is on the same cache line as
> > > > free_tables and used_tables list heads. In this case I guess
> > > > mysql_mutex_lock(&share->tdc.LOCK_table_share) will load list heads into
> > > > CPU cache along with mutex structure. OTOH we still have to read per-TABLE
> > > > prev/next pointers. And in 5.6 per-partition mutex should less frequently
> > > > jump out of CPU cache than our per-share mutex. Worth trying?
> > >
> > > Did you benchmark that these cache misses are a problem?
> > > What is the main problem that impacts the performance?
> > We (Axel and me) did a lot of different benchmarks before we concluded
> > cache misses to be the main problem. Please let me known if you're interested
> > in specific results - we either find them in benchmark archives or benchmark
> > again.
> >
> > One of interesting results I just found is as following...
> > 10.0.4, read-only OLTP, 64 threads, tps ~10000
> > +---------------------------------------------+------------+-----------------+
> > | event_name | count_star | sum_timer_wait |
> > +---------------------------------------------+------------+-----------------+
> > | wait/synch/mutex/sql/LOCK_open | 2784632 | 161835901661916 |
> > | wait/synch/mutex/mysys/THR_LOCK::mutex | 2784556 | 28804019775192 |
> > ...skip...
> >
> > Note that LOCK_open and THR_LOCK::mutex are contested equally, but wait time
> > differs ~6x.
> >
> > Removing used_tables from tc_acquire_table/tc_release_table makes sum_timer_wait
> > go down from 161s to 100s.
> >
> > Regards,
> > Sergey
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~maria-developers
> > Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~maria-developers
> > More help : https://help.launchpad.net/ListHelp
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help : https://help.launchpad.net/ListHelp
References