maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #06209
Re: MDEV-4956 - Reduce usage of LOCK_open: TABLE_SHARE::tdc.used_tables
Hi Sergei,
On Tue, Sep 10, 2013 at 09:11:16PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
>
> On Sep 10, Sergey Vojtovich wrote:
> > Hi Sergei,
> >
> > thanks for looking into this patch. Frankly speaking I find it a bit
> > questionable too. Below are links that should answer your questions...
> > What problem do I attempt to solve: https://lists.launchpad.net/maria-developers/msg06118.html
> > How do I attempt to solve it: https://mariadb.atlassian.net/browse/MDEV-4956
>
> Yes, I've seen and remember both, but they don't answer my question,
> which was about specific changes that you've done, not about the goal.
> But ok, see below.
>
> > For every statment we acquire table from table cache and then release table back
> > to the cache. That involves update of 3 lists: unused_tables, per-share
> > used_tables and free_tables. These lists are protected by LOCK_open
> > (see tc_acquire_table() and tc_release_table()).
>
> Why per-share lists are updated under the global mutex?
I would have done that already if it would give us considerable performance gain.
Alas, it doesn't solve CPU cache coherence problem.
Another reason is global unused_tables list, which cannot be protected by
per-share mutex. So we'll have to lock additional mutex twice per query.
The third reason (less important) is that I'm not sure if
TABLE_SHARE::visit_subgraph can be protected by per-share index. I suspect
it needs to see consistent table cache state.
The fourth reason (even less important) is that it is quite complex:
there're invariants between free_tables, unused_tables and in_use.
>
> > Every time we update global pointer, corresponding cache lines of
> > sibling CPUs have to be invalidated. This is causing expensive memory
> > reads while LOCK_open is held.
> >
> > Oracle solved this problem by partitioning table cache, allowing
> > emulation of something like per-CPU lists.
> >
> > We attempted to split LOCK_open logically, and succeeded at everything
> > but these 3 lists. I attempted lock-free list for free_tables, but TPS
> > rate didn't improve.
>
> How did you do the lock-free list, could you show, please?
Please find it attached. It is mixed with different changes, just search for
my_atomic_casptr.
>
> > What we need is to reduce number of these expensive memory reads, and
> > there are two solutions: partition these lists or get rid of them. As
> > we agreed not to partition, I'm trying the latter solution.
>
> Well, you can partition the list. With 32 list head pointers. And a
> thread adding a table only to "this thread's" list. Of course, it's not
> complete partitioning betwen CPUs, as any thread can remove a table from
> any list. But at least there won't be one global list head pointer.
Yes, that's what Oracle did and what we're trying to avoid.
>
> > Why I find this patch questionable? It reduces LOCK_open wait time by
> > 30%, to get close to Oracle wait time, we need to reduce wait time by
> > 90%. We could remove unused_tables as well, but it will be 60% not 90%.
>
> Hmm, if you're only interested in optimizing this specific use case -
> one table, many threads - then yes, may be. But if you have many tables,
> then modifying per-share lists under the share own mutex is, basically,
> a must.
If you mean the oposite situation when N threads accessing N tables
(each thread accessing it's own table): there should be no problem, because
every thread is accessing it's own lists. Well, except for unused_tables of
course, but it can't be protected by per-share mutex anyway (tested a few
months ago, but we could ask XL to double check the above case if needed).
Thanks,
Sergey
=== modified file 'sql/table.h'
--- sql/table.h 2013-08-27 12:12:33 +0000
+++ sql/table.h 2013-09-02 07:45:02 +0000
@@ -477,7 +477,6 @@ TABLE_CATEGORY get_table_category(const
const LEX_STRING *name);
-struct TABLE_share;
struct All_share_tables;
extern ulong tdc_refresh_version(void);
@@ -601,7 +600,6 @@ struct TABLE_SHARE
mysql_mutex_t LOCK_ha_data; /* To protect access to ha_data */
mysql_mutex_t LOCK_share; /* To protect TABLE_SHARE */
- typedef I_P_List <TABLE, TABLE_share> TABLE_list;
typedef I_P_List <TABLE, All_share_tables> All_share_tables_list;
struct
{
@@ -615,12 +613,8 @@ struct TABLE_SHARE
List of tickets representing threads waiting for the share to be flushed.
*/
Wait_for_flush_list m_flush_tickets;
- /*
- Doubly-linked (back-linked) lists of used and unused TABLE objects
- for this share.
- */
- All_share_tables_list all_tables;
- TABLE_list free_tables;
+ All_share_tables_list all_tables; /**< All TABLE objects for this share */
+ TABLE *free_tables; /**< Unused TABLE objects for this share */
} tdc;
LEX_CUSTRING tabledef_version;
@@ -1015,18 +1009,16 @@ struct TABLE
TABLE_SHARE *s;
handler *file;
- TABLE *next, *prev;
+ TABLE *next, *prev, *free_next;
private:
/**
- Links for the lists of used/unused TABLE objects for this share.
+ Links for the list of all TABLE objects for this share.
Declared as private to avoid direct manipulation with those objects.
One should use methods of I_P_List template instead.
*/
- TABLE *share_next, **share_prev;
TABLE *share_all_next, **share_all_prev;
- friend struct TABLE_share;
friend struct All_share_tables;
public:
@@ -1371,22 +1363,9 @@ struct TABLE
/**
Helper class which specifies which members of TABLE are used for
- participation in the list of used/unused TABLE objects for the share.
+ participation in the list of all TABLE objects for the share.
*/
-struct TABLE_share
-{
- static inline TABLE **next_ptr(TABLE *l)
- {
- return &l->share_next;
- }
- static inline TABLE ***prev_ptr(TABLE *l)
- {
- return &l->share_prev;
- }
-};
-
-
struct All_share_tables
{
static inline TABLE **next_ptr(TABLE *l)
=== modified file 'sql/table_cache.cc'
--- sql/table_cache.cc 2013-08-27 12:12:33 +0000
+++ sql/table_cache.cc 2013-09-02 12:16:39 +0000
@@ -152,6 +152,21 @@ uint tc_records(void)
}
+static inline void tc_unlink_table_from_global_unused_list(TABLE *table)
+{
+ mysql_mutex_assert_owner(&LOCK_open);
+ DBUG_ASSERT(!table->in_use);
+ table->next->prev= table->prev;
+ table->prev->next= table->next;
+ if (table == unused_tables)
+ {
+ unused_tables= unused_tables->next;
+ if (table == unused_tables)
+ unused_tables= 0;
+ }
+}
+
+
/**
Free all unused TABLE objects.
@@ -177,7 +192,7 @@ void tc_purge(void)
unused_tables->prev->next= 0;
do
{
- unused_tables->s->tdc.free_tables.remove(unused_tables);
+ unused_tables->s->tdc.free_tables= 0;
unused_tables->s->tdc.all_tables.remove(unused_tables);
tc_count--;
} while ((unused_tables= unused_tables->next));
@@ -228,8 +243,7 @@ static void check_unused(THD *thd)
}
while ((share= tdc_it.next()))
{
- TABLE_SHARE::TABLE_list::Iterator it(share->tdc.free_tables);
- while ((entry= it++))
+ for (entry= share->tdc.free_tables; entry; entry= entry->free_next)
{
/*
We must not have TABLEs in the free list that have their file closed.
@@ -248,8 +262,8 @@ static void check_unused(THD *thd)
count--;
open_files++;
}
- TABLE_SHARE::All_share_tables_list::Iterator it2(share->tdc.all_tables);
- while ((entry= it2++))
+ TABLE_SHARE::All_share_tables_list::Iterator it(share->tdc.all_tables);
+ while ((entry= it++))
{
if (entry->in_use)
open_files++;
@@ -287,18 +301,21 @@ static void tc_remove_table(TABLE *table
mysql_mutex_assert_owner(&LOCK_open);
DBUG_ASSERT(!table->in_use);
/* Remove from per-share chain of unused TABLE objects. */
- table->s->tdc.free_tables.remove(table);
- table->s->tdc.all_tables.remove(table);
-
- /* And global unused chain. */
- table->next->prev= table->prev;
- table->prev->next= table->next;
- if (table == unused_tables)
+ if (table == table->s->tdc.free_tables)
{
- unused_tables= unused_tables->next;
- if (table == unused_tables)
- unused_tables= 0;
+ DBUG_ASSERT(!table->free_next);
+ table->s->tdc.free_tables= 0;
+ }
+ else
+ {
+ TABLE *entry= table->s->tdc.free_tables;
+ for (; entry->free_next->free_next; entry= entry->free_next) /* no-op */;
+ DBUG_ASSERT(entry->free_next == table);
+ entry->free_next= 0;
}
+ table->s->tdc.all_tables.remove(table);
+ /* And global unused chain. */
+ tc_unlink_table_from_global_unused_list(table);
tc_count--;
}
@@ -349,7 +366,7 @@ void tc_add_table(THD *thd, TABLE *table
Acquired object cannot be evicted or acquired again.
While locked:
- - pop object from TABLE_SHARE::tdc.free_tables()
+ - pop object from TABLE_SHARE::tdc.free_tables
- remove object from unused_tables
- mark object used by thd
@@ -360,23 +377,31 @@ static TABLE *tc_acquire_table(THD *thd,
{
TABLE *table;
- mysql_mutex_lock(&LOCK_open);
- if (!(table= share->tdc.free_tables.pop_front()))
+ my_atomic_rwlock_wrlock(&LOCK_tdc_atomics);
+ for (;;)
{
- mysql_mutex_unlock(&LOCK_open);
- return 0;
+ intptr tmp;
+ if (!(tmp= (intptr) my_atomic_loadptr((void **) &share->tdc.free_tables)))
+ {
+ my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+ return 0;
+ }
+ if (tmp & 1)
+ continue;
+ table= (TABLE *) tmp;
+ tmp|= 1;
+ if (!my_atomic_casptr((void **) &share->tdc.free_tables, (void **) &table, (void *) tmp))
+ continue;
+ my_atomic_storeptr((void **) &share->tdc.free_tables, table->free_next);
+ break;
}
+ my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+
+ mysql_mutex_lock(&LOCK_open);
DBUG_ASSERT(!table->in_use);
/* Unlink table from global unused tables list. */
- if (table == unused_tables)
- { // First unused
- unused_tables=unused_tables->next; // Remove from link
- if (table == unused_tables)
- unused_tables=0;
- }
- table->prev->next=table->next; /* Remove from unused list */
- table->next->prev=table->prev;
+ //tc_unlink_table_from_global_unused_list(table);
table->in_use= thd;
mysql_mutex_unlock(&LOCK_open);
@@ -421,22 +446,18 @@ bool tc_release_table(TABLE *table)
DBUG_ASSERT(table->in_use);
DBUG_ASSERT(table->file);
- mysql_mutex_lock(&LOCK_open);
- table->in_use= 0;
- if (table->s->has_old_version() || table->needs_reopen() || !tdc_size ||
- tc_count > tc_size)
+ if (table->needs_reopen())
{
- DBUG_ASSERT(!unused_tables || tc_count <= tc_size);
- tc_count--;
- table->s->tdc.all_tables.remove(table);
- mysql_rwlock_rdlock(&LOCK_flush);
- mysql_mutex_unlock(&LOCK_open);
- intern_close_table(table);
- mysql_rwlock_unlock(&LOCK_flush);
- return true;
+ mysql_mutex_lock(&LOCK_open);
+ table->in_use= 0;
+ goto eviction;
}
- /* Add table to the list of unused TABLE objects for this share. */
- table->s->tdc.free_tables.push_front(table);
+
+ mysql_mutex_lock(&LOCK_open);
+ table->in_use= 0;
+ if (table->s->has_old_version() || tc_count > tc_size)
+ goto eviction;
+#if 0
/* Also link it last in the global list of unused TABLE objects. */
if (unused_tables)
{
@@ -447,9 +468,30 @@ bool tc_release_table(TABLE *table)
}
else
unused_tables=table->next=table->prev=table;
+#endif
mysql_mutex_unlock(&LOCK_open);
+
+ my_atomic_rwlock_wrlock(&LOCK_tdc_atomics);
+ for (table->free_next= 0;
+ !my_atomic_casptr((void **) &table->s->tdc.free_tables, (void **) &table->free_next, table);)
+ {
+ if ((intptr) table->free_next & 1)
+ table->free_next= 0;
+ }
+ my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+
check_unused(thd);
return false;
+
+eviction:
+ DBUG_ASSERT(!unused_tables || tc_count <= tc_size);
+ tc_count--;
+ table->s->tdc.all_tables.remove(table);
+ mysql_rwlock_rdlock(&LOCK_flush);
+ mysql_mutex_unlock(&LOCK_open);
+ intern_close_table(table);
+ mysql_rwlock_unlock(&LOCK_flush);
+ return true;
}
@@ -564,6 +606,7 @@ void tdc_start_shutdown(void)
plugins minimal and allows shutdown to proceed smoothly.
*/
tdc_size= 0;
+ tc_size= 0;
/* Free all cached but unused TABLEs and TABLE_SHAREs. */
close_cached_tables(NULL, NULL, FALSE, LONG_TIMEOUT);
}
@@ -651,7 +694,7 @@ void tdc_init_share(TABLE_SHARE *share)
&share->tdc.LOCK_table_share, MY_MUTEX_INIT_FAST);
share->tdc.m_flush_tickets.empty();
share->tdc.all_tables.empty();
- share->tdc.free_tables.empty();
+ share->tdc.free_tables= 0;
tdc_assign_new_table_id(share);
share->version= tdc_refresh_version();
DBUG_VOID_RETURN;
@@ -668,7 +711,7 @@ void tdc_deinit_share(TABLE_SHARE *share
DBUG_ASSERT(share->tdc.ref_count == 0);
DBUG_ASSERT(share->tdc.m_flush_tickets.is_empty());
DBUG_ASSERT(share->tdc.all_tables.is_empty());
- DBUG_ASSERT(share->tdc.free_tables.is_empty());
+ DBUG_ASSERT(!share->tdc.free_tables);
mysql_mutex_destroy(&share->tdc.LOCK_table_share);
DBUG_VOID_RETURN;
}
@@ -1019,7 +1062,6 @@ bool tdc_remove_table(THD *thd, enum_tdc
const char *db, const char *table_name,
bool kill_delayed_threads)
{
- TABLE *table;
TABLE_SHARE *share;
bool found= false;
DBUG_ENTER("tdc_remove_table");
@@ -1031,14 +1073,12 @@ bool tdc_remove_table(THD *thd, enum_tdc
if ((share= tdc_delete_share(db, table_name)))
{
- I_P_List <TABLE, TABLE_share> purge_tables;
- purge_tables.empty();
+ TABLE *table, *purge_tables;
mysql_mutex_lock(&LOCK_open);
if (kill_delayed_threads)
kill_delayed_threads_for_table(share);
- TABLE_SHARE::TABLE_list::Iterator it(share->tdc.free_tables);
#ifndef DBUG_OFF
if (remove_type == TDC_RT_REMOVE_NOT_OWN)
{
@@ -1058,16 +1098,23 @@ bool tdc_remove_table(THD *thd, enum_tdc
if (remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE)
share->version= 0;
- while ((table= it++))
+ purge_tables= share->tdc.free_tables;
+ share->tdc.free_tables= 0;
+
+ for (table= purge_tables; table; table= table->free_next)
{
- tc_remove_table(table);
- purge_tables.push_front(table);
+ tc_unlink_table_from_global_unused_list(table);
+ tc_count--;
}
+
mysql_rwlock_rdlock(&LOCK_flush);
mysql_mutex_unlock(&LOCK_open);
- while ((table= purge_tables.pop_front()))
+ while ((table= purge_tables))
+ {
+ purge_tables= purge_tables->free_next;
intern_close_table(table);
+ }
mysql_rwlock_unlock(&LOCK_flush);
check_unused(thd);
Follow ups
References