← Back to team overview

maria-developers team mailing list archive

Re: MDEV-4956 - Reduce usage of LOCK_open: TABLE_SHARE::tdc.used_tables

 

Hi Sergei,

On Tue, Sep 10, 2013 at 09:11:16PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
> 
> On Sep 10, Sergey Vojtovich wrote:
> > Hi Sergei,
> > 
> > thanks for looking into this patch. Frankly speaking I find it a bit
> > questionable too. Below are links that should answer your questions...
> > What problem do I attempt to solve: https://lists.launchpad.net/maria-developers/msg06118.html
> > How do I attempt to solve it: https://mariadb.atlassian.net/browse/MDEV-4956
> 
> Yes, I've seen and remember both, but they don't answer my question,
> which was about specific changes that you've done, not about the goal.
> But ok, see below.
> 
> > For every statment we acquire table from table cache and then release table back
> > to the cache. That involves update of 3 lists: unused_tables, per-share
> > used_tables and free_tables. These lists are protected by LOCK_open
> > (see tc_acquire_table() and tc_release_table()).
> 
> Why per-share lists are updated under the global mutex?
I would have done that already if it would give us considerable performance gain.
Alas, it doesn't solve CPU cache coherence problem.

Another reason is global unused_tables list, which cannot be protected by
per-share mutex. So we'll have to lock additional mutex twice per query.

The third reason (less important) is that I'm not sure if
TABLE_SHARE::visit_subgraph can be protected by per-share index. I suspect
it needs to see consistent table cache state.

The fourth reason (even less important) is that it is quite complex:
there're invariants between free_tables, unused_tables and in_use.

> 
> > Every time we update global pointer, corresponding cache lines of
> > sibling CPUs have to be invalidated. This is causing expensive memory
> > reads while LOCK_open is held.
> > 
> > Oracle solved this problem by partitioning table cache, allowing
> > emulation of something like per-CPU lists.
> > 
> > We attempted to split LOCK_open logically, and succeeded at everything
> > but these 3 lists. I attempted lock-free list for free_tables, but TPS
> > rate didn't improve.
> 
> How did you do the lock-free list, could you show, please?
Please find it attached. It is mixed with different changes, just search for
my_atomic_casptr.

> 
> > What we need is to reduce number of these expensive memory reads, and
> > there are two solutions: partition these lists or get rid of them. As
> > we agreed not to partition, I'm trying the latter solution.
> 
> Well, you can partition the list. With 32 list head pointers. And a
> thread adding a table only to "this thread's" list. Of course, it's not
> complete partitioning betwen CPUs, as any thread can remove a table from
> any list. But at least there won't be one global list head pointer.
Yes, that's what Oracle did and what we're trying to avoid.

> 
> > Why I find this patch questionable? It reduces LOCK_open wait time by
> > 30%, to get close to Oracle wait time, we need to reduce wait time by
> > 90%. We could remove unused_tables as well, but it will be 60% not 90%.
> 
> Hmm, if you're only interested in optimizing this specific use case -
> one table, many threads - then yes, may be. But if you have many tables,
> then modifying per-share lists under the share own mutex is, basically,
> a must.
If you mean the oposite situation when N threads accessing N tables
(each thread accessing it's own table): there should be no problem, because
every thread is accessing it's own lists. Well, except for unused_tables of
course, but it can't be protected by per-share mutex anyway (tested a few
months ago, but we could ask XL to double check the above case if needed).

Thanks,
Sergey
=== modified file 'sql/table.h'
--- sql/table.h	2013-08-27 12:12:33 +0000
+++ sql/table.h	2013-09-02 07:45:02 +0000
@@ -477,7 +477,6 @@ TABLE_CATEGORY get_table_category(const
                                   const LEX_STRING *name);
 
 
-struct TABLE_share;
 struct All_share_tables;
 
 extern ulong tdc_refresh_version(void);
@@ -601,7 +600,6 @@ struct TABLE_SHARE
   mysql_mutex_t LOCK_ha_data;           /* To protect access to ha_data */
   mysql_mutex_t LOCK_share;             /* To protect TABLE_SHARE */
 
-  typedef I_P_List <TABLE, TABLE_share> TABLE_list;
   typedef I_P_List <TABLE, All_share_tables> All_share_tables_list;
   struct
   {
@@ -615,12 +613,8 @@ struct TABLE_SHARE
       List of tickets representing threads waiting for the share to be flushed.
     */
     Wait_for_flush_list m_flush_tickets;
-    /*
-      Doubly-linked (back-linked) lists of used and unused TABLE objects
-      for this share.
-    */
-    All_share_tables_list all_tables;
-    TABLE_list free_tables;
+    All_share_tables_list all_tables; /**< All TABLE objects for this share */
+    TABLE *free_tables; /**< Unused TABLE objects for this share */
   } tdc;
 
   LEX_CUSTRING tabledef_version;
@@ -1015,18 +1009,16 @@ struct TABLE
 
   TABLE_SHARE	*s;
   handler	*file;
-  TABLE *next, *prev;
+  TABLE *next, *prev, *free_next;
 
 private:
   /**
-     Links for the lists of used/unused TABLE objects for this share.
+     Links for the list of all TABLE objects for this share.
      Declared as private to avoid direct manipulation with those objects.
      One should use methods of I_P_List template instead.
   */
-  TABLE *share_next, **share_prev;
   TABLE *share_all_next, **share_all_prev;
 
-  friend struct TABLE_share;
   friend struct All_share_tables;
 
 public:
@@ -1371,22 +1363,9 @@ struct TABLE
 
 /**
    Helper class which specifies which members of TABLE are used for
-   participation in the list of used/unused TABLE objects for the share.
+   participation in the list of all TABLE objects for the share.
 */
 
-struct TABLE_share
-{
-  static inline TABLE **next_ptr(TABLE *l)
-  {
-    return &l->share_next;
-  }
-  static inline TABLE ***prev_ptr(TABLE *l)
-  {
-    return &l->share_prev;
-  }
-};
-
-
 struct All_share_tables
 {
   static inline TABLE **next_ptr(TABLE *l)

=== modified file 'sql/table_cache.cc'
--- sql/table_cache.cc	2013-08-27 12:12:33 +0000
+++ sql/table_cache.cc	2013-09-02 12:16:39 +0000
@@ -152,6 +152,21 @@ uint tc_records(void)
 }
 
 
+static inline void tc_unlink_table_from_global_unused_list(TABLE *table)
+{
+  mysql_mutex_assert_owner(&LOCK_open);
+  DBUG_ASSERT(!table->in_use);
+  table->next->prev= table->prev;
+  table->prev->next= table->next;
+  if (table == unused_tables)
+  {
+    unused_tables= unused_tables->next;
+    if (table == unused_tables)
+      unused_tables= 0;
+  }
+}
+
+
 /**
   Free all unused TABLE objects.
 
@@ -177,7 +192,7 @@ void tc_purge(void)
     unused_tables->prev->next= 0;
     do
     {
-      unused_tables->s->tdc.free_tables.remove(unused_tables);
+      unused_tables->s->tdc.free_tables= 0;
       unused_tables->s->tdc.all_tables.remove(unused_tables);
       tc_count--;
     } while ((unused_tables= unused_tables->next));
@@ -228,8 +243,7 @@ static void check_unused(THD *thd)
   }
   while ((share= tdc_it.next()))
   {
-    TABLE_SHARE::TABLE_list::Iterator it(share->tdc.free_tables);
-    while ((entry= it++))
+    for (entry= share->tdc.free_tables; entry; entry= entry->free_next)
     {
       /*
         We must not have TABLEs in the free list that have their file closed.
@@ -248,8 +262,8 @@ static void check_unused(THD *thd)
       count--;
       open_files++;
     }
-    TABLE_SHARE::All_share_tables_list::Iterator it2(share->tdc.all_tables);
-    while ((entry= it2++))
+    TABLE_SHARE::All_share_tables_list::Iterator it(share->tdc.all_tables);
+    while ((entry= it++))
     {
       if (entry->in_use)
         open_files++;
@@ -287,18 +301,21 @@ static void tc_remove_table(TABLE *table
   mysql_mutex_assert_owner(&LOCK_open);
   DBUG_ASSERT(!table->in_use);
   /* Remove from per-share chain of unused TABLE objects. */
-  table->s->tdc.free_tables.remove(table);
-  table->s->tdc.all_tables.remove(table);
-
-  /* And global unused chain. */
-  table->next->prev= table->prev;
-  table->prev->next= table->next;
-  if (table == unused_tables)
+  if (table == table->s->tdc.free_tables)
   {
-    unused_tables= unused_tables->next;
-    if (table == unused_tables)
-      unused_tables= 0;
+    DBUG_ASSERT(!table->free_next);
+    table->s->tdc.free_tables= 0;
+  }
+  else
+  {
+    TABLE *entry= table->s->tdc.free_tables;
+    for (; entry->free_next->free_next; entry= entry->free_next) /* no-op */;
+    DBUG_ASSERT(entry->free_next == table);
+    entry->free_next= 0;
   }
+  table->s->tdc.all_tables.remove(table);
+  /* And global unused chain. */
+  tc_unlink_table_from_global_unused_list(table);
   tc_count--;
 }
 
@@ -349,7 +366,7 @@ void tc_add_table(THD *thd, TABLE *table
   Acquired object cannot be evicted or acquired again.
 
   While locked:
-  - pop object from TABLE_SHARE::tdc.free_tables()
+  - pop object from TABLE_SHARE::tdc.free_tables
   - remove object from unused_tables
   - mark object used by thd
 
@@ -360,23 +377,31 @@ static TABLE *tc_acquire_table(THD *thd,
 {
   TABLE *table;
 
-  mysql_mutex_lock(&LOCK_open);
-  if (!(table= share->tdc.free_tables.pop_front()))
+  my_atomic_rwlock_wrlock(&LOCK_tdc_atomics);
+  for (;;)
   {
-    mysql_mutex_unlock(&LOCK_open);
-    return 0;
+    intptr tmp;
+    if (!(tmp= (intptr) my_atomic_loadptr((void **) &share->tdc.free_tables)))
+    {
+      my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+      return 0;
+    }
+    if (tmp & 1)
+      continue;
+    table= (TABLE *) tmp;
+    tmp|= 1;
+    if (!my_atomic_casptr((void **) &share->tdc.free_tables, (void **) &table, (void *) tmp))
+      continue;
+    my_atomic_storeptr((void **) &share->tdc.free_tables, table->free_next);
+    break;
   }
+  my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+
+  mysql_mutex_lock(&LOCK_open);
   DBUG_ASSERT(!table->in_use);
 
   /* Unlink table from global unused tables list. */
-  if (table == unused_tables)
-  {                                             // First unused
-    unused_tables=unused_tables->next;	        // Remove from link
-    if (table == unused_tables)
-      unused_tables=0;
-  }
-  table->prev->next=table->next;		/* Remove from unused list */
-  table->next->prev=table->prev;
+  //tc_unlink_table_from_global_unused_list(table);
   table->in_use= thd;
   mysql_mutex_unlock(&LOCK_open);
 
@@ -421,22 +446,18 @@ bool tc_release_table(TABLE *table)
   DBUG_ASSERT(table->in_use);
   DBUG_ASSERT(table->file);
 
-  mysql_mutex_lock(&LOCK_open);
-  table->in_use= 0;
-  if (table->s->has_old_version() || table->needs_reopen() || !tdc_size ||
-      tc_count > tc_size)
+  if (table->needs_reopen())
   {
-    DBUG_ASSERT(!unused_tables || tc_count <= tc_size);
-    tc_count--;
-    table->s->tdc.all_tables.remove(table);
-    mysql_rwlock_rdlock(&LOCK_flush);
-    mysql_mutex_unlock(&LOCK_open);
-    intern_close_table(table);
-    mysql_rwlock_unlock(&LOCK_flush);
-    return true;
+    mysql_mutex_lock(&LOCK_open);
+    table->in_use= 0;
+    goto eviction;
   }
-  /* Add table to the list of unused TABLE objects for this share. */
-  table->s->tdc.free_tables.push_front(table);
+
+  mysql_mutex_lock(&LOCK_open);
+  table->in_use= 0;
+  if (table->s->has_old_version() || tc_count > tc_size)
+    goto eviction;
+#if 0
   /* Also link it last in the global list of unused TABLE objects. */
   if (unused_tables)
   {
@@ -447,9 +468,30 @@ bool tc_release_table(TABLE *table)
   }
   else
     unused_tables=table->next=table->prev=table;
+#endif
   mysql_mutex_unlock(&LOCK_open);
+
+  my_atomic_rwlock_wrlock(&LOCK_tdc_atomics);
+  for (table->free_next= 0;
+       !my_atomic_casptr((void **) &table->s->tdc.free_tables, (void **) &table->free_next, table);)
+  {
+    if ((intptr) table->free_next & 1)
+      table->free_next= 0;
+  }
+  my_atomic_rwlock_wrunlock(&LOCK_tdc_atomics);
+
   check_unused(thd);
   return false;
+
+eviction:
+  DBUG_ASSERT(!unused_tables || tc_count <= tc_size);
+  tc_count--;
+  table->s->tdc.all_tables.remove(table);
+  mysql_rwlock_rdlock(&LOCK_flush);
+  mysql_mutex_unlock(&LOCK_open);
+  intern_close_table(table);
+  mysql_rwlock_unlock(&LOCK_flush);
+  return true;
 }
 
 
@@ -564,6 +606,7 @@ void tdc_start_shutdown(void)
       plugins minimal and allows shutdown to proceed smoothly.
     */
     tdc_size= 0;
+    tc_size= 0;
     /* Free all cached but unused TABLEs and TABLE_SHAREs. */
     close_cached_tables(NULL, NULL, FALSE, LONG_TIMEOUT);
   }
@@ -651,7 +694,7 @@ void tdc_init_share(TABLE_SHARE *share)
                    &share->tdc.LOCK_table_share, MY_MUTEX_INIT_FAST);
   share->tdc.m_flush_tickets.empty();
   share->tdc.all_tables.empty();
-  share->tdc.free_tables.empty();
+  share->tdc.free_tables= 0;
   tdc_assign_new_table_id(share);
   share->version= tdc_refresh_version();
   DBUG_VOID_RETURN;
@@ -668,7 +711,7 @@ void tdc_deinit_share(TABLE_SHARE *share
   DBUG_ASSERT(share->tdc.ref_count == 0);
   DBUG_ASSERT(share->tdc.m_flush_tickets.is_empty());
   DBUG_ASSERT(share->tdc.all_tables.is_empty());
-  DBUG_ASSERT(share->tdc.free_tables.is_empty());
+  DBUG_ASSERT(!share->tdc.free_tables);
   mysql_mutex_destroy(&share->tdc.LOCK_table_share);
   DBUG_VOID_RETURN;
 }
@@ -1019,7 +1062,6 @@ bool tdc_remove_table(THD *thd, enum_tdc
                       const char *db, const char *table_name,
                       bool kill_delayed_threads)
 {
-  TABLE *table;
   TABLE_SHARE *share;
   bool found= false;
   DBUG_ENTER("tdc_remove_table");
@@ -1031,14 +1073,12 @@ bool tdc_remove_table(THD *thd, enum_tdc
 
   if ((share= tdc_delete_share(db, table_name)))
   {
-    I_P_List <TABLE, TABLE_share> purge_tables;
-    purge_tables.empty();
+    TABLE *table, *purge_tables;
 
     mysql_mutex_lock(&LOCK_open);
     if (kill_delayed_threads)
       kill_delayed_threads_for_table(share);
 
-    TABLE_SHARE::TABLE_list::Iterator it(share->tdc.free_tables);
 #ifndef DBUG_OFF
     if (remove_type == TDC_RT_REMOVE_NOT_OWN)
     {
@@ -1058,16 +1098,23 @@ bool tdc_remove_table(THD *thd, enum_tdc
     if (remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE)
       share->version= 0;
 
-    while ((table= it++))
+    purge_tables= share->tdc.free_tables;
+    share->tdc.free_tables= 0;
+
+    for (table= purge_tables; table; table= table->free_next)
     {
-      tc_remove_table(table);
-      purge_tables.push_front(table);
+      tc_unlink_table_from_global_unused_list(table);
+      tc_count--;
     }
+
     mysql_rwlock_rdlock(&LOCK_flush);
     mysql_mutex_unlock(&LOCK_open);
 
-    while ((table= purge_tables.pop_front()))
+    while ((table= purge_tables))
+    {
+      purge_tables= purge_tables->free_next;
       intern_close_table(table);
+    }
     mysql_rwlock_unlock(&LOCK_flush);
 
     check_unused(thd);


Follow ups

References