maria-developers team mailing list archive

Re: MDEV-7004 review

Hi Sergey,

thanks for looking into this!

On Wed, Nov 26, 2014 at 08:00:44PM +0100, Sergei Golubchik wrote:
> Hi, Sergey
> 
> I didn't review MDEV-7200 (InnoDB) and patches that are already in 10.1.
The MDEV-7200 patch is not ready to go into 10.1 as is anyway. It is here just
as a reminder that we can gain another 2% with a simple fix.

Some of my patches depend on Monty's patches that haven't actually been pushed
to 10.1 yet. I'll hold those.

Further comments inline.

> Almost everything else was fine, see few comments below.
> 
> > ------------------------------------------------------------
> > revno: 4513
> > committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > branch nick: 10.0-power
> > timestamp: Fri 2014-11-07 22:03:18 +0400
> > message:
> >   Multi-rwlock lock for table definition cache hash.
> > added:
> > modified:
> > diff:
> > === modified file 'sql/table_cache.cc'
> > --- sql/table_cache.cc	2014-10-21 12:20:39 +0000
> > +++ sql/table_cache.cc	2014-11-07 18:03:18 +0000
> > @@ -53,6 +53,77 @@
> >  #include "table.h"
> >  #include "sql_base.h"
> >  
> > +
> > +/**
> > +  Table definition hash lock.
> > +
> > +  Lock consists of multiple rw-locks (8 by default). Lock is write locked
> > +  when all instances are write locked. Lock is read locked when any instance
> > +  is read locked.
> > +
> > +  This class was implemented to workaround scalability bottleneck of table
> > +  definition hash rw-lock.
> > +
> > +  To be replaced with lock-free hash when we understand how to solve the
> > +  following problems:
> > +  - lock-free hash iterator (can be workarounded by introducing global
> > +    all_tables list);
> 
> 
> I don't particularly like that solution.
> Could you try to switch to lock-free hash now?
> 
> I'll try to do the iterator asap
This is the change that allowed us to break through 27kTPS and reach 33kTPS. I
don't like it either. My raw lf-hash prototype shows similar results.

I think we really need this bottleneck solved in 10.1 one way or another. So my
question is: do you feel we will be able to implement all of this in time for
10.1?

> 
> > +  - lf_hash_search() may fail because of OOM. It is rather complex to handle
> > +    it properly.
> > +*/
> > +
> > +class TDC_lock
> > +{
> > +  uint m_instances;
> > +  bool m_write_locked;
> > +  char pad[128];
Interesting, why did I add the above padding... probably a leftover from an
intermediate patch. I'll remove it.

> > +  struct st_locks
> > +  {
> > +    mysql_rwlock_t lock;
> > +    char pad[128];
> 
> In these cases I usually do: pad[128 - sizeof(mysql_rwlock_t)];
Hmm... strange that I didn't define a macro for this, I thought I did.

Your suggestion guarantees that st_locks will be at least 128 bytes. But those
128 bytes may still be split across 2 cache lines. Or does malloc return a
pointer aligned to the cache line size?

> 
> > +  } *locks;
> > +public:
> > +  int init(PSI_rwlock_key key, uint instances)
> > +  {
> > +    m_instances= instances;
> > +    m_write_locked= false;
> > +    locks= (st_locks *) my_malloc(sizeof(struct st_locks) * m_instances, MYF(MY_WME));
> > +    if (!locks)
> > +      return 1;
> > +    for (uint i= 0; i < m_instances; i++)
> > +      mysql_rwlock_init(key, &locks[i].lock);
> > +    return 0;
> > +  }
> > +  void destroy(void)
> > +  {
> > +    for (uint i= 0; i < m_instances; i++)
> > +      mysql_rwlock_destroy(&locks[i].lock);
> > +    my_free(locks);
> > +  }
> > +  void wrlock(void)
> > +  {
> > +    for (uint i= 0; i < m_instances; i++)
> > +      mysql_rwlock_wrlock(&locks[i].lock);
> > +    m_write_locked= true;
> > +  }
> > +  void rdlock(THD *thd= 0)
> > +  {
> > +    mysql_rwlock_rdlock(&locks[thd ? thd->thread_id % m_instances : 0].lock);
> > +  }
> > +  void unlock(THD *thd= 0)
> > +  {
> > +    if (m_write_locked)
> > +    {
> > +      m_write_locked= false;
> > +      for (uint i= 0; i < m_instances; i++)
> > +        mysql_rwlock_unlock(&locks[i].lock);
> > +    }
> > +    else
> > +      mysql_rwlock_unlock(&locks[thd ? thd->thread_id % m_instances : 0].lock);
> > +  }
> > +};
> > +
> > +
> >  /** Configuration. */
> >  ulong tdc_size; /**< Table definition cache threshold for LRU eviction. */
> >  ulong tc_size; /**< Table cache threshold for LRU eviction. */
> > ------------------------------------------------------------
> > revno: 4512
> > committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > branch nick: 10.0-power
> > timestamp: Fri 2014-10-31 15:44:36 +0400
> > message:
> >   Preallocate locks on THD mem_root to avoid expensive malloc.
> > modified:
> > diff:
> > === modified file 'sql/lock.cc'
> > --- sql/lock.cc	2014-09-30 17:31:14 +0000
> > +++ sql/lock.cc	2014-10-31 11:44:36 +0000
> > @@ -700,7 +699,7 @@ MYSQL_LOCK *get_lock_data(THD *thd, TABL
> >    TABLE **to, **table_buf;
> >    DBUG_ENTER("get_lock_data");
> >  
> > -  DBUG_ASSERT((flags == GET_LOCK_UNLOCK) || (flags == GET_LOCK_STORE_LOCKS));
> > +  DBUG_ASSERT((flags & GET_LOCK_UNLOCK) || (flags & GET_LOCK_STORE_LOCKS));
> 
> Not the same, old assert also verified that these flags aren't set both.
> New assert doesn't.
> But I don't think it matters much.
You're right, I'll change this: the stricter, the better.

> 
> >    DBUG_PRINT("info", ("count %d", count));
> >  
> >    for (i=lock_count=table_count=0 ; i < count ; i++)
> > ------------------------------------------------------------
> > revno: 4508
> > committer: Sergey Vojtovich <svoj@xxxxxxxxxxx>
> > branch nick: 10.0
> > timestamp: Tue 2014-10-21 16:20:39 +0400
> > message:
> > modified:
> > diff:
> > === modified file 'sql/handler.cc'
> > --- sql/handler.cc	2014-11-13 10:01:31 +0000
> > +++ sql/handler.cc	2014-10-21 12:20:39 +0000
> > @@ -410,13 +410,16 @@ static int hton_ext_based_table_discover
> >  static void update_discovery_counters(handlerton *hton, int val)
> >  {
> >    if (hton->discover_table_existence == full_discover_for_existence)
> > -    my_atomic_add32(&need_full_discover_for_existence,  val);
> > +    my_atomic_add32_explicit(&need_full_discover_for_existence, val,
> > +                             MY_MEMORY_ORDER_RELAXED);
> >  
> >    if (hton->discover_table_names)
> > -    my_atomic_add32(&engines_with_discover_table_names, val);
> > +    my_atomic_add32_explicit(&engines_with_discover_table_names, val,
> > +                             MY_MEMORY_ORDER_RELAXED);
> >  
> >    if (hton->discover_table)
> > -    my_atomic_add32(&engines_with_discover, val);
> > +    my_atomic_add32_explicit(&engines_with_discover, val,
> > +                             MY_MEMORY_ORDER_RELAXED);
> 
> I wouldn't do that. Technically there's a dependency between these counters
> and loaded engines, so the order, kind of, matters. I don't think there
> is any practical risk of bugs here, but, on the other hand, these
> counters are only updated when an engine is loaded or unloaded,
> so there's no need to bother with the relaxed memory ordering here.
Ok, I'll revert this.

> 
> >  }
> >  
> >  int ha_finalize_handlerton(st_plugin_int *plugin)
> > 
> > === modified file 'sql/mysqld.cc'
> > --- sql/mysqld.cc	2014-11-18 21:25:47 +0000
> > +++ sql/mysqld.cc	2014-10-21 12:20:39 +0000
> > @@ -3884,7 +3884,7 @@ static void my_malloc_size_cb_func(long
> >    }
> >    // workaround for gcc 4.2.4-1ubuntu4 -fPIE (from DEB_BUILD_HARDENING=1)
> >    int64 volatile * volatile ptr=&global_status_var.memory_used;
> > -  my_atomic_add64(ptr, size);
> > +  my_atomic_add64_explicit(ptr, size, MY_MEMORY_ORDER_RELAXED);
> 
> good!
> 
> >  }
> >  }
> >  
> > 
> > === modified file 'sql/mysqld.h'
> > --- sql/mysqld.h	2014-11-13 08:56:28 +0000
> > +++ sql/mysqld.h	2014-10-21 12:20:39 +0000
> > @@ -629,7 +629,7 @@ inline __attribute__((warn_unused_result
> >  {
> >    query_id_t id;
> >    my_atomic_rwlock_wrlock(&global_query_id_lock);
> > -  id= my_atomic_add64(&global_query_id, 1);
> > +  id= my_atomic_add64_explicit(&global_query_id, 1, MY_MEMORY_ORDER_RELAXED);
> 
> I believe this is fine
> 
> >    my_atomic_rwlock_wrunlock(&global_query_id_lock);
> >    return (id);
> >  }
> > === modified file 'sql/table_cache.cc'
> > --- sql/table_cache.cc	2014-06-10 18:20:33 +0000
> > +++ sql/table_cache.cc	2014-10-21 12:20:39 +0000
> > @@ -139,7 +139,7 @@ uint tc_records(void)
> >  {
> >    uint count;
> >    my_atomic_rwlock_rdlock(&LOCK_tdc_atomics);
> > -  count= my_atomic_load32(&tc_count);
> > +  count= my_atomic_load32_explicit(&tc_count, MY_MEMORY_ORDER_RELAXED);
> 
> I'm not sure about this file. You should know better whether
> it's safe to use relaxed memory ordering here.
It should be OK. This counter does not need strict memory ordering
constraints.

Thanks,
Sergey

