← Back to team overview

maria-developers team mailing list archive

Re: [Commits] 673e2537249: Fix missing memory barrier in wait_for_commit

 

Hi Andrei,

> > I noticed another thing with the wait_for_commit error handling while
> > looking at the MDEV-18648 patch.

> So practically, we seem have to turn the types of the pair into
> std::atomic<> to store into, in the right order, and load from (the way
> it is).

Yes. Something like this? There is a bit of noise in the patch due to
changing the type of waitee to atomic and adjusting all code that accesses
it. But it's basically this in wakeup:

  this->wakeup_error= wakeup_error;
  waitee.store(NULL, std::memory_order_release);

and this in the waiter fast path (one place in sql_class.h and one in log.cc):

    if (waitee.load(std::memory_order_acquire))
      return wait_for_prior_commit2(thd);
    else
      return wakeup_error;

Other access are protected by lock or happen within the waiting thread only,
so they can just use the memory_order_relaxed access semantics without
barriers.

Unfortunately I don't see how to do a useful test case. It is not really
possible to force load or store reordering (on x86 I believe the CPU in fact
enforces orderring between loads and stores). I couldn't think of a test
that would fail without the patch and succeed with it.

If you agree with the patch, do you have an opinion on which version it
should go into? I'd suggest 10.4, since (IIUC) that version started the use
of std::atomic.

(Patch also available here:
https://github.com/knielsen/server/commit/673e253724979fd9fe43a4a22bd7e1b2c3a5269e).

 - Kristian.


Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

> revision-id: 673e253724979fd9fe43a4a22bd7e1b2c3a5269e (mariadb-10.4.4-333-g673e2537249)
> parent(s): 8887effe13ad87ba0460d4d3068fb5696f089bb0
> author: Kristian Nielsen
> committer: Kristian Nielsen
> timestamp: 2019-09-26 17:43:26 +0200
> message:
>
> Fix missing memory barrier in wait_for_commit
>
> The function wait_for_commit::wait_for_prior_commit() has a fast path
> where it checks without locks if wakeup_subsequent_commits() has
> already been called. This check was missing a memory barrier. The
> waitee thread does two writes to variables `waitee' and
> `wakeup_error', and if the waiting thread sees the first write it
> _must_ also see the second or incorrect behaviour will occur. This
> requires memory barriers between both the writes (release semantics)
> and the reads (acquire semantics) of those two variables.
>
> Other accesses to these variables are done under lock or where only
> one thread will be accessing them, and can be done without barriers
> (relaxed sematics).
>
> ---
>  sql/log.cc       | 19 +++++++++++++------
>  sql/sql_class.cc | 25 ++++++++++++++-----------
>  sql/sql_class.h  | 17 ++++++++++++-----
>  3 files changed, 39 insertions(+), 22 deletions(-)
>
> diff --git a/sql/log.cc b/sql/log.cc
> index 4f51a9a9c17..a88d5147898 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -7467,8 +7467,10 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
>    */
>    wfc= orig_entry->thd->wait_for_commit_ptr;
>    orig_entry->queued_by_other= false;
> -  if (wfc && wfc->waitee)
> +  if (wfc && wfc->waitee.load(std::memory_order_acquire))
>    {
> +    wait_for_commit *loc_waitee;
> +
>      mysql_mutex_lock(&wfc->LOCK_wait_commit);
>      /*
>        Do an extra check here, this time safely under lock.
> @@ -7480,10 +7482,10 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
>        before setting the flag, so there is no risk that we can queue ahead of
>        it.
>      */
> -    if (wfc->waitee && !wfc->waitee->commit_started)
> +    if ((loc_waitee= wfc->waitee.load(std::memory_order_relaxed)) &&
> +        !loc_waitee->commit_started)
>      {
>        PSI_stage_info old_stage;
> -      wait_for_commit *loc_waitee;
>  
>        /*
>          By setting wfc->opaque_pointer to our own entry, we mark that we are
> @@ -7505,7 +7507,8 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
>                                    &wfc->LOCK_wait_commit,
>                                    &stage_waiting_for_prior_transaction_to_commit,
>                                    &old_stage);
> -      while ((loc_waitee= wfc->waitee) && !orig_entry->thd->check_killed(1))
> +      while ((loc_waitee= wfc->waitee.load(std::memory_order_relaxed)) &&
> +              !orig_entry->thd->check_killed(1))
>          mysql_cond_wait(&wfc->COND_wait_commit, &wfc->LOCK_wait_commit);
>        wfc->opaque_pointer= NULL;
>        DBUG_PRINT("info", ("After waiting for prior commit, queued_by_other=%d",
> @@ -7523,14 +7526,18 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
>            do
>            {
>              mysql_cond_wait(&wfc->COND_wait_commit, &wfc->LOCK_wait_commit);
> -          } while (wfc->waitee);
> +          } while (wfc->waitee.load(std::memory_order_relaxed));
>          }
>          else
>          {
>            /* We were killed, so remove us from the list of waitee. */
>            wfc->remove_from_list(&loc_waitee->subsequent_commits_list);
>            mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
> -          wfc->waitee= NULL;
> +          /*
> +            This is the thread clearing its own status, it is no longer on
> +            the list of waiters. So no memory barriers are needed here.
> +          */
> +          wfc->waitee.store(NULL, std::memory_order_relaxed);
>  
>            orig_entry->thd->EXIT_COND(&old_stage);
>            /* Interrupted by kill. */
> diff --git a/sql/sql_class.cc b/sql/sql_class.cc
> index 4eab241232b..ca179a39dc1 100644
> --- a/sql/sql_class.cc
> +++ b/sql/sql_class.cc
> @@ -7230,7 +7230,7 @@ wait_for_commit::reinit()
>  {
>    subsequent_commits_list= NULL;
>    next_subsequent_commit= NULL;
> -  waitee= NULL;
> +  waitee.store(NULL, std::memory_order_relaxed);
>    opaque_pointer= NULL;
>    wakeup_error= 0;
>    wakeup_subsequent_commits_running= false;
> @@ -7308,8 +7308,9 @@ wait_for_commit::wakeup(int wakeup_error)
>  
>    */
>    mysql_mutex_lock(&LOCK_wait_commit);
> -  waitee= NULL;
>    this->wakeup_error= wakeup_error;
> +  /* Memory barrier to make wakeup_error visible to the waiter thread. */
> +  waitee.store(NULL, std::memory_order_release);
>    /*
>      Note that it is critical that the mysql_cond_signal() here is done while
>      still holding the mutex. As soon as we release the mutex, the waiter might
> @@ -7340,9 +7341,10 @@ wait_for_commit::wakeup(int wakeup_error)
>  void
>  wait_for_commit::register_wait_for_prior_commit(wait_for_commit *waitee)
>  {
> -  DBUG_ASSERT(!this->waitee /* No prior registration allowed */);
> +  DBUG_ASSERT(!this->waitee.load(std::memory_order_relaxed)
> +              /* No prior registration allowed */);
>    wakeup_error= 0;
> -  this->waitee= waitee;
> +  this->waitee.store(waitee, std::memory_order_relaxed);
>  
>    mysql_mutex_lock(&waitee->LOCK_wait_commit);
>    /*
> @@ -7351,7 +7353,7 @@ wait_for_commit::register_wait_for_prior_commit(wait_for_commit *waitee)
>      see comments on wakeup_subsequent_commits2() for details.
>    */
>    if (waitee->wakeup_subsequent_commits_running)
> -    this->waitee= NULL;
> +    this->waitee.store(NULL, std::memory_order_relaxed);
>    else
>    {
>      /*
> @@ -7381,7 +7383,8 @@ wait_for_commit::wait_for_prior_commit2(THD *thd)
>    thd->ENTER_COND(&COND_wait_commit, &LOCK_wait_commit,
>                    &stage_waiting_for_prior_transaction_to_commit,
>                    &old_stage);
> -  while ((loc_waitee= this->waitee) && likely(!thd->check_killed(1)))
> +  while ((loc_waitee= this->waitee.load(std::memory_order_relaxed)) &&
> +         likely(!thd->check_killed(1)))
>      mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
>    if (!loc_waitee)
>    {
> @@ -7404,14 +7407,14 @@ wait_for_commit::wait_for_prior_commit2(THD *thd)
>      do
>      {
>        mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
> -    } while (this->waitee);
> +    } while (this->waitee.load(std::memory_order_relaxed));
>      if (wakeup_error)
>        my_error(ER_PRIOR_COMMIT_FAILED, MYF(0));
>      goto end;
>    }
>    remove_from_list(&loc_waitee->subsequent_commits_list);
>    mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
> -  this->waitee= NULL;
> +  this->waitee.store(NULL, std::memory_order_relaxed);
>  
>    wakeup_error= thd->killed_errno();
>    if (!wakeup_error)
> @@ -7513,7 +7516,7 @@ wait_for_commit::unregister_wait_for_prior_commit2()
>    wait_for_commit *loc_waitee;
>  
>    mysql_mutex_lock(&LOCK_wait_commit);
> -  if ((loc_waitee= this->waitee))
> +  if ((loc_waitee= this->waitee.load(std::memory_order_relaxed)))
>    {
>      mysql_mutex_lock(&loc_waitee->LOCK_wait_commit);
>      if (loc_waitee->wakeup_subsequent_commits_running)
> @@ -7526,7 +7529,7 @@ wait_for_commit::unregister_wait_for_prior_commit2()
>          See comments on wakeup_subsequent_commits2() for more details.
>        */
>        mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
> -      while (this->waitee)
> +      while (this->waitee.load(std::memory_order_relaxed))
>          mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
>      }
>      else
> @@ -7534,7 +7537,7 @@ wait_for_commit::unregister_wait_for_prior_commit2()
>        /* Remove ourselves from the list in the waitee. */
>        remove_from_list(&loc_waitee->subsequent_commits_list);
>        mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
> -      this->waitee= NULL;
> +      this->waitee.store(NULL, std::memory_order_relaxed);
>      }
>    }
>    wakeup_error= 0;
> diff --git a/sql/sql_class.h b/sql/sql_class.h
> index 72cb8148895..1c81739865b 100644
> --- a/sql/sql_class.h
> +++ b/sql/sql_class.h
> @@ -20,6 +20,7 @@
>  
>  /* Classes in mysql */
>  
> +#include <atomic>
>  #include "dur_prop.h"
>  #include <waiting_threads.h>
>  #include "sql_const.h"
> @@ -2018,7 +2019,7 @@ struct wait_for_commit
>    /*
>      The LOCK_wait_commit protects the fields subsequent_commits_list and
>      wakeup_subsequent_commits_running (for a waitee), and the pointer
> -    waiterr and associated COND_wait_commit (for a waiter).
> +    waitee and associated COND_wait_commit (for a waiter).
>    */
>    mysql_mutex_t LOCK_wait_commit;
>    mysql_cond_t COND_wait_commit;
> @@ -2032,8 +2033,14 @@ struct wait_for_commit
>  
>      When this is cleared for wakeup, the COND_wait_commit condition is
>      signalled.
> +
> +    This pointer is protected by LOCK_wait_commit. But there is also a "fast
> +    path" where the waiter compares this to NULL without holding the lock.
> +    Such read must be done with acquire semantics (and all corresponding
> +    writes done with release semantics). This ensures that a wakeup with error
> +    is reliably detected as (waitee==NULL && wakeup_error != 0).
>    */
> -  wait_for_commit *waitee;
> +  std::atomic<wait_for_commit *> waitee;
>    /*
>      Generic pointer for use by the transaction coordinator to optimise the
>      waiting for improved group commit.
> @@ -2068,7 +2075,7 @@ struct wait_for_commit
>        Quick inline check, to avoid function call and locking in the common case
>        where no wakeup is registered, or a registered wait was already signalled.
>      */
> -    if (waitee)
> +    if (waitee.load(std::memory_order_acquire))
>        return wait_for_prior_commit2(thd);
>      else
>      {
> @@ -2096,7 +2103,7 @@ struct wait_for_commit
>    }
>    void unregister_wait_for_prior_commit()
>    {
> -    if (waitee)
> +    if (waitee.load(std::memory_order_relaxed))
>        unregister_wait_for_prior_commit2();
>      else
>        wakeup_error= 0;
> @@ -2118,7 +2125,7 @@ struct wait_for_commit
>        }
>        next_ptr_ptr= &cur->next_subsequent_commit;
>      }
> -    waitee= NULL;
> +    waitee.store(NULL, std::memory_order_relaxed);
>    }
>  
>    void wakeup(int wakeup_error);
> _______________________________________________
> commits mailing list
> commits@xxxxxxxxxxx
> https://lists.askmonty.org/cgi-bin/mailman/listinfo/commits