← Back to team overview

maria-developers team mailing list archive

Re: Review of MDEV-4506, parallel replication, part 1


Michael Widenius <monty@xxxxxxxxxxxx> writes:

> Kristian> We _do_ need a full memory barrier here (memory barrier is implied in taking a
> Kristian> mutex). Otherwise the compiler or CPU could re-order the setting of the
> Kristian> wakeup_subsequent_commits_running flag with the reads and writes done in the
> Kristian> list manipulations. This could cause the two threads to corrupt the list as
> Kristian> they both manipulate it at the same time.
> This I would like to understand better.
> (It always nice to know what the cpu/compiler is really doing)..
> In this case the code is looping over a list and just setting
> list->wakeup_subsequent_commits_running= false;
> I don't see how the compiler or CPU could reorder the code to things
> in different order (as we are not updating anything else than this
> flag in the loop).
> This is especially true as setting of
> cur->wakeup_subsequent_commits_running= true;
> is done within a mutex and that is the last write to this element of
> the list.
> So there is a barrier between we set it and potentially clear it.
> As the 'clear' may now happen 'any time' (from other threads point of
> view) I don't see why it needs to be protected.


I looked into this again. So we do need a memory barrier, I will try to
explain this more below. However, it turns out we already have this memory
barrier. Because in all relevant cases we call wait_for_commit::wakeup()
before we set the wakeup_subsequent_commits_running flag back to false. And
wakeup() takes a mutex, which implies a full memory barrier.

So I now removed the extra redundant locking (after adding comments explaining
why this is ok), and pushed that to 10.0-knielsen.

Let me see if I can explain why the memory barrier is needed. The potential
race (or one of them at least) is as follows:

Thread A is doing wakeup_subsequent_commits2(). Thread B is doing
unregister_wait_for_prior_commit2(). We assume B is on A's list of waiters.

A will do a read of B's next pointer, and a write of B's waiting_for_commit
flag. After that A will clear the wakeup_subsequent_commits_running flag.

B will read the wakeup_subsequent_commits_running flag, and if it is not set,
then it will remove itself from the list and later possibly put itself on the
list of another thread or whatever.

Without a memory barrier, the clearing of the flag might become visible to B
early. B could then remove itself from the list, and perhaps add itself to the
list of another thread. And then the write from A to B->waiting_for_commit
could become visible (which would cause a wrong spurious wakeup of B), or B's
writing of new next pointer could become visible to A's list traversal,
causing A to walk into the wrong list. (The latter problem sounds rather
unlikely, but at least theoretically it is allowed behaviour by compiler/CPU).

But due to the wakeup() call in A we do have a memory barrier between the list
manipulations and the clearing of the flag. And we have another memory barrier
in B (a full mutex lock actually) between checking the flag and manipulating
the list. This prevents the race.

> Kristian> What we need to ensure is that
> Kristian>  - All the reads of the list in thread 1 see only writes from other threads
> Kristian>    that happened _before_ the variable was reset.
> Kristian>  - All changes to the list in thread 2 happen only _after_ the variable was
> Kristian>    reset in thread 1.
> Kristian> So a memory barrier is needed, but as you say, lock/unlock should not be
> Kristian> needed.
> I still don't understand the current code fully.
> The issue is that unregister_wait_for_prior_commit2() can be called by
> thread 2 just before or just after thread 1 is resetting
> wakeup_subsequent_commits_running.  There is not other variables
> involved.
> This means that in unregister_wait_for_prior_commit2() if thread 2 is there
> just before list->wakeup_subsequent_commits_running is set to false, we will go
> into this code:
>     if (loc_waitee->wakeup_subsequent_commits_running)
>     {
>       /*
>         When a wakeup is running, we cannot safely remove ourselves from the
>         list without corrupting it. Instead we can just wait, as wakeup is
>         already in progress and will thus be immediate.
>         See comments on wakeup_subsequent_commits2() for more details.
>       */
>       mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
>       while (waiting_for_commit)
>         mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
>     }
> I don't however see any code in :queue_for_group_commit() that will
> signal COND_wait_commit.  What code do we have that will wake up the above
> code ?

In queue_for_group_commit() we call waiter->wakeup(), which signals

> Looking at wait_for_commit::wakeup() that makes this:
>   mysql_mutex_lock(&LOCK_wait_commit);
>   waiting_for_commit= false;
>   mysql_cond_signal(&COND_wait_commit);
>   mysql_mutex_unlock(&LOCK_wait_commit);
> However, in
> wait_for_commit::register_wait_for_prior_commit()
> We set wait_for_commit() without a mutex protection ?
> Is this right?

It should be ok. When register_wait_for_prior_commit() is called, the thread
should not be registered to wait for anything, so no other thread can access
the variable.

However, it would be fine to move setting waiting_for_commit= true under the
mutex protection, if that would make the code easier to understand.

> I assume that the code works because things are called in a specific
> order and some threads can't call other functions until at some
> certain point when things are safe.

Yes, waiting_for_commit is initialised to true, and only after do we lock the
waitee and put ourself in the list. Only once we are in the wait list of
another thread can the variable be accessed from that thread.

> Like that wakeup() for a thread can never be called when that thread
> is just before the first mutex in
> wait_for_commit::register_wait_for_prior_commit()

Yes. Because thread B can not be woken up by thread A until B has put itself
into A's waiter list, which is done under the first mutex in

Hope this helps,

 - Kristian.

Follow ups