maria-developers team mailing list archive

Thread
Date

Re: Review of MDEV-4506, parallel replication, part 1

To: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
From: Michael Widenius <monty@xxxxxxxxxxxx>
Date: Sun, 22 Sep 2013 18:03:25 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <87fvt3akj2.fsf@frigg.knielsen-hq.org>
Reply-to: monty@xxxxxxxxxxxx

Hi!

>>>>> "Kristian" == Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

Kristian> Michael Widenius <monty@xxxxxxxxxxxx> writes:
>> +      if (list->wakeup_subsequent_commits_running)
>> +      {
>> +        mysql_mutex_lock(&list->LOCK_wait_commit);
>> +        list->wakeup_subsequent_commits_running= false;
>> +        mysql_mutex_unlock(&list->LOCK_wait_commit);
>> 
>> Why do we need the mutex above?

Kristian> I checked this again.

Kristian> We _do_ need a full memory barrier here (memory barrier is implied in taking a
Kristian> mutex). Otherwise the compiler or CPU could re-order the setting of the
Kristian> wakeup_subsequent_commits_running flag with the reads and writes done in the
Kristian> list manipulations. This could cause the two threads to corrupt the list as
Kristian> they both manipulate it at the same time.

This I would like to understand better.
(It always nice to know what the cpu/compiler is really doing)..

In this case the code is looping over a list and just setting
list->wakeup_subsequent_commits_running= false;

I don't see how the compiler or CPU could reorder the code to things
in different order (as we are not updating anything else than this
flag in the loop).

This is especially true as setting of
cur->wakeup_subsequent_commits_running= true;
is done within a mutex and that is the last write to this element of
the list.

So there is a barrier between we set it and potentially clear it.
As the 'clear' may now happen 'any time' (from other threads point of
view) I don't see why it needs to be protected.

Kristian> But after carefully checking, it seems to me a barrier would be enough, we do
Kristian> not need to lock and unlock the mutex.

Kristian> Unfortunately, we do not have any good mechanisms for doing memory barriers in
Kristian> the MariaDB source currently, AFAIK. I will put a comment here that the
Kristian> lock/unlock is only needed for the memory barrier, and could be changed
Kristian> later. Then we can re-visit it when the more critical things have been fixed
Kristian> (and if we get memory barrier primitives in the source tree). What do you
Kristian> think of this suggestion?

That is fine with me.

Kristian> [I was wondering why I did it like this in the first place.

Kristian> Due to chaining between the waiter's mutex and the waitees mutex, the extra
Kristian> lock/unlock does prevent that wakeup_subsequent_commits2() can change the flag
Kristian> in the middle of unregister_wait_for_prior_commit2() doing its
Kristian> operation. Maybe that is why I did it this way. But I could not actually think
Kristian> of any problems from such a change in the middle.

Kristian> Or maybe I did it this way to emphasise that the code conceptually is locking
Kristian> both the waitee and the waiter - but to prevent deadlocks, it temporarily
Kristian> releases the waitee lock in the middle. So instead of

Kristian>    LOCK a; LOCK b; UNLOCK b; UNLOCK a;

Kristian> we inject a temporary unlock of a:

Kristian>    LOCK a; [UNLOCK a]; LOCK b; UNLOCK b; [LOCK a]; UNLOCK a;

Kristian> But it does not really make things any clearer.]

>> The only place where we test this variable, as far as I can find, is in
>> wait_for_commit::register_wait_for_prior_commit()

Kristian> The critical place is in wait_for_commit::unregister_wait_for_prior_commit2()

>> This thread doesn't know if the other thread is just before
>> mysql_mutex_lock(&waitee->LOCK_wait_commit), inside the lock or after
>> the lock.  So when we reset the variable should not matter.

The same is true for unregister_wait_for_prior_commit2.

Kristian> What we need to ensure is that

Kristian>  - All the reads of the list in thread 1 see only writes from other threads
Kristian>    that happened _before_ the variable was reset.

Kristian>  - All changes to the list in thread 2 happen only _after_ the variable was
Kristian>    reset in thread 1.

Kristian> So a memory barrier is needed, but as you say, lock/unlock should not be
Kristian> needed.

I still don't understand the current code fully.

The issue is that unregister_wait_for_prior_commit2() can be called by
thread 2 just before or just after thread 1 is resetting
wakeup_subsequent_commits_running.  There is not other variables
involved.

This means that in unregister_wait_for_prior_commit2() if thread 2 is there
just before list->wakeup_subsequent_commits_running is set to false, we will go
into this code:

    if (loc_waitee->wakeup_subsequent_commits_running)
    {
      /*
        When a wakeup is running, we cannot safely remove ourselves from the
        list without corrupting it. Instead we can just wait, as wakeup is
        already in progress and will thus be immediate.

        See comments on wakeup_subsequent_commits2() for more details.
      */
      mysql_mutex_unlock(&loc_waitee->LOCK_wait_commit);
      while (waiting_for_commit)
        mysql_cond_wait(&COND_wait_commit, &LOCK_wait_commit);
    }

I don't however see any code in :queue_for_group_commit() that will
signal COND_wait_commit.  What code do we have that will wake up the above
code ?

Looking at wait_for_commit::wakeup() that makes this:

  mysql_mutex_lock(&LOCK_wait_commit);
  waiting_for_commit= false;
  mysql_cond_signal(&COND_wait_commit);
  mysql_mutex_unlock(&LOCK_wait_commit);

However, in
wait_for_commit::register_wait_for_prior_commit()

We set wait_for_commit() without a mutex protection ?
Is this right?

I assume that the code works because things are called in a specific
order and some threads can't call other functions until at some
certain point when things are safe.

Like that wakeup() for a thread can never be called when that thread
is just before the first mutex in

wait_for_commit::register_wait_for_prior_commit()

as in this case 'waiting_for_cond' variable may be set to false (in
wakeup()) even if it should be true.

Regards,
Monty

Follow ups

Re: Review of MDEV-4506, parallel replication, part 1
From: Kristian Nielsen, 2013-09-23

References

Review of MDEV-4506, parallel replication, part 1
From: Michael Widenius, 2013-09-13
Re: Review of MDEV-4506, parallel replication, part 1
From: Kristian Nielsen, 2013-09-17