← Back to team overview

maria-developers team mailing list archive

Re: [Commit] MDEV-12746 out-of-order retry


Kristian, hello.

Thanks for looking at fixes so promptly!

> andrei.elkin@xxxxxxxxxx writes:
>> commit 3cebb54e6387a7eace1757c82ed0efd6e11590b9
>> Author: Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
>> Date:   Fri Feb 9 15:00:23 2018 +0200
>>     MDEV-12746 rpl.rpl_parallel_optimistic_nobinlog fails committing
>>                out of order at retry
>>     The 2nd issue was out of order commit by transactions that
>>     are ordered after an erroring out transaction. The latter was possible
>>     in the test thanks to the above retry failure.
>>     Out-of-order committing was done despite the slave was stopping.
> Let me see if I understand this correctly:
> Normally, a transaction in optimistic parallel replication cannot commit out
> of order because it either waits for the prior transaction to commit first,
> or, if the prior transaction has stopped with an error earlier, then the
> following transaction will not even start.
> But in this case, during retry, there was missing the check that the prior
> transaction stopped with an error.


> Therefore, the following transaction's
> wait_for_prior_commit() does nothing, and it wrongly goes ahead and runs and
> even commits, even though an earlier transaction failed and replication is
> stopping.
> Correct?

That's the case.

> That's a nice catch of a subtle, but potentially serious bug.
> I wonder how this is possible though. As I recall, when a transaction fails,
> this failure is marked in the wait_for_prior_commit data structures. So the
> retrying transaction should have gotten an error in wait_for_prior_commit()
> and aborted the retries.
> So why does this not happen in this case?

One observed possibility is reflected by a simulation test.
First the parent errros out goes to `finish_event_group()' but it's
possible it does not have yet the child in its `subsequent_commits_list'
list. And that's how the parent's `wakeup_error' never reaches the
child. This is certainly the case when a child retries after a temp
failure. In the test condition the child's `worker_error' is zero, but
if a non-zero case `worker_error' gets cleared anyway through

So I understood so far that the retrying worker needs to check 
`stop_on_error_sub_id' that effectively reflects the error status.
We can't relay on `worker_error' that is.
Perhaps `subsequent_commits_list' must contain the child as well so the
retrying worker would see its status the most direct way. But I did not
look into any possible improvement into that direction.

>>     The test failures were of two sorts. One is that the number of
>> potential
>>     temporary errors actually exceeds the default value of the slave retry
>>     option.
>>     Increased @@global.slave_transaction_retries fixes that part.
> Hm, this shouldn't be needed, I think. The retry waits for all prior
> transactions to commit first, so after the first retry no further retries
> should be caused by other parallel replication workers.

You must mean the retrying worker stops doing it optimistically. I saw that.
But I saw through logs that at times a worker 
could re-try about the number of worker pool size. 
Perhaps this unrestrained behaviour was caused by lack of a check of the
current patch.

I'm checking that closely.

> So I would expect this problem to be a side-effect of the other, real issue,
> and this increase should not be needed (and might hide another problem
> later, if left in).
>> diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
>> index 34f9113f7fe..9f1d23a1161 100644
>> --- a/sql/rpl_parallel.cc
>> +++ b/sql/rpl_parallel.cc
>> @@ -745,7 +761,25 @@ retry_event_group(rpl_group_info *rgi, rpl_parallel_thread *rpt,
>>    for (;;)
>>    {
>>      mysql_mutex_lock(&entry->LOCK_parallel_entry);
>> -    register_wait_for_prior_event_group_commit(rgi, entry);
>> +    if (entry->stop_on_error_sub_id == (uint64) ULONGLONG_MAX ||
>> +#ifndef DBUG_OFF
>> +        (DBUG_EVALUATE_IF("simulate_mdev_12746", 1, 0)) ||
>> +#endif
>> +        rgi->gtid_sub_id < entry->stop_on_error_sub_id)
>> +    {
>> +      register_wait_for_prior_event_group_commit(rgi, entry);
>> +    }
>> +    else
>> +    {
>> +      /*
>> +        An earlier transaction by another worker may have not had not
>> +        have the current one in subsequent committers list and thus
>> +        the actual induced error status could not have been sumbitted.
>> +        So it should be set by ourselves now.
>> +      */
> This comment is hard to understand - what do you mean here?

I put more correct in above. The comments should've read

   An earlier transaction by another worker may have not had
   the current one in its subsequent committers list and thus
   the actual (induced) error status to the current transaction
   could not be set by the earlier worker.

>> +      if (!rgi->worker_error)
>> +        rgi->worker_error= 1;
>> +    }
> I would have expected an abort (goto err) here (or well just below, after
> unlocking the mutex). Is this not needed, and if so, why not?

I left it to upcoming (in 2 lines) THD::wait_for_prior_commit(). It
takes care to check out `worker_error'.

  int wait_for_prior_commit(THD *thd)
      Quick inline check, to avoid function call and locking in the common case
      where no wakeup is registered, or a registered wait was already signalled.
    if (waitee)
      return wait_for_prior_commit2(thd);
      if (wakeup_error)
        my_error(ER_PRIOR_COMMIT_FAILED, MYF(0));
      return wakeup_error;

No waitee is guaranteed as this worker skipped `register_wait_for_prior_event_group_commit()'.

>> @@ -716,12 +724,20 @@ retry_event_group(rpl_group_info *rgi, rpl_parallel_thread *rpt,
>>      unregistering (and later re-registering) the wait.
>>    */
>>    if(thd->wait_for_commit_ptr)
>> -    thd->wait_for_commit_ptr->unregister_wait_for_prior_commit();
>> +      thd->wait_for_commit_ptr->unregister_wait_for_prior_commit();
> Indentation?

To be fixed.

>  - Kristian.



Follow ups