
maria-developers team mailing list archive

Re: Transactions behind a failed transaction could commit in parallel replication

 

"nanyi607rao" <nanyi607rao@xxxxxxxxx> writes:

>         if (unlikely(entry->stop_on_error_sub_id <= rgi->wait_commit_sub_id))
>           skip_event_group= true;
>
> This code can tell later transactions to skip, but it cannot tell them to roll back. If a transaction started committing before an earlier transaction failed (for example, a lock wait timeout for some unknown reason), the committing transaction will not be affected by stop_on_error_sub_id.
>
> The failed transaction should then wake up the later committing transactions and tell them to roll back; unfortunately, it does not. Consider code like this:
>       if (!rgi->is_error && !skip_event_group)
>         err= rpt_handle_event(events, rpt);
>       else
>         err= thd->wait_for_prior_commit();     
>       ... ...             
>       finish_event_group(thd, err, event_gtid_sub_id, entry, rgi);
>
> If the failed transaction did not fail at its end event, the value of err comes from wait_for_prior_commit(). That err is 0 if the preceding transaction succeeded, so the failed transaction tells later transactions, in finish_event_group(), that it is OK to commit.

Ah, I see, thanks for the detailed analysis!

Right, so I will look into this and get it fixed.
Maybe all that is needed is to remember the error code when rgi->is_error is
set, and use the real error code to pass to finish_event_group() - so that it
will pass the error to wakeup_subsequent_commits() and the following
transactions will roll back.
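
To make the idea concrete, here is a minimal standalone sketch. It is a model of
the logic only, not the server code: GroupInfo, saved_error, mark_error() and the
other helper names are hypothetical, and the real fix may end up looking quite
different. 1205 is used as the example error code (ER_LOCK_WAIT_TIMEOUT).

```cpp
// Hypothetical stand-in for the per-event-group state (rgi in the server).
struct GroupInfo {
  bool is_error = false;  // set when an event in the group has failed
  int saved_error = 0;    // assumption: new field that remembers the real error
};

// Hypothetical helper: flag the group as failed and keep the first error code.
void mark_error(GroupInfo &rgi, int code) {
  rgi.is_error = true;
  if (rgi.saved_error == 0)
    rgi.saved_error = code;
}

// Models the reported behaviour: once the group is in error, the only value
// available is the result of waiting for the prior commit, which is 0 when
// the preceding transaction succeeded.
int reported_error_for_finish(const GroupInfo &, int prior_commit_err) {
  return prior_commit_err;
}

// Models the proposed behaviour: prefer the remembered error code, and fall
// back to the result of waiting for the prior commit otherwise.
int proposed_error_for_finish(const GroupInfo &rgi, int prior_commit_err) {
  if (rgi.is_error && rgi.saved_error != 0)
    return rgi.saved_error;
  return prior_commit_err;
}

// Models what the error value means to following transactions once it has
// been passed on: non-zero means they must roll back instead of committing.
bool following_must_roll_back(int err) {
  return err != 0;
}
```

In this model, after mark_error(rgi, 1205) with a prior commit that returned 0,
reported_error_for_finish() yields 0 (following transactions may commit), while
proposed_error_for_finish() yields 1205 (following transactions roll back).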

This error handling code is quite tricky, I hope we can get it right. It is
very helpful to get this kind of report, thanks again!

 - Kristian.

