maria-developers team mailing list archive

Re: How handle_rpl_parallel_thread rollback partial transaction when STOP SLAVE In parallel replication

 

"nanyi607rao" <nanyi607rao@xxxxxxxxx> writes:

> hi Kristian,

Hi nanyi607rao,

>    In handle_rpl_parallel_thread(), a worker thread has received all the
>    events of a transaction, but it has applied only some of them. This thread
>    would skip the remaining events on STOP SLAVE (is that right?), because
>    sql_worker_killed() returns true. But it seems that the partial transaction
>    won't be rolled back, because wait_for_prior_commit() always returns false.
>    Am I wrong? Or how would it roll back that partial transaction?

Yes, you are right. This is a bug in the current code.

I am actually at the moment working on a fix for this problem (and a number of
other similar problems related to normal stop or error stop). I am sorry that
I didn't manage to fix it before you hit it in your work.

The idea to fix this is as follows:

 - We record which transactions have started to commit

 - When we do STOP SLAVE we remember the transaction that last started to
   commit at the point at which we stopped.

 - In handle_rpl_parallel_thread(), we only start skipping events from
   transactions that start strictly _after_ the stop point. Prior transactions
   have no events skipped.
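
The three steps above can be sketched roughly as follows. This is only a
simplified, single-threaded model and not the code from the patch: the
StopTracker struct, its members, and the use of a plain increasing sequence
number per transaction are all assumptions made for illustration. The real
code would also need locking between the worker threads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the stop-point idea; none of these names come
// from the MariaDB source. Transactions are assumed to be numbered with
// an increasing sequence number (seq >= 1) in binlog order.
struct StopTracker {
  uint64_t last_commit_started = 0;  // highest seq that has started to commit
  bool stop_requested = false;       // set once STOP SLAVE is seen
  uint64_t stop_point = 0;           // value of last_commit_started at stop time

  // Step 1: record that a transaction has started to commit.
  void mark_commit_started(uint64_t seq) {
    assert(seq > 0);
    if (seq > last_commit_started)
      last_commit_started = seq;
  }

  // Step 2: on STOP SLAVE, remember the last transaction that had started
  // to commit. Repeated stop requests do not move the stop point.
  void request_stop() {
    if (!stop_requested) {
      stop_requested = true;
      stop_point = last_commit_started;
    }
  }

  // Step 3: skip events only for transactions strictly after the stop
  // point. Transactions at or before it run all their events to the end.
  bool should_skip(uint64_t seq) const {
    return stop_requested && seq > stop_point;
  }
};
```

In this model a worker would call should_skip() with the sequence number of
the transaction it is applying before each event, so a transaction at or
before the stop point is never left half-applied.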

I put my current patch here:

    http://knielsen-hq.org/parallel_replication_patch_intermediate.diff

You can take a look if you want to see in more detail what I am doing. I
believe this patch fixes the particular bug you mentioned. But the patch is
not complete yet: there is still at least one known bug related to error stop,
and it includes extra debug fprintf() statements and such. So you can also
just wait a few days for me to finish the patch if you prefer; I will let you
know when I have something that is ready.

The new code should be a lot clearer and a lot more robust. But it sounds like
you are working on some extensions to parallel replication? In that case, my
changes may cause you some extra work to adapt your code to the new code,
sorry for that. Please feel free to ask any further questions you have, and I
will try to answer as well and as quickly as I can.

 - Kristian.

