← Back to team overview

maria-developers team mailing list archive

Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

 

Hi, Jan!

On Oct 06, Jan Lindström wrote:
> > > > >
> > > > > I must say the thr_lock code is not familiar to me but there
> > > > > are mysql_mutex_lock() calls to lock->mutex. After code review
> > > > > it is not clear to me what that mutex is.
> > > >
> > this is for table locks. `lock` is `data->lock` where `data` is
> > THR_LOCK structure somewhere in the table share. It locks tables and
> > handlers, not threads. And InnoDB isn't using it at all anyway.
> 
> I do not know how to change this one so that thd is protected, can I
> even do so as this code does not know THD internals...

No, that was my point. thr_lock is for *table locks*, you cannot use it
to protect a THD, they protect TABLE and TABLE_SHARE. Basically, it's
irrelevant for this MDEV.

> > > > > > >          if (victim_trx) {
> > > > > > > +                wsrep_thd_UNLOCK(victim_thd);
> > > > > >
> > > > > > what keeps victim_trx from disappearing here?
> > > > >
> > > > > Nothing. Do you have suggestions ?
> > > >
> > > > A couple of thoughts:
> > > >
> > > > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > > > point of using TOI was to make sure that even if you lock mutexes in
> > > > a different order, it will not cause a deadlock.
> > >
> > > This is DDL-case when we have a MDL-conflict not user KILL, we need to
> > > release it in my opinion because we need to take mutexes in
> > > lock_sys -> trx -> THD::LOCK_thd_data order,
> > > if I do not release I can see easily safe_mutex warnings
> >
> > I don't understand. First, lock_sys and trx mutexes are not covered by
> > safe_mutex, so it cannot possibly warn about them.
> >
> > Second, the reason for taking mutexes always in the same order is to
> > avoid deadlocks. And yor TOI trick should archieve that.
> >
> static void wsrep_abort_transaction(
> handlerton* hton, THD *bf_thd, THD *victim_thd, my_bool signal)
> {
>   trx_t* victim_trx = thd_to_trx(victim_thd);
>   trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL;
> 
>   if (victim_trx) {
>     const trx_id_t victim_trx_id= victim_trx->id;
>     const longlong victim_thread= thd_get_thread_id(victim_thd);
>     /* This is necessary as correct mutexing order is
>     lock_sys -> trx -> THD::LOCK_thd_data and below
>     function assumes we have lock_sys and trx locked
>     and takes THD::LOCK_thd_data for THD state check. */
>     wsrep_thd_UNLOCK(victim_thd);
>     // GAP where thd or trx is not protected
>     lock_mutex_enter();
>     if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {

trx_rw_is_active needs to be modified to do that, right?

>       // As trx is now referenced it can't go away

Hmm. What happens if the thd that owns this transaction is killed or the
user disconnects? THD gets freed. What happens to the referenced trx?

>       trx_mutex_enter(victim);
>       // In below we take THD::LOCK_thd_data

"we take victim->mysql_thd->LOCK_thd_data", correct?

>       wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, signal);
>       trx_mutex_exit(victim);
>       victim->release_reference();
>       wsrep_srv_conc_cancel_wait(victim);
>     }
>     lock_mutex_exit();
>     DBUG_VOID_RETURN;
>   } else {
>     wsrep_thd_LOCK(victim_thd);
>     wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT);
>     wsrep_thd_awake(victim_thd, signal);
>     wsrep_thd_UNLOCK(victim_thd);
>   }
>   DBUG_VOID_RETURN;
> }
> 
> > > > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> > > > Does it need to wait for everything else being replicated?
> > >
> > > As long as it takes to start it on all nodes in the cluster.
> >
> > So, KILL basically won't work when a cluster is starting.
> > Can bf aborts happen during a cluster startup?
> 
> Not sure.

What I mean it, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN
failure and will just proceed killing? Perhaps if
WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
anyway? Could you try to find it out?

Regards,
Sergei


Follow ups

References