maria-developers team mailing list archive

Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

 

Hi, Jan!

On Oct 10, Jan Lindström wrote:
> Hi Sergei,
> >
> > >   if (victim_trx) {
> > >     const trx_id_t victim_trx_id= victim_trx->id;
> > >     const longlong victim_thread= thd_get_thread_id(victim_thd);
> > >     /* This is necessary as correct mutexing order is
> > >     lock_sys -> trx -> THD::LOCK_thd_data and below
> > >     function assumes we have lock_sys and trx locked
> > >     and takes THD::LOCK_thd_data for THD state check. */
> > >     wsrep_thd_UNLOCK(victim_thd);
> > >     // GAP where thd or trx is not protected
> > >     lock_mutex_enter();
> > >     if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {
> >
> > trx_rw_is_active needs to be modified to do that, right?
> 
> No, this is the current behaviour; I did not change anything in
> trx_rw_is_active.

In xtradb trx_rw_is_active returns bool.
I think xtradb is still the default innodb in 10.2.

In innobase it returns, indeed, trx_t*, I didn't notice that at first,
that's why I was confused.

> > >       // As trx is now referenced it can't go away
> >
> > Hmm. What happens if the thd that owns this transaction is killed or
> > the user disconnects? THD gets freed. What happens to the referenced
> > trx?
> 
> In my understanding you can't just free THD before it is aborted or
> committed, right?
> As we have lock_sys, no trx can commit or abort inside InnoDB, and
> after this function this trx can't be deleted.

okay, good point.
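
For readers following the locking argument, here is a minimal standalone sketch of the pattern under discussion: copy the victim's id, drop the lowest-order lock, then take the locks in the documented order (lock_sys -> trx -> THD::LOCK_thd_data) and look the transaction up again by id. Everything in it (Trx, Session, TrxRegistry, abort_victim) is a hypothetical stand-in built on the standard library; it is not the InnoDB or wsrep code, and it only illustrates why the lookup by id closes the unprotected gap.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-ins; none of these are real InnoDB or wsrep types.
struct Trx {
  explicit Trx(std::uint64_t trx_id) : id(trx_id) {}
  std::uint64_t id;
  std::mutex mutex;       // plays the role of the trx mutex
  bool aborted = false;
};

struct Session {
  std::mutex data_mutex;  // plays the role of THD::LOCK_thd_data
  std::uint64_t trx_id = 0;
};

class TrxRegistry {
public:
  std::mutex sys_mutex;   // plays the role of the lock_sys mutex

  void add(std::uint64_t id) {
    std::lock_guard<std::mutex> guard(sys_mutex);
    active_[id] = std::make_shared<Trx>(id);
  }

  void remove(std::uint64_t id) {
    std::lock_guard<std::mutex> guard(sys_mutex);
    active_.erase(id);
  }

  // Caller must hold sys_mutex. Returns nullptr if the id is no longer active.
  std::shared_ptr<Trx> find_active(std::uint64_t id) const {
    auto it = active_.find(id);
    return it == active_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::uint64_t, std::shared_ptr<Trx>> active_;
};

// Entered with session.data_mutex held through session_lock.
// Returns true if the victim was still active and has been marked aborted.
bool abort_victim(TrxRegistry& registry, Session& session,
                  std::unique_lock<std::mutex>& session_lock) {
  assert(session_lock.owns_lock());
  // Copy the id while the session is still protected.
  const std::uint64_t victim_id = session.trx_id;
  // Drop the lowest-order lock; from here until the lookup nothing is protected.
  session_lock.unlock();

  // Re-acquire in order: registry, then transaction, then session data.
  std::lock_guard<std::mutex> sys_guard(registry.sys_mutex);
  std::shared_ptr<Trx> victim = registry.find_active(victim_id);
  if (!victim) {
    return false;  // the transaction ended during the gap
  }
  std::lock_guard<std::mutex> trx_guard(victim->mutex);
  std::lock_guard<std::mutex> session_guard(session.data_mutex);
  victim->aborted = true;
  return true;
}
```

In this sketch the shared_ptr returned by find_active is what keeps the object alive after the lookup; in the patch that role is played by the reference that trx_rw_is_active takes when its last argument is true.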

> > What I mean is, what if KILL ignored a WSREP_TO_ISOLATION_BEGIN
> > failure and just proceeded with the kill? Perhaps if
> > WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
> > anyway? Could you try to find out?
> 
> User KILL can happen only after the node has moved to READY state so
> at startup you can't use it before the cluster is ready to serve.  We
> could just ignore the TOI error here, but what is the point? There are
> bigger problems in the cluster if TOI fails. TOI can fail only in this
> node as all other nodes in the cluster will ignore the KILL command
> (after parsing it).

Okay then

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx

