maria-developers team mailing list archive

Thread
Date

Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Jan Lindström <jan.lindstrom@xxxxxxxxxxx>
Date: Tue, 12 Oct 2021 09:06:38 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, Ramesh Sivaraman <ramesh.sivaraman@xxxxxxxxxxx>
In-reply-to: <CABFC8DmD=CKGNjZ_wWE0WQJaMgn_RQCm8=iRf=cPMXzT07W2YA@mail.gmail.com>

Hi Sergei,

Update on wsrep_close_connections problem. My suggestion to fix this issue
is on
https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d

If you have a better solution, please advise.

R: Jan


On Mon, Oct 11, 2021 at 12:52 PM Jan Lindström <jan.lindstrom@xxxxxxxxxxx>
wrote:

> Hi Sergei,
>
> After QA runs done by Ramesh, we now know the latest fix candidate i.e.
> what is in bb-10.2-MDEV-25114-galera-v2 is incorrect. Problem is in
> wsrep_close_connections() as it holds LOCK_thread_count while it does
> abort_replicated that will call wsrep_abort_transaction and there we use
> find_thread_by_id that would also take LOCK_thread_count. As there is
> another code path here, the problem is not easily fixed. We can't just
> release LOCK_thread_count at wsrep_close_connections as we iterate the
> thread list.
>
> I must say I'm not sure what to do now.
>
> (gdb) bt
> #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at
> ../sysdeps/unix/sysv/linux/pthread_kill.c:56
> #1  0x000056333963a2e8 in my_write_core (sig=sig@entry=6) at
> /test/mtest/10.2_dbg/mysys/stacktrace.c:382
> #2  0x0000563338f2993d in handle_fatal_signal (sig=6) at
> /test/mtest/10.2_dbg/sql/signal_handler.cc:355
> #3  <signal handler called>
> #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #5  0x000014b799572859 in __GI_abort () at abort.c:79
> #6  0x000056333963edc4 in safe_mutex_lock (mp=0x563339e47220
> <LOCK_thread_count>, my_flags=my_flags@entry=0,
>     file=file@entry=0x5633396d9050
> "/test/mtest/10.2_dbg/sql/sql_parse.cc", line=line@entry=8902)
>     at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
> #7  0x0000563338d1aea7 in inline_mysql_mutex_lock (src_line=8902,
> src_file=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc",
>     that=<optimized out>) at
> /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
> #8  find_thread_by_id (id=id@entry=48, query_id=query_id@entry=false) at
> /test/mtest/10.2_dbg/sql/sql_parse.cc:8902
> #9  0x0000563339134c39 in wsrep_abort_transaction (hton=<optimized out>,
> bf_thd=0x14b680000d90, victim_thd=<optimized out>,
>     signal=<optimized out>) at
> /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19821
> #10 0x0000563338f38cbf in ha_abort_transaction (bf_thd=bf_thd@entry=0x14b680000d90,
> victim_thd=victim_thd@entry=0x14b680000d90,
>     signal=signal@entry=1 '\001') at
> /test/mtest/10.2_dbg/sql/handler.cc:6327
> #11 0x0000563338ebed6d in wsrep_abort_thd (bf_thd_ptr=bf_thd_ptr@entry=0x14b680000d90,
> victim_thd_ptr=victim_thd_ptr@entry=0x14b680000d90,
>     signal=signal@entry=1 '\001') at
> /test/mtest/10.2_dbg/sql/wsrep_thd.cc:832
> #12 0x0000563338eaa2bd in abort_replicated (thd=thd@entry=0x14b680000d90)
> at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2269
> #13 0x0000563338eae097 in wsrep_close_client_connections
> (wait_to_end=wait_to_end@entry=1 '\001',
>     except_caller_thd=except_caller_thd@entry=0x0) at
> /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2437
> #14 0x0000563338eaedf6 in wsrep_stop_replication (thd=thd@entry=0x0) at
> /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:962
> #15 0x0000563338c543d8 in kill_server (sig_ptr=sig_ptr@entry=0x0) at
> /test/mtest/10.2_dbg/sql/mysqld.cc:2009
> #16 0x0000563338c558d5 in kill_server_thread (arg=<optimized out>) at
> /test/mtest/10.2_dbg/sql/mysqld.cc:2047
> #17 0x000014b799a7a609 in start_thread (arg=<optimized out>) at
> pthread_create.c:477
> #18 0x000014b79966f293 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> R: Jan
>
> On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
>
>> Hi, Jan!
>>
>> On Oct 06, Jan Lindström wrote:
>> > >
>> > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
>> > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void
>> *victim_thd_ptr, my_bool signal)
>> > > > > > +{
>> > > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
>> > > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
>> > > > > > +  from concurrent usage or disconnect or delete. */
>> > > > >
>> > > > > How do you know victim_thd wasn't deleted before you locked
>> > > > > LOCK_thd_data below?
>> > > >
>> > > > I must say the thr_lock code is not familiar to me but there are
>> > > > mysql_mutex_lock() calls to lock->mutex. After code review  it is
>> not
>> > > > clear to me what that mutex is.
>> > >
>> > > where are mysql_mutex_lock() calls to lock->mutex?
>> >
>> > mysys/thr_lock.c there is function thr_lock() there is call to
>> > mysql_mutex_lock(&lock->mutex); this is before wsrep_break_lock where we
>> > call wsrep_thd_abort
>>
>> this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
>> structure somewhere in the table share. It locks tables and handlers,
>> not threads. And InnoDB isn't using it at all anyway.
>>
>> >
>> > > > > >          if (victim_trx) {
>> > > > > > +                wsrep_thd_UNLOCK(victim_thd);
>> > > > >
>> > > > > what keeps victim_trx from disappearing here?
>> > > >
>> > > > Nothing. Do you have suggestions ?
>> > >
>> > > A couple of thoughts:
>> > >
>> > > * Why do you unlock LOCK_thd_data here at all? I believed the whole
>> > > point of using TOI was to make sure that even if you lock mutexes in
>> > > a different order, it will not cause a deadlock.
>> >
>> > This is DDL-case when we have a MDL-conflict not user KILL, we need to
>> > release it in my opinion because we need to take mutexes in
>> > lock_sys -> trx -> THD::LOCK_thd_data order,
>> > if I do not release I can see easily safe_mutex warnings
>>
>> I don't understand. First, lock_sys and trx mutexes are not covered by
>> safe_mutex, so it cannot possibly warn about them.
>>
>> Second, the reason for taking mutexes always in the same order is to
>> avoid deadlocks. And yor TOI trick should archieve that.
>>
>> > > * You cannot pass a THD safely over a gap where no locks keep it in
>> > > existence. You can pass an integer, thread id, and use
>> > > find_thread_by_id later.
>> >
>> > I have a suggestion for this:
>> >
>> >          if (victim_trx) {
>> > +
>> > +               const trx_id_t victim_trx_id= victim_trx->id;
>> > +               const longlong victim_thread=
>> thd_get_thread_id(victim_thd);
>> >                 lock_mutex_enter();
>> > -               trx_mutex_enter(victim_trx);
>> > -               wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim_trx,
>> signal);
>> > +               if (trx_t* victim= trx_rw_is_active(victim_trx_id,
>> NULL, true)) // If this succeeds, trx can not go away
>> > +               {
>> > +                       trx_mutex_enter(victim);
>> > +                       ut_ad(victim->mysql_thd ?
>> > +                               thd_get_thread_id(victim->mysql_thd) ==
>> victim_thread :
>> > +                               1);
>> > +                       wsrep_innobase_kill_one_trx(bf_thd, bf_trx,
>> victim, signal);
>> > +                       trx_mutex_exit(victim);
>> > +                       victim->release_reference(); // Now trx can go
>> away after we have released lock_sys
>> > +                       wsrep_srv_conc_cancel_wait(victim);
>> > +               }
>> >                 lock_mutex_exit();
>> > -               trx_mutex_exit(victim_trx);
>> > -               wsrep_srv_conc_cancel_wait(victim_trx);
>> >                 DBUG_VOID_RETURN;
>>
>> I don't understand, sorry. It's not quite clear what this patch should
>> apply to. May be you can paste what the new code should look like, not a
>> patch?
>>
>> In particular, where are wsrep_thd_UNLOCK and wsrep_thd_LOCK. Are they
>> present in this new variant at all?
>>
>> >
>> > > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
>> > > Does it need to wait for everything else being replicated?
>> >
>> > As long as it takes to start it on all nodes in the cluster.
>>
>> So, KILL basically won't work when a cluster is starting.
>> Can bf aborts happen during a cluster startup?
>>
>> Regards,
>> Sergei
>> VP of MariaDB Server Engineering
>> and security@xxxxxxxxxxx
>>
>

Follow ups

Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Sergei Golubchik, 2021-10-14

References

Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Sergei Golubchik, 2021-10-01
Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Jan Lindström, 2021-10-04
Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Sergei Golubchik, 2021-10-06
Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Jan Lindström, 2021-10-06
Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Sergei Golubchik, 2021-10-06
Re: 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
From: Jan Lindström, 2021-10-11