← Back to team overview

maria-developers team mailing list archive

Re: 6d0c1f3ae12: MDEV-23328 Server hang due to Galera lock conflict resolution


Hi Sergei, Jan,

On Sun, Oct 24, 2021 at 9:11 PM Sergei Golubchik <serg@xxxxxxxxxxx> wrote:
> > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, my_bool signal)
> When is it called? Under what conditions?

Right, comments that state the obvious ("what") are rather useless.
They should always explain "why".

> > +     DBUG_ENTER("wsrep_innobase_kill_one_trx");
> > +     THD *thd= (THD *) victim_trx->mysql_thd;
> > +
> > +     /* Here we need to lock THD::LOCK_thd_data to protect from
> > +     concurrent usage or disconnect or delete. */
> > +     DEBUG_SYNC(bf_thd, "wsrep_before_BF_victim_lock");
> > +     my_sleep(100000);
> Eh? Forgot to remove after debugging?
> > +     wsrep_thd_LOCK(thd);
> > +     my_sleep(100000);
> and here

Several months ago when I reviewed part of this fix, I suggested that
this code be tested with some delays added. It could be useful to
preserve those delays in some form (embedded inside DBUG_EXECUTE_IF,
or commented out) for future use.

I am sad to see that my comment regarding wsrep_is_BF_lock_timeout()
that I made in https://github.com/MariaDB/server/commit/b74b53f0515b360bb5cddec1a506a2f4d4dc21b3#r52293813
(June 17) has not been addressed. Do we really need that output? Do we
see that output in our internal testing? If not, then we have not
tested that that code is free from race conditions or hangs. (It
should be a lot safer to avoid such unnecessary unlock/lock exercises
involving multiple mutexes.) If yes, then why have we not added source
code comments to document when such scenarios would occur? I believe
that it is better to rely on some operating system features (such as
stack traces from a debugger) rather than to try to implement partial

Best regards,


Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

Follow ups