maria-developers team mailing list archive

Re: 6d0c1f3ae12: MDEV-23328 Server hang due to Galera lock conflict resolution


Hi Jan,

Ramesh said that he has not observed any "WSREP: BF lock wait long for
trx" messages while running tests. This would suggest that the
diagnostic output code is essentially untested.

I would say that the diagnostic output is related to MDEV-23328 or
MDEV-25114, because it involves the same mutexes as the MDEV-23328
hang. Furthermore, the MDEV-23328 scenario is the forced abort of
lock-holding lower-priority transactions while applying certified
transactions (called "BF" or "brute force" in Galera).

I am glad that you will reconsider my request to remove
wsrep_trx_print_locking() and the mutex operations around the call.
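To illustrate the general concern with releasing and re-acquiring a
mutex around diagnostic output, here is a minimal sketch. It is not
the InnoDB or wsrep code; WaitState, report_with_unlock() and
report_snapshot() are hypothetical names, and the callback only
simulates what another thread might do while the mutex is released.

```cpp
#include <functional>
#include <mutex>
#include <string>

// Hypothetical stand-in for a lock-wait record protected by a mutex.
struct WaitState {
  std::mutex mtx;
  int waiting_trx_id = 0;  // 0 means "no transaction is waiting"
};

// Risky pattern: drop the mutex to produce diagnostics, then re-acquire it.
// `while_unlocked` simulates the work of another thread during the gap.
// Returns true if the value read before the gap is still current afterwards.
bool report_with_unlock(WaitState& s,
                        const std::function<void()>& while_unlocked,
                        std::string& log) {
  std::unique_lock<std::mutex> guard(s.mtx);
  const int seen = s.waiting_trx_id;
  guard.unlock();  // the protected state may change from here on
  log += "lock wait long for trx " + std::to_string(seen) + "\n";
  while_unlocked();
  guard.lock();
  return s.waiting_trx_id == seen;  // the caller must revalidate
}

// Alternative: copy what is needed under the mutex and format it after
// releasing, without resuming any logic that depends on the old state.
std::string report_snapshot(WaitState& s) {
  int seen;
  {
    std::lock_guard<std::mutex> guard(s.mtx);
    seen = s.waiting_trx_id;
  }
  return "lock wait long for trx " + std::to_string(seen) + "\n";
}
```

If the callback clears waiting_trx_id while the mutex is released,
report_with_unlock() returns false: the message has already been
written for a wait that no longer exists, and any code that continues
after the re-acquisition has to cope with that.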


On Mon, Oct 25, 2021 at 9:04 AM Jan Lindström <jan.lindstrom@xxxxxxxxxxx> wrote:
> Hi Marko,
>> I am sad to see that my comment regarding wsrep_is_BF_lock_timeout()
>> that I made in https://github.com/MariaDB/server/commit/b74b53f0515b360bb5cddec1a506a2f4d4dc21b3#r52293813
>> (June 17) has not been addressed. Do we really need that output? Do we
>> see that output in our internal testing? If not, then we have not
>> tested that that code is free from race conditions or hangs. (It
>> should be a lot safer to avoid such unnecessary unlock/lock exercises
>> involving multiple mutexes.) If yes, then why have we not added source
>> code comments to document when such scenarios would occur? I believe
>> that it is better to rely on some operating system features (such as
>> stack traces from a debugger) rather than to try to implement partial
>> logging.
> I know, this is not related to MDEV-23328 or MDEV-25114 but ok, I can remove most of this stupid code.
> R: Jan

Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation