maria-discuss team mailing list archive

Re: Backup on the replication server getting affected

 

Thanks for your support Kristian & Andrei,

Pardon me, I mistyped the version.
The environment where we see the issue is indeed *MariaDB 10.6.10*, and it
is using pool-of-threads.
I will test with the suggested work-around.
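
For anyone following along, the work-around amounts to something like the fragment below. This is only a minimal sketch: the option-file path and the `[mariadb]` group are assumptions about a typical setup and do not come from this thread; only the `thread_handling` value is taken from Kristian's mail.

```ini
# Assumed location: a server option file such as /etc/my.cnf.d/server.cnf
[mariadb]
# Work-around suggested in MDEV-31427: do not use the thread pool
thread_handling = one-thread-per-connection
```

The setting is read at startup, so the server needs a restart. Afterwards, `SELECT VERSION();` and `SHOW GLOBAL VARIABLES LIKE 'thread_handling';` should confirm the running version and the active value.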

Thanks and Regards,
Ragul R

On Fri, Jun 9, 2023 at 5:25 PM Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:

> ragul rangarajan <ragulrangarajan@xxxxxxxxx> writes:
>
> > Hope my issue is more related to the issue MDEV-30780 optimistic parallel
> > slave hangs after hit an error
> > Trying to reproduce with a minimal database.
> >
> > Attaching the gdb output
>
> Thanks, that gdb output is really helpful!
>
> I agree with Andrei that this rules out MDEV-30780 as the cause. Instead it
> looks to be caused by MDEV-29843, see also MDEV-31427:
>
>   https://jira.mariadb.org/browse/MDEV-29843
>   https://jira.mariadb.org/browse/MDEV-31427
>
> This is seen in the stack trace, where all the other worker threads are
> waiting on one which is stuck inside pthread_cond_signal:
>
> -----------------------------------------------------------------------
> Thread 80 (Thread 0x7f47ad065700 (LWP 25417)):
> #0  0x00007f789dca054d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f789dc9e14d in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #2  0x000055de401c23cd in inline_mysql_cond_signal (that=0x7f4798006b78) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1099
> #3  dec_pending_ops (state=<synthetic pointer>, this=0x7f4798006b30) at /home/buildbot/buildbot/build/sql/sql_class.h:2535
> #4  thd_decrement_pending_ops (thd=0x7f47980009b8) at /home/buildbot/buildbot/build/sql/sql_class.cc:5142
> #5  0x000055de407b5726 in group_commit_lock::release (this=this@entry=0x55de41f0da80 <write_lock>, num=num@entry=216757233923465)
>     at /home/buildbot/buildbot/build/storage/innobase/log/log0sync.cc:388
> #6  0x000055de407a0a3c in log_write_up_to (lsn=<optimized out>, lsn@entry=216757233923297, flush_to_disk=flush_to_disk@entry=false, rotate_key=rotate_key@entry=false,
>     callback=<optimized out>, callback@entry=0x7f47ad064090) at /home/buildbot/buildbot/build/storage/innobase/log/log0log.cc:844
> -----------------------------------------------------------------------
>
> The pthread_cond_signal() function normally can never block, so this
> indicates some corruption of the underlying condition object. This object is
> used to asynchronously complete a query on a client connection when using
> the thread pool. The MDEV-29843 patch makes worker threads not use this
> asynchronous completion, which should eliminate this problem.
>
> The stack trace strongly indicates MDEV-29843 as the cause. However, the
> MDEV-29843 patch is supposed to be in MariaDB 10.6.11, and you wrote:
>
> > Environment: MariaDB 10.6.11
>
> Can you double-check whether you are really seeing this hang in 10.6.11, or
> whether it could have been 10.6.10 (the only version that is supposed to be
> vulnerable to MDEV-29843)?
>
> Another thing you can check is if you are using
> --thread-handling=pool-of-threads, which I think is related to the
> MDEV-29843 issue. In MDEV-31427 I suggest
> --thread-handling=one-thread-per-connection as a possible work-around.
>
> Hope this helps,
>
>  - Kristian.
>
