
maria-discuss team mailing list archive

Re: Backup on the replication server getting affected

 

Howdy Ragul,

> Hi Andrei,
>
> Do we have any procedures to reproduce the issue MDEV-30780?

Thanks for posting the gdb backtraces. They rule out MDEV-30780, yet
they do not tell me enough about the reason for the hang. This is
something new to me and does deserve an MDEV ticket.
Still, I'd defer filing it until we have confirmed that the same issue is
seen on the latest 10.6. So you could run your load against the most recent
slave version; that would at least be the safest use of our time.

It might be that Thread 80 (a slave worker) is spinning inside

   #6  0x000055de407a0a3c in log_write_up_to (lsn=<optimized out>, lsn@entry=216757233923297, flush_to_disk=flush_to_disk@entry=false, rotate_key=rotate_key@entry=false, 

a "goto repeat" loop.
Hopefully you can confirm that the next time the hang reappears.
Could you please check whether #6 indeed calls
`group_commit_lock::release()` iteratively? (With e.g.
  (gdb) br thd_decrement_pending_ops thread 80
of course the thread number may change :-)).
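
A minimal sketch of how that check could go, assuming gdb is attached to
the hung server and the spinning worker still shows up as gdb thread 80
(`info threads` lists the current numbering):

```
(gdb) info threads
(gdb) br thd_decrement_pending_ops thread 80
(gdb) continue
(gdb) bt 8
(gdb) continue
(gdb) info breakpoints
```

If the breakpoint keeps firing after each `continue`, and `info breakpoints`
reports a growing "already hit N times" count, the loop is iterating
rather than blocked.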

All the other slave worker threads may be waiting for Thread 80, but I
can't confirm that until more data is available.
Namely, I need to see the output of

  (gdb) thr app all get_about_worker_thread

where the latter is defined as

define get_about_worker_thread
  if $_any_caller_is ("handle_rpl_parallel_thread", 50)
    bt
    p handle_rpl_parallel_thread::rpt
    if (handle_rpl_parallel_thread::rpt->thd->rgi_slave)
      p handle_rpl_parallel_thread::rpt->thd->rgi_slave
      p handle_rpl_parallel_thread::rpt->thd->rgi_slave->current_gtid
      p handle_rpl_parallel_thread::rpt->thd->rgi_slave->gtid_sub_id
      p handle_rpl_parallel_thread::rpt->thd->rgi_slave->worker_error
    end  
  end
end
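
One possible way to capture that output into a file, as a sketch only:
the file names `worker.gdb` (holding the definition above) and
`workers.txt` are placeholders, and newer gdb releases prefer
`set logging enabled on` over `set logging on`. Note that
`$_any_caller_is` needs a gdb built with Python support.

```
(gdb) source worker.gdb
(gdb) set pagination off
(gdb) set logging file workers.txt
(gdb) set logging on
(gdb) thr app all get_about_worker_thread
(gdb) set logging off
```

The resulting `workers.txt` can then be attached to the reply or to the
MDEV ticket.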


> Unable to reproduce the issue locally but it occurs at random. 

That will require some more patience from us.


I believe we can resolve it, with you helping so generously!

Cheers,

Andrei


>
> Regards,
> Ragul 
>
> On Mon, May 29, 2023 at 7:06 PM ragul rangarajan <ragulrangarajan@xxxxxxxxx> wrote:
>
>     Thanks Andrei,
>    
>     Hope my issue is more related to the issue MDEV-30780 optimistic parallel slave hangs after hit an error
>     Trying to reproduce with a minimal database.

