← Back to team overview

maria-developers team mailing list archive

Re: deadlock, rotate and prepared xids

 

MARK CALLAGHAN <mdcallag@xxxxxxxxx> writes:

> http://bugs.mysql.com/bug.php?id=68569

This is the classic semi-sync deadlock that you explained so many times
already.

The 5.6 group commit implements a similar technique to MariaDB, where a single
thread does the fast part of commit for all transactions in the group. But for
some reason, they put the semi-sync wait inside the single-threaded loop as
well!?! So they get the classic deadlock: A waits for semi-sync notification
before dropping prepared_xid counter. B waits for A to drop prepared_xid
counter to zero before rotating binlog and releasing LOCK_log. C waits for B
to release LOCK_log before it can send events to slave and get semisync
notification for A ...

> http://bugs.mysql.com/bug.php?id=68251

This second one is different from the first one. One thread is trying to
rotate the binlog, but waiting for prepared_xids count to drop to zero. There
does not appear though to be any threads having prepared XIDs pending.

But look at the code in binlog.cc and binlog.h that handles the prepared_xids
counter:

  while (get_prep_xids() > 0)
    mysql_cond_wait(&m_prep_xids_cond, &LOCK_commit);

  void dec_prep_xids() {
    my_atomic_rwlock_wrlock(&m_prep_xids_lock);
    int32 result= my_atomic_add32(&m_prep_xids, -1);
    my_atomic_rwlock_wrunlock(&m_prep_xids_lock);
    if (result == 1)
      mysql_cond_signal(&m_prep_xids_cond);
  }

This code is trivially racy. It can easily happen that A reads the counter as
1, then B decrements the counter to zero and signals the condition (which does
nothing as there is no waiter), then A goes to sleep on the condition, never
to wake up.

BTW, it looks like there is another bug in there. You have a stack trace with
two threads, these two threads are waiting for LOCK_log to also rotate the
binlog. So it looks like the 5.6 code is missing protection from multiple
threads trying to rotate the binlog at the same time, which could lead to
dummy rotations leaving empty binlog files.

This makes me so sad. The code for 5.6 group commit is absolutely disgusting,
and clearly written by people with absolutely no clue about what they are
doing. The pthread_cond_signalling bug is even spelled out in the man page for
pthread_cond_wait()/_signal(), they clearly did not even bother reading it!
And this after years of you/Facebook, I, and Perconal all spending lots of
effort solving and carefully documenting how to do this properly :-(

> Did you do anything special to avoid deadlock with ::new_file_impl and
> prepared XIDs? My memory is vague but I thought you handled it. I added

With respect to the first one, I split the commit processing into two, and
only the first part runs serialised. And the semisync wait runs during the
second part, after the prepared_xids counter has been decremented. So there is
no potential for the same deadlock.

Later I changed binlog recovery to completely remove the prepared_xids counter
and the associated wait/stall. That should make it safe to do the semisync
wait early, since a pending commit can no longer block binlog rotation.

With respect to the second one, having a clue about synchronisation primitives
and a proper review process helps, what else can I say?

 - Kristian.


Follow ups

References