maria-developers team mailing list archive

Thread
Date

Review request for MDEV-6020, MDEV-5914, MDEV-5941, and MDEV-5262

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Tue, 10 Jun 2014 13:36:55 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)

Hi Serg,

We discussed on IRC a review of the $subj bugs related to deadlocks in
parallel replication and handling of automatic retry of transactions in case
of deadlock.

The patches are in my tree:

lp:~maria-captains/maria/10.0-knielsen

The complete diff can be obtained eg. with:

bzr diff -rrevid:psergey@xxxxxxxxxxxx-20140429143317-tkmxrzu6y36hpi7t..revid:knielsen@xxxxxxxxxxxxxxx-20140610081315-o5nx3u32uvqaai7q

Let me give some background for these bugs:

When two transactions T1 and T2 group-commit on the master, we assume that
they have no lock conflicts (otherwise one would have had to wait for the
other to commit). So we run them in parallel on the slave. But before we let
T2 commit, we wait until T1 has committed first, so that we get the same
binlog order on the slave as we had on the master.

But if T1 and T2 somehow _do_ get a lock conflict on the slave, and T1 ends up
waiting for some lock held by T2, we get a deadlock. This causes the
replication to hang (the wait of T2 for T1 to commit is not part of InnoDB
deadlock detection), which is quite bad.

Elena's testing of the parallel replication in fact uncovered a number of
cases where such deadlock issues occur:

1. MDEV-5914. It turns out that InnoDB gap locks can work in a non-symmetric
way. If T1 is an INSERT, and T2 is a DELETE, then T1 can run first and not
block T2. But if T2 runs first, then it may set a gap lock that blocks
T1. Thus, depending on exact execution order, we can get T1 and T2 to group
commit on the master, but T2 block T1 on the slave and cause a deadlock.

2. MDEV-5941. If eg. an UPDATE uses different execution plans on the master
and slave, it might use index scan on the master, but table scan on the
slave. Thus it may take more locks on the slave than on the master, causing it
to conflict with another transaction that is run in parallel and cause a
deadlock.

3. MDEV-6020. Here, there is a multi-value insert into multiple partitions of
a table with auto-increment key. It appears that in this case InnoDB takes an
auto-increment table lock on each partition. Since these locks are taken in
the order of the inserted values, they can be taken in different order between
different INSERT statements, and can cause a deadlock. Since these locks are
released during the statement execution (before the commit), such deadlock can
occur on the slave even though it did not occur on the master. (Note that this
deadlock _is_ detected by InnoDB, and is not related to the commit order on
the parallel slave).

I know a way to prevent (1) on the slave (and it is implemented in the
patches). I do not know if there is a way to prevent (3), it would require
understanding better how innodb autoincrement and partitioning works. I do not
see any way to prevent (2) from occuring on the slave, in the general case.

It also seems to me that there are likely to be more corner cases lurking, and
even if I somehow managed to fix the three above problems, there would be some
other cases where an extra lock conflict can occur on the slave and cause the
slave to hang.

So I decided that a general solution is needed to such deadlocks. So the
currently proposed patch does two things:

A. It adds code to InnoDB, so that whenever T1 starts to wait inside InnoDB
for a lock held by T2, this wait is reported to the upper server layer. The
replication code will then check if T1 and T2 are part of parallel replication
such that T1 must commit before T2. If so, we have a deadlock, and the code
will terminate T2 with a deadlock error, allowing T1 to proceed.

B. It implements code to handle deadlocks during parallel replication. When a
transaction gets a deadlock error (normal, or provoked by the new deadlock
detection code from (B)), it is rolled back, and re-executed.

The result should be that even if there are other non-discovered cases where
locks can conflict in different ways on the master from on the parallel slave,
such conflicts will not cause replication to fail. Instead they will be
detected, and handled automatically by resolving the deadlock. I think this
will be important to ensure the reliability of parallel replication for users.

I went through quite a few different ideas and attempts before ending with the
present solution. So I wanted to give this background about how the patch
relates to the basic problem that needs to be solved. I found a _lot_ of
tricky issues during testing, and so I am somewhat unhappy about the
complexity of this, even though the final patch may look simple enough on the
surface. Maybe you can have some ideas for what could be done simpler, or at
least question anything that looks incorrect or unnecessary.

Thanks,

- Kristian.

Follow ups

Re: Review request for MDEV-6020, MDEV-5914, MDEV-5941, and MDEV-5262
From: Sergei Golubchik, 2014-07-03