
maria-developers team mailing list archive

MariaDB parallel replication


Users have started doing some interesting work testing parallel
replication on real workloads. One issue that comes up is the performance on
workloads that have a relatively high number of lock conflicts between
transactions.

I made some patches (against the latest 10.1) to help investigate and address
these issues, and I wanted to mention them here in case someone wants to
comment on or experiment with them:

1. Adding status variables for parallel replication.


This patch adds new status variables that measure the time parallel
replication worker threads spend in each of these states:

 - Idle (waiting for work from the SQL thread).

 - Processing relay log events.

 - Waiting for a prior group commit to finish (measuring the overhead of
   insufficient parallelism recorded on the master, in conservative mode, or
   the overhead of serialisation around DDL, in optimistic/aggressive mode).

 - Waiting for the immediately prior transaction to commit (measuring the
   overhead of in-order parallel replication).

 - Rolling back and re-executing events due to deadlocks.

It would be interesting to see how the different numbers compare on various
workloads and parallel replication modes.
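As a rough illustration of how this kind of per-state time accounting can work (a sketch only, not the code in the patch; the WorkerState names and the StateTimer class are hypothetical), a worker can take a timestamp on every state switch and add the elapsed time to the state it is leaving:

```python
import time
from collections import defaultdict
from enum import Enum
from typing import Callable, Dict


class WorkerState(Enum):
    # Hypothetical names for the states listed above.
    IDLE = "idle"
    PROCESSING = "processing"
    WAIT_GROUP_COMMIT = "wait_group_commit"
    WAIT_PRIOR_COMMIT = "wait_prior_commit"
    RETRY = "retry"


class StateTimer:
    """Accumulates the time one worker spends in each state."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._totals: Dict[WorkerState, float] = defaultdict(float)
        self._state = WorkerState.IDLE  # workers start out idle
        self._since = clock()

    def switch(self, new_state: WorkerState) -> None:
        # Charge the elapsed time to the state being left, then move on.
        now = self._clock()
        self._totals[self._state] += now - self._since
        self._state = new_state
        self._since = now

    def totals(self) -> Dict[WorkerState, float]:
        # Snapshot that also includes the time spent so far in the current state.
        now = self._clock()
        snapshot = dict(self._totals)
        snapshot[self._state] = snapshot.get(self._state, 0.0) + (now - self._since)
        return snapshot
```

Passing the clock in as a parameter makes the accounting easy to test with fake timestamps. Dividing each total by the sum of all totals gives the fraction of worker time spent in each state, which is the kind of comparison across workloads and modes suggested above.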

2. More aggressive retry of conflicting transactions.


When a deadlock forces a transaction to be retried, the current MariaDB code
waits for all prior transactions to commit before retrying it. The reasoning
is that since we already got one conflict, there is a high risk that an
immediate retry will just hit another. So if we have T1 T2 T3 T4, and T4
conflicts with T1, we roll back T4, wait for T3 to commit (and hence T1 and
T2, since transactions commit in order), and only then retry T4.

If we have many such conflicts, we could end up wasting a lot of time on
such waits. This patch changes aggressive mode so that T4 will only wait for
T1 to commit, then it will retry. This allows T4 to run in parallel with T2
and T3.

It will be interesting to see if this improves throughput in aggressive mode
in some workloads, and also to see if/how much it increases the number of
transaction retries and associated overhead.
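A toy model of the two retry policies, using the T1..T4 example above. The function names are hypothetical, and the model only captures which transaction a retry waits for, not how the server actually implements it:

```python
from typing import List


def retry_waits_for(retrying: int, conflicting: int, aggressive: bool) -> int:
    """Return the number of the transaction that must commit before
    transaction `retrying` is retried after a conflict with `conflicting`."""
    if not 1 <= conflicting < retrying:
        raise ValueError("the conflicting transaction must precede the retried one")
    if aggressive:
        # Patched aggressive mode: wait only for the transaction we conflicted with.
        return conflicting
    # Current behaviour: wait for the immediately prior transaction. Since
    # commits happen in order, all earlier transactions have then committed too.
    return retrying - 1


def concurrent_with_retry(retrying: int, conflicting: int, aggressive: bool) -> List[int]:
    """Prior transactions that need not have committed when the retry starts,
    and so can run in parallel with it."""
    waited_for = retry_waits_for(retrying, conflicting, aggressive)
    return list(range(waited_for + 1, retrying))


# T4 conflicts with T1.
for aggressive in (False, True):
    print(aggressive,
          retry_waits_for(4, 1, aggressive),
          concurrent_with_retry(4, 1, aggressive))
# False 3 []
# True 1 [2, 3]
```

The model also shows the trade-off: the wider the range returned by concurrent_with_retry, the more room there is for the retried transaction to conflict again.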

3. Debug patch to log all row lock waits.


This is not a patch suitable for production use. It adds an option
--gtid-log-all-lock-conflicts. When enabled, whenever one parallel
replication transaction needs to wait on the InnoDB row lock of another, it
will log a line to the error log, including the GTIDs of the
transactions. This can be used to correlate the conflicts with the binlog and
study exactly which transactions conflict with each other. Such results could
also be highly interesting.
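The exact log line format is not described here, so as a hedged sketch, assume the (waiting GTID, lock-holding GTID) pairs have already been extracted from the error log. MariaDB GTIDs have the form domain_id-server_id-seq_no, so one simple analysis (hypothetical, not part of the patch) is to count conflicts by the seq_no distance between the two transactions, which hints at how far apart in the binlog conflicting transactions typically are:

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    # MariaDB GTID text form: domain_id-server_id-seq_no, e.g. "0-1-100".
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server, seq)


def conflict_distances(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count lock waits by seq_no distance (waiter minus holder).

    A negative distance means the waiting transaction precedes the lock
    holder in the binlog. Pairs from different replication domains are
    skipped, since their seq_no values are not comparable.
    """
    distances: Counter = Counter()
    for waiter_text, holder_text in pairs:
        waiter, holder = parse_gtid(waiter_text), parse_gtid(holder_text)
        if waiter.domain_id == holder.domain_id:
            distances[waiter.seq_no - holder.seq_no] += 1
    return distances


# Hypothetical (waiter, holder) pairs, not real log output.
print(conflict_distances([("0-1-104", "0-1-101"),
                          ("0-1-105", "0-1-104"),
                          ("0-1-110", "0-1-107")]))
# Counter({3: 2, 1: 1})
```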

 - Kristian.
