maria-developers team mailing list archive

Thread
Date
Re: parallel replication performance

To: axel@xxxxxxxxxxxx
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Wed, 26 Mar 2014 13:06:30 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <53319252.4080404@askmonty.org> (Axel Schwenke's message of "Tue, 25 Mar 2014 15:27:30 +0100")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
Axel Schwenke <axel@xxxxxxxxxxxx> writes:

> The poor performance of MySQL with 1 database is expected. However MySQL is
> much better with 32 databases. Even better than MariaDB :(

Ok, so I looked closer at the numbers (of which there were quite a few, thanks
for doing this work :-). I have some idea of what could be going on.

The short story is that for this kind of configuration, it will be necessary
to configure @@slave_domain_parallel_threads to be at most the value of
@@slave_parallel_threads divided by the number of domain ids used. Eg. with 32
databases/domains, --slave-parallel-threads=64
--slave-domain-parallel-threads=2. Otherwise the domains will starve each
other for threads, leading to poor parallelism. In fact, with such high domain
parallelism @ 32 databases, I think just using
--slave-domain-parallel-threads=1 could give much better performance.

The longer story: MariaDB parallel replication uses a single pool of worker
threads, of size @@slave_parallel_threads. GTIDs are read serially from the
relay log in the SQL driver thread, and scheduled to worker threads.

Each replication domain is scheduled independently from all other
domains. Each GTID is allocated to a new worker thread if possible. If no
worker thread is available (because all workers already have at least one GTID
queued for them), then the SQL driver thread waits for one to become free;
this stalls _all_ domains from getting more work queued.

When your benchmark starts, the SQL driver thread will quickly read 64 events
from the relay logs and queue them each to a separate worker thread. Then no
more free worker threads are available, and the SQL thread will have to
wait. With 64 worker threads and 32 domains, it is quite likely that some
domains will not have any work to do. At the same time, the group commit size
will most likely be much less than 64, so many workers will have to wait for
the previous GTID in their domain without being able to do any work,
especially at the "fast" configuration where commit (which can always run in
parallel) does not take much time. The result will be a lot of threads waiting
on each other and waking each other up, and little real work done.

By setting @@slave_domain_parallel_threads, it is possible to limit how many
worker threads a single domain is allowed to grab, thereby ensuring that
worker threads will be available for all domains to be active doing
work. Those limited number of threads will be used round-robin to queue work
for; this also reduces thread sleep and wakeup, as with luck a worker thread
that completes one commit will already have more work queued up for it at that
point that it can process without first returning to the thread pool and going
to sleep.

In particular, with @@slave_domain_parallel_threads=1, we will just have 32
threads, one for each domain, each processing events independently in parallel
at full speed, which is probably the best way to handle a workload as evenly
distributed across domains as this one is. It would be very useful to see new
benchmarks with those results. And if time permits,
@@slave_domain_parallel_threads=2 could also be interesting; maybe there could
be some extra performance due to more group commit in the "peer"
configuration, or maybe the cost of repeatedly alternating between two worker
threads within the domain will cause reduced performance.

We can actually see some evidence of this theory in the numbers. Where number
of worker threads is much larger than number of domains (so starvation is less
of a problem), MariaDB does much better, even better than MySQL 5.6 in some
cases. But performance drops as the number of domains increase, possibly
because starvation kicks in.

Still, even if this theory holds, we do see 5-10% slower performance of
MariaDB at eg. slave@1 compared to MySQL. This suggests some higher overhead
somewhere, which I need to look into at some point, though it will be a while
yet before I have time for that, I am afraid. And in any case, this benchmark
is highly useful as a test - as I said before, serious load testing of the
domain_id parallelism has received very little testing, so it is quite
possible that there is simply some bugs that cause things to work poorly. The
extra testing with @@slave_domain_parallel_threads=1 (and possibly 2) will be
very useful to help understand this better.

> I think 32 databases is a bit too much, not only that MySQL then shines so
> bright, it's also unrealistic in the real world. What do you think? I can
> rerun the benchmark with 4 or 8 databases (will take 4-5 hours). Quite
> certainly MySQL will have worse numbers then :)

Well, I think 32 databases is not that unrealistic - shared hosting for
example could have many more. The completely even spacing of the load among
those databases however is probably unrealistic.

But I think we should investigate and solve the issues for MariaDB @ 32
databases. More databases, and thus more parallelism, should be better for the
parallel replication, if it is worse then that is a bug that should be
fixed. Even if it is just the need to configure
@@slave_domain_parallel_threads, maybe we could document that better.

> I also notice that for MariaDB 10.0 the tuning of the commit-wait-*
> variables is critical. Settings that give trx per commit group tend to slow
> down the master a lot. The default settings give only few transactions per

Right, this is something to be aware of.

Note that for a typical web client application or similar, --commit-wait-*
probably will not be much of an issue. An individual client will see slightly
higher latency, which should not be much of an issue if it is in the order of
a few milliseconds. Throughput is not likely to be affected, just the number
of concurrent connections will increase (at least as long as system does not
get choked).

On the other hand, a load with fixed parallelism, like batch processing or
your sysbench runs, can be hurt significantly on throughput by
--commit-wait-*, as your tests show.

Also, I have heard from several users that the ability to throttle a busy
master in order to prevent it from running too far ahead of the slaves, is
actually often seen as a benefit rather than a performance problem. Semi-sync
replication also is sometimes used to achieve similar throttling.

So yes, it is a tuning parameter that is unfortunately rather sensitive
depending on the actual load and applications.

> Please also check the "master" page of the comparison sheet. It shows the
> cost of turning on the binlog. Both MySQL and MariaDB show significant
> performance loss at high concurrency when the binlog is turned off. This is
> completely counter-intuitive. Kristian, can this be related to MDEV-5802?

It might be related. Another possibility is that this is related to InnoDB's
known weaknesses at high concurrency. When the binlog is enabled, group commit
will handle the commit of a lot of transactions serially, in a tight loop
inside a single thread. When the binlog is disabled, all of those transactions
will be committing each in their separate thread, all of those threads heavily
contending each other for the hot mutexes inside InnoDB. It would need a lot
of research to be sure that this is the cause, but it's a plausible guess, at
least.

Thanks for your efforts so far! Overall, I think the results look pretty good,
actually. We see a good speedup from MariaDB parallel replication in many of
the configurations. And your comparison shows that using group commit, a large
percentage of the speedup can be obtained automatically from group commit,
without any need to change applications and partition the data into separate
databases.

 - Kristian.
Follow ups

Re: parallel replication performance
From: Axel Schwenke, 2014-03-26