maria-developers team mailing list archive

Re: replication improvements going on here and there

To: "Rasmus Johansson" <rasmus@xxxxxxxxxxxxxxxx>
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Fri, 28 Jan 2011 14:01:04 +0100
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <01a701cbbedd$316c6bb0$94454310$@montyprogram.com> (Rasmus Johansson's message of "Fri\, 28 Jan 2011 13\:19\:12 +0200")
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

"Rasmus Johansson" <rasmus@xxxxxxxxxxxxxxxx> writes:

> I would need your help to understand the different replication topics that
> are very active now:
>
> - The replication work  that you have been doing 

As far as implementation goes, it is these four worklogs, which are in code
review and available in the feature preview tree
lp:~maria-captains/maria/mariadb-5.2-rpl:

    http://askmonty.org/worklog/Server-Sprint/?tid=116
    http://askmonty.org/worklog/Server-RawIdeaBin/?tid=136
    http://askmonty.org/worklog/Server-Sprint/?tid=163
    http://askmonty.org/worklog/Server-Sprint/?tid=132

This work improves replication in a number of ways:

 - Implement working group commit for the replication binlog. This greatly
   improves performance when running with sync_binlog=1, which is necessary to
   ensure the ability to reliably recover replication state after a master
   crash.

 - Ensure the same commit order between InnoDB and the binlog, without the
   need for the expensive prepare_commit_mutex. This is needed by XtraBackup
   and InnoDB hot backup to do a non-blocking online backup that can be used
   to provision a slave. The other existing patches I have seen solve this by
   requiring the user to turn the prepare_commit_mutex on or off depending on
   whether a backup is going on or not.

 - Allow obtaining an InnoDB consistent read snapshot and the corresponding
   binlog position in a fully non-blocking way. This makes it possible, for
   example, to provision a slave from a fully non-blocking `mysqldump
   --master-data --single-transaction`. Previously, this required FLUSH TABLES
   WITH READ LOCK, which blocks queries in the server for some time.

 - Port the Facebook patch to release InnoDB row locks early, already during
   the prepare phase. This can improve performance in the presence of hot-spot
   rows in busy OLTP applications. (This feature must be enabled explicitly;
   it is off by default.)

 - Make the transaction coordinator inside MariaDB pluggable, so that
   different implementations can be plugged in. This is mostly preparatory
   work for pluggable replication, as the replication binlog functions as a
   transaction coordinator. So with this, a new binlog implementation can be
   plugged in and take over the transaction coordinator role that the legacy
   binlog normally handles.
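
As a rough illustration of the group commit idea in the first item above,
here is a toy model in Python. It is only a sketch and not the MariaDB
implementation: the names BinlogSim, commit_one_by_one and group_commit are
hypothetical, and sync() merely counts calls where a real binlog would do an
expensive fsync. The point it shows is that transactions queued while a sync
is in progress can be written together and made durable with a single sync,
in queue order.

```python
from dataclasses import dataclass, field


@dataclass
class BinlogSim:
    """Toy binlog (hypothetical): records events and counts syncs."""
    events: list = field(default_factory=list)
    sync_count: int = 0

    def write(self, txn):
        # Append the transaction's events to the (in-memory) log.
        self.events.append(txn)

    def sync(self):
        # Stand-in for the expensive fsync done when sync_binlog=1.
        self.sync_count += 1


def commit_one_by_one(binlog, txns):
    """Without group commit: one sync per transaction."""
    for txn in txns:
        binlog.write(txn)
        binlog.sync()


def group_commit(binlog, queue):
    """With group commit: drain the queue and sync once for the whole group.

    The queue order is kept, so the commit order in the log is well defined.
    """
    group = list(queue)
    queue.clear()
    for txn in group:
        binlog.write(txn)
    if group:
        binlog.sync()
    return group
```

With three queued transactions, the first function issues three syncs and the
second issues one, while the log contents are the same in both cases.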

In addition to this, there was a long discussion on the MariaDB mailing list
about various replication topics, and I wrote up the following specifications
(but have not started on the implementation yet):

    http://askmonty.org/worklog/?tid=107
    http://askmonty.org/worklog/?tid=120
    http://askmonty.org/worklog/?tid=133

The idea with these is to gradually make the existing legacy MySQL replication
and binlog pluggable, opening the way for writing completely new replication
implementations. It builds on top of the implementation work above.

I also worked on the specifications of the parallel replication worklogs
MWL#169 and MWL#170 mentioned below.

> - The parallel replication that Oracle is talking about for MySQL 5.6

Do you mean this?

    http://forge.mysql.com/wiki/ReplicationFeatures/ParallelSlave
    http://forge.mysql.com/worklog/task.php?id=4648

The idea is to take one transaction at a time, and run the operations inside
it in parallel. That is a bit unusual; most other projects work by running
multiple transactions in parallel.

I think this is motivated by NDB (though it is just me reading between the
lines, I do not know for sure). In NDB, multiple parallel transactions on the
master are replicated as a single, bigger transaction called an epoch. So they
need something on the slave that can take apart the epoch again and make it
parallel.

Frankly, when I first saw this, I dismissed it, thinking it was the wrong
approach to the problem (outside of NDB replication at least).

I am still quite sceptical, though it may be possible to use this for normal
OLTP workloads (with small transactions) by artificially combining multiple
transactions, at the cost of higher delay between master and slave.

> - The replication functionality that <customer> was interested in together
> with <partner> and us

This is a request for implementing another algorithm for parallel
replication. I wrote up some designs for this:

    http://askmonty.org/worklog/Server-RawIdeaBin/?tid=169
    http://askmonty.org/worklog/Server-RawIdeaBin/?tid=170

This overlaps with the above-mentioned parallel replication project in Oracle,
but the chosen approach/algorithm is quite different between the two. In
this work, it would be the responsibility of the application to ensure that
parallel transactions modifying tables in two different databases are
independent. Such transactions can then be replicated in parallel on the
slave.
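
To make the per-database idea concrete, here is a minimal Python sketch. It
is not taken from the worklogs: partition_by_database, apply_in_parallel and
apply_fn are hypothetical names, and the sketch assumes that every
transaction touches exactly one database. Transactions are split into one
queue per database, the order within each queue is preserved, and the queues
are applied concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_database(txns):
    """Split (database, transaction) pairs into one ordered queue per database."""
    queues = defaultdict(list)
    for db, txn in txns:
        queues[db].append(txn)
    return dict(queues)


def apply_in_parallel(txns, apply_fn, max_workers=4):
    """Apply each database's queue in its own task.

    Order is kept within a database; no order is guaranteed between
    databases, which is only safe if the application ensures that
    transactions in different databases are independent.
    """
    queues = partition_by_database(txns)

    def run_queue(item):
        db, queue = item
        # Transactions for one database are applied serially, in log order.
        return db, [apply_fn(db, txn) for txn in queue]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_queue, queues.items()))
```

A transaction that modifies tables in more than one database does not fit
this simple model and would need separate handling.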

> Could you just give a short description on each of those and then describe
> how they connect to each other or are there something that duplicates?

Hope this helps,

 - Kristian.