
Re: Fwd: possible bug in MySQL 5.6.7-rc slave replication crash safety?


Zardosht Kasheff <zardosht@xxxxxxxxx> writes:

> For me, the biggest design goal is to have a way for slaves to be
> crash safe without requiring the storage engine to fsync every
> transaction it processes. This would be a huge performance win.

Right. This basically requires [*] that everything goes through a single
transaction log, i.e. the InnoDB redo log, or alternatively the log of some
other transactional storage engine. This can be done by writing the
replication state to a table in the engine. (Another way is the XtraDB hack
where the state is essentially written directly to the redo log, but this has
problems when XA rollback is done.)
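
For illustration, the table-based approach looks something like this (the
table rpl_slave_state and its columns are names I made up for the sketch, not
an actual server table):

    -- Keep the slave position in the same transactional engine as the
    -- data, so position and data commit (or roll back) atomically.
    CREATE TABLE rpl_slave_state (
      channel         VARCHAR(64) NOT NULL PRIMARY KEY,
      master_log_file VARCHAR(255) NOT NULL,
      master_log_pos  BIGINT UNSIGNED NOT NULL
    ) ENGINE=InnoDB;

    -- Applying one replicated transaction:
    BEGIN;
    -- ... apply the row changes from the relay log events ...
    UPDATE rpl_slave_state
       SET master_log_file = 'master-bin.000042', master_log_pos = 1234567
     WHERE channel = 'default';
    COMMIT;  -- no extra fsync for the position beyond the engine's own

The point is that after a crash, the engine's normal recovery leaves the
position table exactly consistent with the data.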

BTW, when using global transaction ID in MySQL 5.6, the binlog is required on
the slave. "Normal" replication seems to use the mysql.slave_relay_log_info
table. Parallel replication seems to use mysql.slave_worker_info. (Am I the
only one who is sad to see this total mess, with every new replication feature
doing something different from the others?)
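
For reference, my understanding of the 5.6 knobs involved (variable names as
in the 5.6 documentation; both repositories default to FILE):

    -- Switch connection and position state from files to InnoDB tables
    -- (the slave threads must be stopped while changing these):
    SET GLOBAL master_info_repository    = 'TABLE';  -- mysql.slave_master_info
    SET GLOBAL relay_log_info_repository = 'TABLE';  -- mysql.slave_relay_log_info
    -- gtid_mode itself is a read-only startup option in 5.6; turning it
    -- on also requires --log-bin and --log-slave-updates on the slave.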

> In a world where slave replication is single threaded, I prefer solution
> (2), where the information is stored in a slave_relay_log_info table, for
> the performance reasons you mention. Not requiring fsyncs on
> transactional commit seems like a big deal, not just for InnoDB, but for
> any transactional storage engine.

Yeah. It depends on the system, of course. On a high-end server with a
battery-backed RAID cache and a working set that does not fit in memory, it
will be a minor issue. On consumer-grade disks with all data in memory and
small transactions, it would make a huge difference.
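
The per-commit fsync cost being discussed is governed by settings like these
(both real MySQL/InnoDB variables; 1 means flush at every commit):

    SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- fsync redo log per commit
    SET GLOBAL sync_binlog = 1;                     -- fsync binlog per commit

With a battery-backed write cache an fsync is just a write to controller RAM;
on a bare rotating disk it costs on the order of 10ms, which dominates small
transactions.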

> As far as crash safety goes, is DDL the only issue? If so, engines can
> implement the handlerton's discover API to make themselves crash safe.
> Should there be a crash, a little tweaking will be needed to get the
> relay log info to the right position, but it should be doable.

There are the slave relay log and the relay-log.info file, but those are the
ones we are talking about fixing here.

Then there is master.info, but it is only changed during CHANGE MASTER, so the
window where a crash can corrupt things is very small.

Neither the binlog nor the relay log has protection against partial disk block
writes (sometimes called "torn pages"); InnoDB protects against these with the
doublewrite buffer.

Apart from those and DDL, I can't think of anything at the moment; that does
not mean there isn't anything else, though.

> To better analyze this, can you please give some details on how a
> slave would apply a binary log in parallel? If it is just what MySQL
> does, and applies it per database, then perhaps a row in the relay log
> table per database would be sufficient. I think whatever solution
> makes sense would depend on the implementation of parallel
> replication.

We are thinking of several methods, which could supplement each other:

 - http://askmonty.org/worklog/Server-RawIdeaBin/?tid=169

 - http://askmonty.org/worklog/Server-RawIdeaBin/?tid=184

 - https://lists.launchpad.net/maria-developers/msg04911.html (see also
   http://mysql.taobao.org/index.php/RLPR_for_MariaDB) 

 - http://askmonty.org/worklog/Server-RawIdeaBin/?tid=186

 - http://askmonty.org/worklog/Server-RawIdeaBin/?tid=208

But the short story is that there are basically two kinds of approaches:
out-of-order (transactions commit in a different order on the slave than on
the master) and in-order (transactions may execute in a different order on the
slave, but commits are synchronised to happen in the same order as on the
master).

The MySQL approach that applies per database is an out-of-order approach. I
agree that these approaches need distinct rows in the table per database (or
per whatever the unit of parallelism is), because every database could be at a
different place in the binlog.

The in-order approaches have just a single location in the master binlog at
any one time. But if we use just one row in the table for all the workers,
then we get row lock contention and lose parallelism.

I think it is best just to always insert a new row for every commit, and
periodically delete old rows. This works for both out-of-order and in-order.
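
A rough sketch of what I mean, with made-up table and column names:

    -- One row is inserted per applied transaction; parallel workers never
    -- update a shared row, so there is no row lock contention.
    CREATE TABLE rpl_slave_pos (
      sub_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      domain_id INT UNSIGNED NOT NULL,    -- database (or other unit), for out-of-order
      seq_no    BIGINT UNSIGNED NOT NULL  -- position within that unit
    ) ENGINE=InnoDB;

    -- Each worker, inside the replicated transaction itself:
    INSERT INTO rpl_slave_pos (domain_id, seq_no) VALUES (0, 100123);

    -- Crash recovery: the restart position per unit is the newest
    -- committed row for that unit.
    SELECT domain_id, MAX(seq_no) FROM rpl_slave_pos GROUP BY domain_id;

    -- Housekeeping runs periodically rather than per transaction:
    DELETE FROM rpl_slave_pos WHERE domain_id = 0 AND seq_no < 100000;

For in-order there would be just one unit; for out-of-order there is one per
database (or whatever the unit is), and recovery takes the maximum per unit.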

> True, but that's the way it is. It is (hopefully) well known that
> keeping data across multiple engines consistent incurs performance
> hits with XA. So, users will hopefully have their data predominately

Agree.

 - Kristian.

