
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication


On Sun, 24 Jan 2010 14:27:05 -0800, MARK CALLAGHAN <mdcallag@xxxxxxxxx> wrote:
> On Fri, Jan 22, 2010 at 6:21 AM, Kristian Nielsen
> <knielsen@xxxxxxxxxxxxxxx> wrote:
>> Let the discussion begin!
> The global transaction ID project done by Justin at Google is worth
> reviewing. In addition to supporting automated slave failover it also
> has options to make slave state crash-proof and add binlog event
> checksums. I doubt the patch should be reused, as the MySQL
> replication interface must be improved if we are to innovate -- but
> the wiki has a lot of details.
> * http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds
> * http://code.launchpad.net/~jtolmer/mysql-server/global-trx-ids


The global transaction ID is a cornerstone concept of any replication
system which aspires to be pluggable, extensible and go beyond basic
master-slave. It is hardly possible to even start designing the rest of the
API without first settling on the global transaction ID. This is one of the
reasons why
http://forge.mysql.com/wiki/ReplicationFeatures/ReplicationInterface can be
dismissed without much consideration.

What's good about it:

1) It introduces the concept of atomic database changesets. (Which,
ironically, it calls "groups" due to the dreaded binlog heritage.)
2) It correctly identifies that (part of) the ID should be a monotonic
ordinal number.
3) It correctly identifies that the global transaction ID is generated by
the redundancy service - in that case "MYSQL_LOG::write(Log_event *)
(sql/log.cc, line 1708)"

What's bad about it:

1) It fails to explicitly recognize that IDs should be a continuous
sequence. In the implementation they are, but this is never stated
explicitly; the only explicit requirement is monotonicity. Perhaps this is
a minor omission.
2) It fails to address multi-master: (server_id, group_id) is not going to
work - such pairs cannot be linearly ordered and, therefore, compared. And
from the perspective of the node that needs to apply the changeset - does
server_id really matter? It may be good for debugging, but it can't be a
part of a global transaction ID.
3) No general theory is put behind it; it is just an attempt to fix a
concrete binlog implementation. In fact it is just one huge implementation
detail. The inability to address the multi-master case is a direct
consequence of that.
In the end it is not very useful. Whatever good points are there are
trivial. Those and more can be reached with 15 minutes of abstract
thinking. You don't need to know the MySQL binlog format for that. In fact,
you should forget about it unless you want to end up with something like
(server_id, group_id).

I'll take this opportunity to put forth some theory behind the global
transaction IDs as we see it at Codership.

1. We have an abstract set of data subject to replication/logging. It can
be a whole database, a schema, a table, a row. Let's call it a Replication
Set (RS).

2. RS is undergoing changes in time which can be represented as a series
of atomic changes. Let's call it RS History. That it is a _series_ is
trivial but important - otherwise we can't reproduce historical RS state
evolution. Each RS change is represented by a changeset. Since it is a
series, RS changesets can be enumerated with a sequence of natural numbers
without gaps within a given RS History. Here comes the first component of a
global transaction ID: sequence number (seqno).

3. However, there can be more than one RS. Moreover, the same RS can end up
in different clusters and undergo different changes. So, to achieve truly
global unambiguity, each changeset, in addition to its seqno, should be
marked with an RS History ID. Obviously seqnos from different histories are
logically incomparable. Therefore the RS History ID can be any globally
unique identifier, with no need for < or > operations. This is the second
component of the global transaction ID.

One possible implementation for that can be a (UUID, long long) pair.
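
To make the two components concrete, here is a minimal Python sketch of such
a pair. The class and method names (GlobalTrxId, precedes, directly_follows)
are made up for illustration and are not taken from any existing patch or
API. It only assumes what is stated above: the History ID supports equality
alone, and seqnos within one history are ordered and gapless.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTrxId:
    """Hypothetical global transaction ID: (RS History ID, seqno)."""
    history_id: uuid.UUID  # identifies the RS History; only == / != is meaningful
    seqno: int             # position of the changeset within that history

    def precedes(self, other):
        """Order two IDs; defined only inside a single RS History."""
        if self.history_id != other.history_id:
            raise ValueError("seqnos from different RS Histories are incomparable")
        return self.seqno < other.seqno

    def directly_follows(self, other):
        """True if no changeset can exist between other and self (no gap)."""
        return (self.history_id == other.history_id
                and self.seqno == other.seqno + 1)


history = uuid.uuid4()
first = GlobalTrxId(history, 1)
second = GlobalTrxId(history, 2)
print(first.precedes(second))          # True
print(second.directly_follows(first))  # True
```

Comparing IDs from two different histories raises an error instead of
returning an arbitrary answer, which is the point of keeping the History ID
outside of the ordering.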

How the redundancy service generates those IDs is an implementation
detail. For binlog/master-slave replication it is obviously trivial, even
in its current state. Changing the binlog format and mapping seqnos to file
offsets is no big feat.

What is not so obvious here is that since the global transaction ID is
generated by the logging/replication service, it is that service that
defines the order of commits, not vice versa. As a result, a transaction
should first be passed to that service and only then committed. For one-way
master-slave replication the order of operations is not so important.
However, for multi-master it is crucial. Note that the actual
replication/logging can still happen asynchronously, but the replication
service must generate the transaction ID before the transaction is
committed.
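
The ordering rule above can be sketched as follows. This is a toy,
single-process model under stated assumptions, not a description of any real
replication code: ReplicationService, Node, assign_id and deliver are
hypothetical names, and the network is replaced by plain method calls. The
service hands out gapless seqnos first; each node then commits strictly in
seqno order, whatever order the changesets arrive in.

```python
import itertools
import uuid


class ReplicationService:
    """Hypothetical stand-in for the logging/replication service."""

    def __init__(self):
        self.history_id = uuid.uuid4()
        self._seqnos = itertools.count(1)  # gapless: 1, 2, 3, ...

    def assign_id(self, changeset):
        # The ID is generated here, before any node commits the changeset.
        return (self.history_id, next(self._seqnos))


class Node:
    """Commits changesets in seqno order, buffering early arrivals."""

    def __init__(self):
        self.last_committed = 0
        self.log = []        # committed changesets, in commit order
        self._pending = {}   # seqno -> changeset waiting for its turn

    def deliver(self, gtid, changeset):
        _history_id, seqno = gtid  # history check omitted for brevity
        self._pending[seqno] = changeset
        # Commit every changeset whose predecessor is already committed.
        while self.last_committed + 1 in self._pending:
            self.last_committed += 1
            self.log.append(self._pending.pop(self.last_committed))


service = ReplicationService()
tagged = [(service.assign_id(c), c) for c in ["x=1", "y=2", "z=3"]]

node_a, node_b = Node(), Node()
for gtid, changeset in tagged:            # arrives in order
    node_a.deliver(gtid, changeset)
for gtid, changeset in reversed(tagged):  # arrives in reverse order
    node_b.deliver(gtid, changeset)

print(node_a.log == node_b.log)  # True
```

Both nodes end up with the same commit order because that order comes from
the seqnos, not from arrival time or from whichever node executed the
transaction first.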

Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011
