maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication

To: Sergei Golubchik <serg@xxxxxxxxxxxx>
From: Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx>
Date: Tue, 02 Feb 2010 16:18:11 +0200
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100201100622.GA18275@janus.mylan>
Organization: Codership Oy
User-agent: RoundCube Webmail/0.3.1

Hi!

On Mon, 1 Feb 2010 11:06:22 +0100, Sergei Golubchik <serg@xxxxxxxxxxxx>
wrote:
> Hi, Alex!
> 
> On Jan 27, Alex Yurchenko wrote:
>> 
>> I'll take this opportunity to put forth some theory behind the global
>> transaction IDs as we see it at Codership.
>> 
>> 1. We have an abstract set of data subject to replication/logging. It
>> can be a whole database, a schema, a table, a row. Lets call it a
>> Replication Set (RS).
>> 
>> 2. RS is undergoing changes in time which can be represented as a
>> series of atomic changes. Let's call it RS History. That it is a
>> _series_ is trivial but important - otherwise we can't reproduce
>> historical RS state evolution. Each RS change is represented by a
>> changeset. Since it is a series, RS changesets can be enumerated with
>> a sequence of natural numbers without gaps within a given RS History.
>> Here comes the first component of a global transaction ID: sequence
>> number (seqno).
> 
> Why should it be a sequence of natural numbers without gaps ?

1) Well, to begin with, I didn't say that it "should", I said that it "can"
;). I was proposing a definition, so it depends on what we want to achieve.
Obviously, "gaplessness" is a useful requirement. It allows global
transaction IDs not only to be globally unique, but also to indicate
unambiguously the position of a changeset in the history of changes. By
relaxing it we lose a natural way to check for gaps in the stream of
events. E.g. you can't tell whether you can concatenate two binlog files if
one ends at 10 and the other starts at 12. When a node at position 100
joins a cluster, how will it know that the next event to process is 113?
The "no gaps" requirement allows us to take a single changeset, carry it
around all we like and then apply it consistently elsewhere without needing
any other context.
Indeed, there are other ways to address this, but a gapless seqno is
obviously the simplest of them all.
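
To make the point concrete, here is a rough sketch in Python (not Galera
code; the Changeset type and all function names are made up for
illustration). It only assumes that each changeset carries an integer
seqno, and shows how the "no gaps" rule turns both the binlog
concatenation question and the "what do I apply next" question into a
comparison against "last seqno + 1".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Changeset:
    """Hypothetical changeset: a seqno plus an opaque payload."""
    seqno: int      # position in the RS History
    payload: str    # stand-in for the actual change data


def can_concatenate(first: List[Changeset], second: List[Changeset]) -> bool:
    """True if `second` continues `first` with no changeset missing."""
    if not first or not second:
        return True  # nothing to join, so no gap can be observed
    return second[0].seqno == first[-1].seqno + 1


def next_expected(applied_seqno: int) -> int:
    """With gapless seqnos the next changeset to apply is always known."""
    return applied_seqno + 1


def can_apply(applied_seqno: int, cs: Changeset) -> bool:
    """A single changeset can be checked with no context but the position."""
    return cs.seqno == next_expected(applied_seqno)
```

With this, a log ending at 10 followed by a log starting at 12 is rejected,
and a node at position 100 knows that it wants 101 and that 113 cannot be
applied yet. Without the "no gaps" rule neither check is possible from the
seqnos alone.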

2) It is not a limiting requirement at all. If we agree that the RS
undergoes a _series_ of changes - change 1, change 2, change 3, change 4,
change 5, etc. - nothing prevents us from enumerating them without gaps.
Moreover, why, and according to what algorithm, would you introduce gaps
into the sequence numbers? I mean, it takes additional work to produce
gaps when enumerating a sequence.

3) It simplifies testing and debugging.

Robert from Continuent also raised this question, but I didn't have a
chance to respond to it in time. No, gapless seqnos are not a caprice of
Galera developers ;), it is just a proposal for a global trx ID based on
our experience. Galera can maintain it internally anyway, but I believe
everyone would benefit from it.

>> 3. However there can be more than one RS. Moreover, the same RS can
>> end up in different clusters and undergo different changes. So, to
>> achieve truly global unambiguity each changeset, in addition to seqno,
>> should be marked with a RS History ID. Obviously seqnos from different
>> histories are logically incomparable. Therefore RS History ID can be
>> any globally unique identifier, with no need for < or > operations.
>> This is the second component of global transaction ID.
> 
> Assuming we want to replicate just one table, do you mean that in a
> replication cluster this Logical Table is a Replication Set, that is
> all copies of this table on all nodes belong to the same RS ?

Yes, with a small clarification: strictly speaking, copies of the table
don't _belong_ to the RS, they _are_ the RS. That is, in your example the
Replication Set consists of a single table, not of multiple copies of the
table. It is probably more correct to speak about RS replicas on nodes,
rather than copies of the table.

Regards,
Alex

-- 
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

References

Ideas for improving MariaDB/MySQL replication
From: Kristian Nielsen, 2010-01-22
Re: Ideas for improving MariaDB/MySQL replication
From: MARK CALLAGHAN, 2010-01-24
Re: Ideas for improving MariaDB/MySQL replication
From: Alex Yurchenko, 2010-01-27