
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication

 

On Tue, 16 Mar 2010 13:20:40 +0100, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx> writes:
> 
>> On Mon, 15 Mar 2010 10:57:41 +0100, Kristian Nielsen
>> <knielsen@xxxxxxxxxxxxxxx> wrote:
> 
>>> What I am wondering at the moment is if the concept of global transaction
>>> ID should be a part of the new API, or if it is really an implementation
>>> detail of the redundancy service.
> 
>> I'd go about it in the following way. We have an SQL server proper. And it
>> has a state (database). And it is a state of the server that we want to be
>> redundant (replicate, log, whatever). The particular server state is
>> identified by a global transaction ID. From here it follows that global
>> transaction ID should be the same regardless of the plugin.
>>
>> It is also quite clear that each plugin will be using its own ID format
>> internally. E.g. binlogger will be obviously using file offsets and Galera
>> will be using 64-bit signed integers. Then plugins will just have to
>> implement their own mapping to the ID defined in API. Which in most cases
>> will be trivial.
>>
>> Having a unified global transaction ID is unbelievably good, especially
>> when you have cascading replication, where each cascade can use its own
>> plugin. It is so good that you will never ever have any troubles with it,
>> and no troubles with global transaction ID amounts to nirvana. ;)
> 
> Hm.
> 
> So in such cascading replication scenario, the changeset would actually keep
> its identity in the form of the global transaction ID?
> 
> So if on master1, the user does
> 
>     BEGIN; INSERT INTO t1 VALUES (...); COMMIT;
> 
> this might get global transaction ID (UUID_master1, 100)
> 
> This might get replicated to a slave1 with multiple masters. The slave1 might
> then end up with three changesets, the one from master1, another from master2,
> and a third made by the user directly on slave1:
> 
>     (UUID_master1, 100)
>     (UUID_master2, 200)
>     (UUID_slave1, 50)
> 
> So what if we now want to cascade replicate from slave1 (now as a master) to
> slave2? Would slave2 then see the same three global transaction IDs?
> 
>     (UUID_master1, 100)
>     (UUID_master2, 200)
>     (UUID_slave1, 50)
> 
> That does not seem to work, does it? Seems to me slave1 would need to assign
> each changeset a new global transaction id in order for slave2 to know in
> which order to apply the changesets? In particular, whether to apply
> (UUID_slave1, 50) before or after (UUID_master1, 100).
> 
> So I think I misunderstood you here?
> 
> Or did you mean that the _format_ of the global transaction ID should be the
> same across all plugins, so that in a cascading replication scenario where
> servers are using different replication plugins, the IDs can be treated
> uniformly?
> 
>  - Kristian.

Yes, you have misunderstood me: it is the value of the global transaction
ID that stays constant (and the format too, of course) ;)

First of all, your example doesn't work precisely because the global trx ID
format you have chosen, (source, id_on_source), cannot be linearly ordered:
IDs from different sources are incomparable.
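
To make that concrete, here is a minimal Python sketch. It is only an
illustration: PairId and compare are hypothetical names, not part of any
proposed API. It shows that (source, id_on_source) pairs can be ordered
only when they come from the same source.

```python
from typing import NamedTuple, Optional

class PairId(NamedTuple):
    """Hypothetical ID in the (source, id_on_source) format from the example."""
    source: str
    seqno: int

def compare(a: PairId, b: PairId) -> Optional[int]:
    """Return -1, 0 or 1 if a and b can be ordered, None if they cannot.

    Assumption for this sketch: sequence numbers are only meaningful
    relative to other sequence numbers from the same source.
    """
    if a.source != b.source:
        return None  # different sources: no defined order
    return (a.seqno > b.seqno) - (a.seqno < b.seqno)

same_source = compare(PairId("UUID_master1", 100), PairId("UUID_master1", 101))
cross_source = compare(PairId("UUID_master1", 100), PairId("UUID_slave1", 50))
```

Here same_source is -1, while cross_source is None, which is exactly the
"before or after?" question slave2 cannot answer from the IDs alone.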

Second, let's forget for a moment about global transaction ID format and
exact implementation, just remember that you can build a monotonic gapless
sequence out of them. And suppose that (UUID_master1, 100) has ID1,
(UUID_master2, 200) has ID2 and (UUID_slave1, 50) has ID3. And they are
ordered ID1 < ID2 < ID3 without gaps.

So slave1 has ID1, ID2, ID3. Slave2 will see the same, as will everybody else.
Suppose slave2 crashes/reboots after it applied ID1. Now it can connect to
ANY node of the cluster and say "hey, I need events starting at ID2". And
every node will know where to start from, because ID2 means the same trx on
every node.
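
A rough Python sketch of that recovery step, under the assumption that the
global IDs are plain integers 1, 2, 3 (standing in for ID1, ID2, ID3) and
that every node keeps the same ID-to-changeset mapping. Node and
events_from are hypothetical names used only for illustration.

```python
class Node:
    """Hypothetical cluster member holding the shared, totally ordered history."""

    def __init__(self, name, log):
        self.name = name
        self.log = dict(log)  # global transaction ID -> changeset

    def events_from(self, start_id):
        """Return (ID, changeset) pairs with ID >= start_id, in ID order."""
        return [(i, self.log[i]) for i in sorted(self.log) if i >= start_id]

# The same history on every node of the replication set.
HISTORY = {
    1: "(UUID_master1, 100)",
    2: "(UUID_master2, 200)",
    3: "(UUID_slave1, 50)",
}

slave1 = Node("slave1", HISTORY)
other_node = Node("other_node", HISTORY)

# slave2 applied ID 1 before crashing, so it asks for everything from ID 2.
resume_from_slave1 = slave1.events_from(2)
resume_from_other = other_node.events_from(2)
```

Both calls return the same two events, so it does not matter which node
slave2 reconnects to.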

All of this was about a single Replication Set (RS). You're probably
envisioning master1 and master2 modifying disjoint (or maybe even the same)
sets of data independently and slave1 aggregating changes from both of
them. The masters don't see each other's changes, so they can't mutually
order their changesets, only slave1 can. How to go about that?

Well, the trick here is that master1 and master2 in this case are not
really members of the same replication cluster. They don't replicate to
each other, right? So they have their own individual RS and their own
global transaction ID sequences which are indeed incomparable. slave1
participates in both clusters, but can we say that the db on slave1 is a
replica of master1 or master2? Well, it depends. If master1 modifies db1
and master2 modifies db2, then we just have 2 independent master-slave
clusters that happen to share the same physical hardware as a slave. If
master1 and master2 modify the same db independently, then, strictly
speaking, we don't have a case of db replication here and slave1 will order
the changesets and assign its own ID sequence to them.
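
As a sketch of that last case (hypothetical names, and assuming slave1
simply numbers changesets in the order it applied them), slave1 can keep
each upstream ID for reference while publishing its own gapless sequence:

```python
from itertools import count

def assign_own_sequence(arrivals):
    """Number changesets 1, 2, 3, ... in the order slave1 applied them.

    arrivals: list of (rs_name, id_in_that_rs) pairs. The upstream IDs are
    kept alongside, but only the new number is meaningful downstream.
    """
    next_id = count(1)
    return [(next(next_id), rs, src_id) for rs, src_id in arrivals]

# Changesets from two independent replication sets, in slave1's apply order.
arrivals = [("RS_master1", 100), ("RS_master2", 200), ("RS_master1", 101)]
slave1_log = assign_own_sequence(arrivals)
```

Anything replicating from slave1 would then use the first element of each
tuple as the global transaction ID within slave1's own RS.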

To summarize, there can be various esoteric setups, and the RS concept is
the key to understanding the scope of the global transaction ID there.

Regards,
Alex

-- 
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011


