Re: Ideas for improving MariaDB/MySQL replication

 

Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx> writes:

> On Wed, 19 May 2010 15:05:55 +0200, Sergei Golubchik <serg@xxxxxxxxxxxx>
> wrote:
>> 
>> Yes, it only describes how the data get to the redundancy service, but
>> not what happens there. I intentionally kept the details of redundancy
>> out, to be able to satisfy a wide range of different implementations.
>> 
>> For example, if I'd put a global transaction ID explicitly in the model,
>> then MySQL replication would not fit into it - it has such a concept
>> only implicitly, as you have noted.
>> 
>> So, what I did was, as Robert Hodges put it, "pushed the can down the
>> road", and let the redundancy service to take care of the transaction
>> ids.
>> 
>> But perhaps I'm biased and the model I've described is influenced by
>> MySQL replication more than it should've been ?
>> 
> Oh, not really. I just wanted to note that while you were proposing a
> useful framework, you did not touch actual replication/redundancy
> specifics.

Yes, I agree. I think what we need to do is have several layers in the
API. So far I have identified three different layers:

1. Event generators and consumers. This is what Sergei discussed. The
essence of this layer is hooks in handler::write_row() and similar places
that provide data about changes (row values for row-based replication, query
texts for statement-based replication, etc). There is no binlog or global
transaction ID at this layer; I think there may not even be a defined event
format as such, just an API for consumer plugins to get the information (and
for generator plugins to provide it).
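
To make this concrete, here is a very rough sketch of what a layer 1 consumer
interface might look like. All names below are invented purely for
illustration; nothing here is an existing MariaDB/MySQL API:

  /* Hypothetical layer 1 consumer plugin interface; invented names,
     not an existing MariaDB/MySQL API. */
  #include <cstddef>
  #include <cstdint>

  struct rpl_row_data {
    const char *db;          /* schema name                          */
    const char *table;       /* table name                           */
    const uint8_t *before;   /* packed before-image (NULL on INSERT) */
    size_t before_len;
    const uint8_t *after;    /* packed after-image (NULL on DELETE)  */
    size_t after_len;
  };

  /* A consumer plugin registers callbacks; the server calls them from
     hooks in handler::write_row() and similar places. */
  struct rpl_event_consumer {
    int (*row_insert)(void *ctx, const rpl_row_data *row);
    int (*row_update)(void *ctx, const rpl_row_data *row);
    int (*row_delete)(void *ctx, const rpl_row_data *row);
    int (*statement)(void *ctx, const char *query, size_t length);
    int (*trx_begin)(void *ctx);       /* transaction boundaries, so  */
    int (*trx_commit)(void *ctx);      /* consumers can group events  */
    void *ctx;                         /* per-plugin state            */
  };

  /* Hypothetical registration entry point for consumer plugins. */
  int rpl_register_consumer(const rpl_event_consumer *consumer);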

2. Primary redundancy service and TC manager. There will be exactly one of
these in a server. It controls the 2-phase commit among the different engines,
binlogs, etc. (and handles recovery of these after a crash). It also controls
the commit order, so it would be the place to implement the global transaction
ID.
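
Again just to illustrate (with invented names, not an actual API), the layer 2
interface might expose 2-phase commit, commit ordering, and the global
transaction ID roughly along these lines:

  /* Hypothetical layer 2 interface: the single TC manager that drives
     2-phase commit and assigns the global transaction ID.  All names
     are invented for illustration. */
  #include <cstdint>

  struct global_trx_id {
    uint64_t seq_no;       /* monotonically increasing commit order */
    uint32_t server_id;    /* originating server                    */
  };

  struct tc_manager {
    /* Phase 1: prepare the transaction in all participating engines
       and redundancy plugins (binlog etc). */
    int (*prepare)(void *thd);
    /* Phase 2: commit in a well-defined order; the returned ID
       reflects that order and acts as the global transaction ID. */
    int (*commit)(void *thd, global_trx_id *out_id);
    int (*rollback)(void *thd);
    /* Crash recovery: decide the fate of transactions found prepared
       but not committed in the engines at startup. */
    int (*recover)(void);
  };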

3. Default event format. I think it will be useful to have a standard
replication event format defined at a high level. This would be optional, so
plugins at layers 1 and 2 would be free to define their own format, but having
a standard format at some level would allow more code to be re-used instead of
re-inventing the wheel in every plugin. Maybe at this layer there could also be
an API for defining the encapsulation of a specific event format, so that a
generic binlog or network transport could be written that supports multiple
event formats.
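
As a sketch of what such a default format could look like (again purely
invented, not a proposal for the actual format), a layer 3 event could be a
small self-describing envelope that any transport can carry as opaque data:

  /* Hypothetical default event envelope at layer 3; invented names. */
  #include <cstddef>
  #include <cstdint>

  enum rpl_event_type : uint16_t {
    RPL_EVENT_STATEMENT  = 1,   /* query text        */
    RPL_EVENT_ROW        = 2,   /* packed row images */
    RPL_EVENT_TRX_BEGIN  = 3,
    RPL_EVENT_TRX_COMMIT = 4
  };

  struct rpl_event {
    uint16_t type;              /* one of rpl_event_type       */
    uint16_t format_version;    /* lets the format evolve      */
    uint64_t global_trx_seq;    /* from the layer 2 TC manager */
    uint32_t payload_len;
    const uint8_t *payload;     /* type-specific payload       */
  };

  /* A transport (binlog file, network pipe, ...) only needs to move
     opaque (header, payload) pairs, so the same transport could carry
     other event formats as well. */
  int rpl_transport_write(void *transport, const rpl_event *ev);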

> Speaking of current MySQL replication, I was skeptical from the beginning
> that it will fit into new redundancy service in its current unmodified
> form. It is simply too integrated with the server for that (just think of
> all those HAVE_REPLICATION ifdefs). That's why I proposed to keep them side
> by side and not try to unify them.

Yes.

So with respect to the above layers, I think the current binlog implementation
can be built upon a generic layer 1 API without problems. But at layer 2, the
existing binlog implementation would be side-by-side with other
alternatives.

And at layer 3 I think it would also be side-by-side. The existing binlog
format is really not very extensible, and a more flexible format (maybe based
on Google protocol buffers, as Drizzle does) sounds like a more likely way
forward.
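
The key property a protobuf-like format gives is a tag/length/value layout in
which a reader can skip fields it does not understand. A trivial invented
example of that idea:

  /* Skip one unknown tag/length/value field in a byte stream.  The
     explicit length is what makes old readers forward-compatible,
     and is exactly what the fixed-layout binlog events lack.
     Invented example, not real binlog code. */
  #include <cstdint>
  #include <cstring>

  static const uint8_t *skip_unknown_field(const uint8_t *p)
  {
    uint32_t tag, length;
    std::memcpy(&tag, p, sizeof(tag));            /* field identifier      */
    std::memcpy(&length, p + 4, sizeof(length));  /* explicit value length */
    (void)tag;             /* a reader that knows the tag would act on it */
    return p + 8 + length; /* an unaware reader just skips the field      */
  }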

So for something like Galera, I think it would hook into the layer 1 API to
get the events from statements. At layer 2, it would implement its own TC
manager, which controls the commit process and recovery, and handles the
synchronous replication algorithm. And at layer 3, maybe it would implement
its own event format, or maybe it could use the default event format (and
re-use the code to package such events on a master and apply such events on a
slave), but implement its own transport for the events.

Sounds reasonable?

 - Kristian.


