
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication


On Sat, 20 Mar 2010 13:52:47 +0200, Henrik Ingo
> On Wed, Mar 17, 2010 at 9:01 PM, Alex Yurchenko
> <alexey.yurchenko@xxxxxxxxxxxxx> wrote:
>> The problem is that you cannot really design and program by use cases,
>> unorthodox as it may sound. You cannot throw an arbitrary bunch of use
>> cases as input and get code as output (that is in a finite time and of
>> finite quality). Whether you like it or not, you always program some
>> model.
> Uh, I'm not sure I can accept this proposition. At least it seems
> contradictory to MariaDB's vision of being a practical, user and
> customer driven, database.

I do understand the desire to marry marketing to software design, but
they are simply unrelated areas of human activity. "Computer science" is
called "science" because there are real laws which no marketing genius can
invalidate. So YMMV.

> As I see it, for real world applications, you should always start with
> use cases. But it is ok if you want to come back to me and say that a
> subset of use cases should be discarded because they are too difficult
> to service, or even contradict each other. But just saying that you'd
> like to implement an abstract model without connection to any use
> cases sounds dangerous to me.

I never suggested implementing a model without connection to use cases,
and I believe I went to sufficient lengths to explain how the proposed
model can satisfy a broad range of use cases. What I was saying is that
you're always programming a model, not use cases, and therefore anything
that you want to implement must be expressed in terms of the model.

In this connection saying that you have a use case that does not need
linearly ordered commits really means nothing. Either you need to propose
another model, live with linearly ordered commits or drop the case. Either
way it has no effect on the design of this model implementation, because
linearly ordered commits IS the model. You cannot throw them out without
breaking the rest of the concept. So much for the usefulness of use cases
in high-level design: some of them fit, some of them don't.

> I'm also a fan of abstract thinking though. Sometimes you can get
> great innovations from starting with a nice abstract model, and then
> ask yourself which real world problems it would (and would not) solve.

And that's exactly what I'm trying to do in this thread - start with a
model, not use cases.

> Either way, you end up with anchoring yourself in real world use
> cases.

Well, when you start with a model, it means that you use it as a reference
stick to accept or reject use cases, doesn't it? So that makes the model
the anchor. And leaves use cases only as a means to see how practical the
model is.

And there is another curious property of models: the more abstract the
model is (i.e. the less it is rooted in use cases), the more use cases it
can satisfy. Once you stop designing specifically for asynchronous
replication, you find out that the same scheme works for synchronous too.

>> So now we have a proposed model based on Redundancy Sets, linearly ordered
>> global transaction IDs and ordered commits. We pretty much understand
>> it will work, what sort of redundancy it will provide and, as you
>> is easy to use for recovery and node joining. It satisfies a whole
>> of
>> use cases, even those where ordering of commits is not strictly
>> Perhaps we won't be able to have some optimizations where we could have
>> had them without ordering of commits, but the benefit of such
>> optimizations is highly questionable IMO. MySQL/Galera is a practical
>> implementation of such model, may be not exactly what we want to achieve
>> here, but it gives a good estimate of performance and performance is good.
> Back on track: So the API should of course implement something which
> has as broad applicability as possible. This is the whole point of
> questioning you, since now you have just suggested a model which
> happens to nicely satisfy Galera's needs :-)

Well, this may seem like it because Galera is the only explicit
implementation of that model. But the truth is Galera is possible only
because this model was explicitly followed. And this model didn't come out
of thin air. It is a result of years of research and experience - not only

For example, MySQL|MariaDB already implements a large portion of the
proposed model by representing the evolution of a database as a _series_ of
atomic changes recorded in a binlog. In fact it has had global transaction
IDs from day one. They are just expressed in a way that makes sense only in
the context of a given file on a given server. Had they been recognized as
global transaction IDs, implementing a mapping from a file offset to an
ordinal number would have been below trivial. Then we would not be having
3rd party patches applicable only to MySQL 5.0. (Let's face it, global
transaction IDs in master-slave replication are so trivial they are
practically built in.) The reason why there is no nice replication API in
MariaDB yet is that this model was never explicitly recognized. And an API
is a description of a model. You cannot describe what you don't recognize ;)
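
To make the "file offset to ordinal number" remark concrete, here is a
minimal sketch in Python. It is only an illustration under my own
assumptions: the PositionIndex class, its method names and the file names
are hypothetical, and it does not reflect the actual binlog format or any
existing MySQL/MariaDB interface. It only shows that, once commits are
recorded in a single order, a (file, offset) pair and an ordinal number are
two names for the same thing.

```python
class PositionIndex:
    """Hypothetical index: maps (binlog file, offset) of each commit
    to an ordinal sequence number, in commit order."""

    def __init__(self):
        self._positions = []          # (file_name, offset) in commit order
        self._seqno_by_position = {}  # reverse lookup

    def record_commit(self, file_name, offset):
        """Register a commit at the given position and return its seqno."""
        seqno = len(self._positions) + 1  # ordinal numbers start at 1
        self._positions.append((file_name, offset))
        self._seqno_by_position[(file_name, offset)] = seqno
        return seqno

    def seqno_of(self, file_name, offset):
        """Server-local position -> global ordinal number."""
        return self._seqno_by_position[(file_name, offset)]

    def position_of(self, seqno):
        """Global ordinal number -> server-local position."""
        return self._positions[seqno - 1]


index = PositionIndex()
index.record_commit("binlog.000001", 120)
index.record_commit("binlog.000001", 345)
index.record_commit("binlog.000002", 98)   # ordering survives file rotation
```

The ordinal number is meaningful on any server, while the (file, offset)
pair is meaningful only on the server that wrote the file.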

So in reality I am not proposing anything new or specific to Galera. I'm
just suggesting to recognize what you already have there (and proposing the
abstractions to express it).

> So those are the requirements I could derive from having NDB use our
> to-be-implemented API. My conclusion from the above is that we should
> consider adding to the model the concept of a transaction group,
> which:
>  -> the engine (or MariaDB server, for multi-engine transactions?) MAY
> provide information of which transactions had been committed within
> the same group.
>  -> If such information was provided, a redundancy service MAY process
> transactions inside a group in parallel or out of order, but MUST make
> sure that all transactions in transaction group G1 are
> processed/committed before the first transaction in G2 is
> processed/committed.

Well, that's a pretty cool concept. One way to call it is "controlled
eventual consistency". But does the redundancy service have to know about
it? First of all, these groups are just superpositions of individual atomic
transactions. That is, this CAN be implemented on top of the current model.
Secondly, transaction applying is done by the engine, so the engine or the
server HAS to have support for this, both on the master and on the slave
side. So why not keep the redundancy service API free from that altogether?
Consider this scheme:

Database Server   |   Redundancy Service
(database data)   |   (redundancy information)
            Redundancy API

The task of the redundancy service is to store and provide redundancy
information that can be used in restoring the database to a desired state.
Keeping the information and using it are two different things. The purpose
of an API is to separate one part of the program from the logic of another.
So I'd keep the model and the API as simple and as free from the server
details as possible.
What it means here: the redundancy service stores atomic database changes
in a certain order and it guarantees that it will return these changes in
the same order. This is sufficient to restore the database to any state it
has been in. It is up to the server in what order it will apply these
changes and if it wants to skip some states. (This assumes that the
changesets are opaque to the redundancy service and the server can include
whatever information it wants in them, including ordering prefixes.)
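
A minimal sketch of this division of labour, in Python. Everything in it is
an assumption made for illustration: the RedundancyService class, the
4-byte group prefix and the helper names are hypothetical and are not part
of any existing API. The service only appends and returns opaque byte
strings in order. The group prefix is purely a server-side convention that
the service never inspects.

```python
class RedundancyService:
    """Hypothetical service: stores opaque changesets in order and
    returns them in the same order. It never looks inside them."""

    def __init__(self):
        self._log = []

    def append(self, changeset):
        """Store a changeset and return its 1-based ordinal number."""
        self._log.append(changeset)
        return len(self._log)

    def read_from(self, seqno):
        """Yield (seqno, changeset) pairs in stored order, from seqno on."""
        for i in range(seqno - 1, len(self._log)):
            yield i + 1, self._log[i]


# Server-side convention (assumed): 4-byte big-endian group prefix.
def encode(group, payload):
    return group.to_bytes(4, "big") + payload


def decode(changeset):
    return int.from_bytes(changeset[:4], "big"), changeset[4:]


def batches_by_group(service, start=1):
    """Server side: collect consecutive changesets that share a group
    prefix. The server may apply one batch in parallel or out of order,
    but batches themselves come out in the stored order."""
    batch, current = [], None
    for _, changeset in service.read_from(start):
        group, payload = decode(changeset)
        if current is not None and group != current:
            yield current, batch
            batch = []
        current = group
        batch.append(payload)
    if batch:
        yield current, batch


service = RedundancyService()
service.append(encode(1, b"trx-a"))
service.append(encode(1, b"trx-b"))
service.append(encode(2, b"trx-c"))
groups = list(batches_by_group(service))
```

Here the grouping logic lives entirely in the server, and the service API
stays at two calls: append and ordered read.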

> We should not include the NDB internal replication in this discussion.

It was taken solely as an example of a real world use case where you may
not have linearly ordered commits.


Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011
