
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication


Meta discussion first, replication discussion below :-)

On Mon, Mar 22, 2010 at 4:41 PM, Alex Yurchenko
<alexey.yurchenko@xxxxxxxxxxxxx> wrote:
>> Uh, I'm not sure I can accept this proposition. At least it seems
>> contradictory to MariaDB's vision of being a practical, user and
>> customer driven, database.
> I do understand the desire to marry marketing to software design, but
> they are simply unrelated areas of human activity. "Computer science" is
> called "science" because there are real laws which no marketing genius can
> invalidate. So YMMV.

It is not marketing. Science can produce things with practical value,
and things with little or no practical value. We want to produce
things with practical value.

>> As I see it, for real world applications, you should always start with
> I never suggested to implement a model without connection to use cases,
> and I believe I went to sufficient lengths to explain how proposed model
> can satisfy a broad range of use cases. What I was saying, that you're
> always programming a model, not use cases and therefore anything that you
> want to implement must be expressed in terms of the model.

This is true. Skipping the part where you create a model leads to chaos.

> In this connection saying that you have a use case that does not need
> linearly ordered commits really means nothing. Either you need to propose
> another model, live with linearly ordered commits or drop the case. Either
> way it has no effect on the design of this model implementation, because
> linearly ordered commits IS the model. You cannot throw them out without
> breaking the rest of the concept. So much for the usefulness of use cases
> in high-level design: some of them fit, some of them don't.

I'm not sure where Kristian stands, but my participation at least is
based on the assumption that we are still exploring the proposed model
to see whether we like it, should modify it, or need a different
model. That assessment is based on asking which use cases are served
well by the model.

>> I'm also a fan of abstract thinking though. Sometimes you can get
>> great innovations from starting with a nice abstract model, and then
>> ask yourself which real world problems it would (and would not) solve.
> And that's exactly what I'm trying to do in this thread - start with a
> model, not use cases.
>> Either way, you end up with anchoring yourself in real world use
>> cases.
> Well, when you start with a model, it means that you use it as a reference
> stick to accept or reject use cases, doesn't it? So that makes the model
> an
> anchor. And leaves use cases only as means to see how practical the model
> is.

No, this is what I disagree with. You could propose a model that is
sound in a theoretical sense, but useless in practice because it
doesn't serve any use cases that real-world users are interested in.
So the use cases are the reference stick for accepting or rejecting
the model. But the full set of use cases is not set in stone either:
we can decide that we like a model because it serves many use cases,
and then reject the use cases it doesn't serve.

> And there is another curious property to models: the more abstract is the
> model (i.e. the less it is rooted in use cases), the more use cases it can
> satisfy. Once you stop designing specifically for asynchronous replication,
> you find out that the same scheme works for synchronous too.

True. Abstract thinking sure is a win, there's no question about that.
But universities are also full of scientists who produce little of
practical value. I worked for one year at HUT; it was the most relaxed
job I ever had, since there is no requirement to produce anything
useful unless you really want to. My master's thesis contributes
something new to the field of eLearning that nobody had researched
before, but if I had to explain its main points to a business
audience, I could do so in 60 seconds. The rest is just "scientific
fluff".

Good science has practical value (sometimes apparent only after
decades). But not everything that happens in science is good science.

>> Back on track: So the API should of course implement something which
>> has as broad applicability as possible. This is the whole point of
>> questioning you, since now you have just suggested a model which
>> happens to nicely satisfy Galera's needs :-)
> Well, this may seem like it because Galera is the only explicit
> implementation of that model. But the truth is Galera is possible only
> because this model was explicitly followed. And this model didn't come out
> of thin air. It is a result of years of research and experience - not only
> ours.

Yes. The model certainly looks sound and promising, no question about
that. I think the discussion is more about corner cases.

> For example, MySQL|MariaDB is already implementing large portion of the
> proposed model by representing evolution of a database as a _series_ of
> atomic changes recorded in a binlog. In fact it had global transaction IDs
> from day one. They are just expressed in the way that makes sense only in
> the context of a given file on a given server. Had they been recognized as
> global transaction IDs, implementing a mapping from a file offset to an
> ordinal number is below trivial. Then we would not be having 3rd party
> patches applicable only to MySQL 5.0. (Let's face it, global transaction
> IDs in master-slave replication are so trivial they are practically built
> in.) The reason why there is no nice replication API in MariaDB yet is
> that
> this model was never explicitly recognized. And API is a description of a
> model. You cannot describe what you don't recognize ;)


> So in reality I am not proposing anything new or specific to Galera. I'm
> just suggesting to recognize what you already have there (and proposing the
> abstractions to express it).

And imho this joint effort is looking really promising all in all,
since so many experts are exchanging their wisdom. (Not really
counting myself here, although I've read many white papers about
replication :-)
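To make Alex's point about file offsets concrete: the mapping he describes, from a (binlog file, offset) position to a global ordinal transaction ID, could be sketched roughly like this. The file names and offsets below are invented for illustration, not an actual MariaDB interface:

```python
# Hypothetical sketch: treating MySQL-style (binlog file, offset) pairs
# as global transaction IDs by mapping them onto one ordinal sequence.

from bisect import bisect_right

class BinlogOrdinalMap:
    """Assigns a global ordinal to each (file, offset) event position."""

    def __init__(self):
        self._positions = []  # (file, offset) pairs, in commit order

    def record(self, binlog_file, offset):
        """Called as each event is written; returns its global ordinal."""
        self._positions.append((binlog_file, offset))
        return len(self._positions) - 1

    def ordinal_of(self, binlog_file, offset):
        """Look up the ordinal of a previously recorded position."""
        i = bisect_right(self._positions, (binlog_file, offset)) - 1
        if i < 0 or self._positions[i] != (binlog_file, offset):
            raise KeyError((binlog_file, offset))
        return i

m = BinlogOrdinalMap()
m.record("binlog.000001", 120)
m.record("binlog.000001", 560)
m.record("binlog.000002", 120)   # rotation: new file, offsets restart
print(m.ordinal_of("binlog.000002", 120))  # -> 2
```

The lookup relies on binlog file names sorting in rotation order, which holds for the usual numbered-suffix naming scheme.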

> <cut>
>> So those are the requirements I could derive from having NDB use our
>> to-be-implemented API. My conclusion from the above is that we should
>> consider adding to the model the concept of a transaction group,
>> which:
>>  -> the engine (or MariaDB server, for multi-engine transactions?) MAY
>> provide information of which transactions had been committed within
>> the same group.
>>  -> If such information was provided, a redundancy service MAY process
>> transactions inside a group in parallel or out of order, but MUST make
>> sure that all transactions in transaction group G1 are
>> processed/committed before the first transaction in G2 is
>> processed/comitted.
> Well, that's a pretty cool concept. One way to call it is "controlled
> eventual consistency". But does redundancy service have to know about it?

If the redundancy service does not know about it, how would the
information be transmitted through it? For instance, take the example
of the binlog, which is a redundancy service in this model. If it
supported this information (which it MAY do), it of course has to save
it in some format in the binlog file.

> First of all, these groups are just superpositions of individual atomic
> transactions. That is, this CAN be implemented on top of the current
> model.

Yes, this is the intent.
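The transaction-group rule stated above (all transactions in group G1 committed before any in G2, but ordering free within a group) can be sketched in a few lines. This is a minimal illustration of the constraint, not real replication code; the names are made up:

```python
# Sketch of the transaction-group ordering rule: groups are applied
# strictly in order, but order *within* a group is unconstrained.

def apply_groups(groups, apply_txn):
    """groups: list of lists of transactions, in group order."""
    for group in groups:
        # Within a group a real implementation MAY apply in parallel;
        # here we just show that any order inside the group is legal.
        for txn in reversed(group):   # deliberately not original order
            apply_txn(txn)
        # Barrier: the whole group commits before the next group starts.

applied = []
apply_groups([["t1", "t2"], ["t3"]], applied.append)
print(applied)  # -> ['t2', 't1', 't3']
```

The barrier between groups is what makes this "controlled eventual consistency": intermediate states may differ from the master's, but every group boundary is a consistent state.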

> Secondly, transaction applying is done by the engine, so the engine or the
> server HAS to have a support for this, both on the master and on the slave
> side. So why not keep the redundancy service API free from that at all?
> Consider this scheme:
> Database Server | Redundancy Service
> (database data) | (redundancy information)
>                |
>        Redundancy API
> The task of redundancy service is to store and provide redundancy
> information that can be used in restoring the database to a desired state.
> Keeping the information and using it - two different things. The purpose
> of
> API is to separate one part of the program from the logic of another. So
> I'd keep the model and the API as simple as free from the server details
> as
> possible.
> What it means here: redundancy service stores atomic database changes in a
> certain order and it guarantees that it will return these changes in the
> same order. This is sufficient to restore the database to any state it
> had.
> It is up to the server in what order it will apply these changes and if it
> wants to skip some states. (This assumes that the changesets are opaque to
> redundancy service and the server can include whatever information it
> wants
> in them, including ordering prefixes)

Ok, this is an interesting distinction you make.

So in current MySQL/MariaDB, one place where transactions are applied
to a replica is the slave SQL thread. Conceptually I've always thought
of this as "part of the replication code". You propose here that this
should be a common module on the MariaDB server side of the API,
rather than part of each redundancy service. I guess this may make
sense.
This opens up a new field of questions related to the user interface
for all this. Typically, or "how things are today", a user will
initiate replication/redundancy-related events from the side of the
redundancy service. E.g. if I want to set up MySQL statement-based
replication, there is a set of commands to do that. If I want to
recover the database by replaying the binlog file, there is a set of
binlog-specific tools for that. Each redundancy service solves its
problems from its own specific approach, and provides a user interface
for those tasks. So at some point it will be interesting to see what
the command interface to all this looks like, and whether I use
something specific to the redundancy service or some general MariaDB
command set to make replication happen.
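The narrow contract Alex describes for the redundancy service (store opaque changesets in a fixed order, return them in that same order) could be sketched as below. The class and method names are assumptions for illustration, not an actual MariaDB API:

```python
# Sketch of a minimal redundancy-service contract: changesets are
# opaque bytes; the service only guarantees stable ordering.

class RedundancyService:
    def __init__(self):
        self._log = []            # ordered; durable in a real service

    def store(self, changeset: bytes) -> int:
        """Record an opaque changeset; returns its global ordinal."""
        self._log.append(changeset)
        return len(self._log) - 1

    def replay(self, from_ordinal: int = 0):
        """Yield changesets in the exact order they were stored."""
        yield from self._log[from_ordinal:]

svc = RedundancyService()
svc.store(b"changeset 0")
svc.store(b"changeset 1")
print(list(svc.replay(1)))  # -> [b'changeset 1']
```

Note that nothing here knows about SQL, engines, or transaction groups; as argued above, those stay on the server side of the API, and the server can embed whatever ordering hints it wants inside the opaque changesets.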

At least the application of replicated transactions certainly should
not be part of each storage engine. From the engine point of view,
applying a set of replicated transactions should be "just another
transaction". For the engine it should not matter if a transaction
comes from the application, mysqldump, or a redundancy service. (There
may be small details: when the application does a transaction, we need
a new global txn id, but when applying a replicated transaction, the
id is already there.)
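The "just another transaction" point, with its one small detail about IDs, could be sketched like this. All names here are illustrative, not an actual engine interface:

```python
# Sketch: the engine commits local and replicated transactions the
# same way; only the source of the global txn ID differs.

import itertools

_next_id = itertools.count(1)

def commit(changes, global_id=None):
    """Commit a transaction; allocate an ID only for local ones."""
    if global_id is None:            # local transaction: new global ID
        global_id = next(_next_id)
    # ... engine applies `changes` identically in both cases ...
    return global_id

print(commit(["row change"]))                 # local: freshly allocated
print(commit(["row change"], global_id=42))   # replicated: ID comes along
```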

email: henrik.ingo@xxxxxxxxxxxxx
tel:   +358-40-5697354
www:   www.avoinelama.fi/~hingo
book:  www.openlife.cc
