maria-developers team mailing list archive
Message #02663
Re: Ideas for improving MariaDB/MySQL replication
Hi Ingo!
Your e-mail is totally relevant and I have almost nothing there to respond
to in particular - it's all as you say, I have no essential remarks. Instead
I want to respond to it as a whole, so I'll omit a lengthy quote; suffice it
to say that this is a direct response.
The problem is that you cannot really design and program by use cases,
unorthodox as it may sound. You cannot throw in an arbitrary bunch of use
cases as input and get code as output (that is, in finite time and of
finite quality). Whether you like it or not, you always program some model.
A program is, by definition, a description of some model. If you
have not settled on a model, you're in trouble - and that's where MySQL
replication is. This is a direct consequence of trying to satisfy a bunch
of use cases without first putting them in the perspective of some general
abstract model. I mention this not to belittle anything or anyone -
everybody makes mistakes. But the subject of this thread is "Ideas for
improving MariaDB/MySQL replication", so mistakes should be learned
from, not repeated.
Let me refer to the following analogy: suppose you want to create a
transport agency. To transport stuff. You know, people, animals, cargo -
stuff. There are a billion use cases. But when you get down to it you have
models to choose from (thankfully there are already models for that, you
don't have to develop one). E.g. you can transport by air or by land. And
each of these models has its own laws and limitations. For example, you
cannot reliably transport by land faster than 200 km/h. You cannot
transport a lot of cargo by air, nor can you have stops every 10 km to pick
up passengers. So you gotta settle on the model that suits you most.
Now you may ask: why? Why not choose both models? Well, notice that
they are still models. There are a whole lot of other use cases that you
cannot satisfy with them. Next, do you know many companies that do both
land and air transportation? You can own both of them indeed, but for the
sake of efficiency they'll be different companies, because aside from the
load()/unload() functions their interfaces, internals and logistics are
likely to be very different.
This is a clumsy analogy indeed, but I hope it helps.
So now we have a proposed model based on Redundancy Sets, linearly ordered
global transaction IDs and ordered commits. We pretty much understand how
it will work and what sort of redundancy it will provide and, as you
agreed, it is easy to use for recovery and node joining. It satisfies a
whole bunch of use cases, even those where ordering of commits is not
strictly required. Perhaps we won't be able to have some optimizations that
we could have had without ordering of commits, but the benefit of such
optimizations is highly questionable IMO. MySQL/Galera is a practical
implementation of such a model, maybe not exactly what we want to achieve
here, but it gives a good estimate of performance, and performance is good.
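
To make the model a bit more concrete, here is a minimal sketch of what
"linearly ordered global transaction IDs and ordered commits" could look
like. It is only an illustration under my own assumptions: the names
Node, RedundancySet, join() and replicate() are hypothetical and are not
taken from MySQL, MariaDB or Galera. The point it tries to show is why
joining and recovery become simple: a node only has to report the last
sequence number it committed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical replica that commits writesets strictly in seqno order."""
    name: str
    last_applied: int = 0                       # highest global seqno committed here
    data: dict = field(default_factory=dict)

    def apply(self, seqno, writeset):
        # Ordered commits: only the next seqno in line may be committed.
        if seqno != self.last_applied + 1:
            raise ValueError(
                f"{self.name}: expected {self.last_applied + 1}, got {seqno}")
        self.data.update(writeset)
        self.last_applied = seqno


class RedundancySet:
    """Hypothetical stand-in for the service that assigns one linear order."""

    def __init__(self):
        self.log = []       # log[i] holds the writeset with seqno i + 1
        self.nodes = []

    def join(self, node):
        # A joining or recovering node is brought up to date by replaying,
        # in order, everything after the last seqno it already has.
        for seqno in range(node.last_applied + 1, len(self.log) + 1):
            node.apply(seqno, self.log[seqno - 1])
        self.nodes.append(node)

    def replicate(self, writeset):
        # Assign the next global seqno and commit on every node in that order.
        self.log.append(writeset)
        seqno = len(self.log)
        for node in self.nodes:
            node.apply(seqno, writeset)
        return seqno


rs = RedundancySet()
a = Node("a")
rs.join(a)
rs.replicate({"x": 1})
rs.replicate({"y": 2})
b = Node("b")               # joins late and catches up from seqno 1
rs.join(b)
rs.replicate({"x": 3})
```

In this sketch both nodes end up with the same data and the same last
sequence number, even though one of them joined after two commits had
already happened.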
Now this model may not fit, for instance, an NDB-like use case. What options
do we have here?
1) Somehow extend the proposed model to satisfy the NDB use case. I don't
see that as likely because, as you agreed, NDB is not really about
redundancy, it is about performance. Redundancy is quite specific there. And
it is not by chance that it is hard to migrate applications to use it.
2) Develop a totally different model to describe the NDB use case and have
it as a different API. Which is exactly what it is right now, if I'm not
mistaken. So it just falls out of the scope of today's topic.
There is one more option - just forget about the NDB use case, which may be
there only because there is nothing better. There are other ways to get
partitioning and replication to work together without pushing them behind
the same interface. E.g. you can have a "replication cluster" of "partition
clusters" - or a "partition cluster" of "replication clusters" (i.e. each
replication cluster replicating a single partition).
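
The second arrangement, a "partition cluster" of "replication clusters",
can be sketched roughly as below. Again this is only an illustration:
ReplicationCluster and PartitionCluster are hypothetical names, the
replicas are plain in-memory dicts, and the CRC32-based routing is just an
assumption to keep the example deterministic. What it tries to show is
that partitioning and replication stay behind two separate interfaces.

```python
import zlib


class ReplicationCluster:
    """Hypothetical group of replicas that all hold the same partition."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]

    def write(self, key, value):
        # Every replica of this partition receives every write.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, replica_index=0):
        return self.replicas[replica_index].get(key)


class PartitionCluster:
    """Hypothetical router: each partition is its own replication cluster."""

    def __init__(self, partition_count, replica_count):
        self.partitions = [ReplicationCluster(replica_count)
                           for _ in range(partition_count)]

    def _partition_for(self, key):
        # Stable hash, so the same key always lands on the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def write(self, key, value):
        self._partition_for(key).write(key, value)

    def read(self, key, replica_index=0):
        return self._partition_for(key).read(key, replica_index)


cluster = PartitionCluster(partition_count=3, replica_count=2)
cluster.write("alice", 1)
cluster.write("bob", 2)
```

Each key lives in exactly one partition, and any replica of that partition
can serve it. The router knows nothing about how replication is done, and
the replication cluster knows nothing about partitioning.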
Disclaimer: the NDB use case was taken only as an example.
The bottom line - you can say that sometimes we don't need total
ordering of commits, but you gotta put it in the model.
On Wed, 17 Mar 2010 13:03:02 +0200, Henrik Ingo <henrik.ingo@xxxxxxxxxxxxx> wrote:
<skip>
>
>> I don't think that you need 2PC between redundancy service and the
>> storage engines, because redundancy service never fails. Well, when it
>> fails, you have something more important to worry about than disk
>> flushes anyways.
>
> How does synchronous replication happen without 2PC?
>
It does, it does. E.g. it does so in MySQL/Galera, see my response to
Kristian. Actually, how could it work otherwise? What is the meaning of
prepare() in the replication step? How can an engine commit fail at this
point, except for a crash?
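
One way to read the argument is sketched below. It is not a description of
the actual MySQL/Galera code: Engine, RedundancyService and
commit_transaction() are hypothetical, and the conflict check is a
simplified assumption of mine. The idea it illustrates is that once the
replication step has ordered a writeset and decided its fate, the engine
commit is a single phase with nothing left to vote on.

```python
class Engine:
    """Hypothetical storage engine with a single-phase commit."""

    def __init__(self):
        self.rows = {}
        self.committed_seqnos = []

    def commit(self, seqno, writeset):
        # No prepare() step: the outcome was already decided before this call.
        self.rows.update(writeset)
        self.committed_seqnos.append(seqno)


class RedundancyService:
    """Hypothetical service that orders writesets and decides their fate."""

    def __init__(self):
        self.next_seqno = 1
        self.last_writer = {}   # key -> seqno of the last accepted write to it

    def replicate(self, writeset, snapshot_seqno):
        seqno = self.next_seqno
        self.next_seqno += 1
        # Assumed deterministic check: reject the writeset if any of its keys
        # was written after the snapshot the transaction was based on.
        if any(self.last_writer.get(key, 0) > snapshot_seqno for key in writeset):
            return seqno, False
        for key in writeset:
            self.last_writer[key] = seqno
        return seqno, True


def commit_transaction(engine, service, writeset, snapshot_seqno):
    seqno, accepted = service.replicate(writeset, snapshot_seqno)
    if not accepted:
        return None             # roll back locally, nothing was committed
    engine.commit(seqno, writeset)
    return seqno


service = RedundancyService()
engine = Engine()
first = commit_transaction(engine, service, {"x": 1}, snapshot_seqno=0)
stale = commit_transaction(engine, service, {"x": 2}, snapshot_seqno=0)
fresh = commit_transaction(engine, service, {"x": 3}, snapshot_seqno=1)
```

Here the second transaction is rejected in the replication step because it
was based on a stale snapshot, so it never reaches the engine. The other
two go straight to commit.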
Regards,
Alex
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011