
maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication

 

On Tue, 23 Mar 2010 10:12:53 +0200, Henrik Ingo
<henrik.ingo@xxxxxxxxxxxxx> wrote:
> Meta discussion first, replication discussion below :-)

I guess we can consider meta-discussion closed for now unless someone
wants to add to it. I'm content ;)

>> <cut>
>>> So those are the requirements I could derive from having NDB use our
>>> to-be-implemented API. My conclusion from the above is that we should
>>> consider adding to the model the concept of a transaction group,
>>> which:
>>>  -> the engine (or MariaDB server, for multi-engine transactions?) MAY
>>> provide information of which transactions had been committed within
>>> the same group.
>>>  -> If such information was provided, a redundancy service MAY process
>>> transactions inside a group in parallel or out of order, but MUST make
>>> sure that all transactions in transaction group G1 are
>>> processed/committed before the first transaction in G2 is
>>> processed/committed.
>>
>> Well, that's a pretty cool concept. One way to call it is "controlled
>> eventual consistency". But does the redundancy service have to know
>> about it?
> 
> If the redundancy service does not know about it, how would the
> information be transmitted by it??? For instance take the example of
> the binlog, which is a redundancy service in this model. If it
> supported this information (which it MAY do), it of course has to save
> it in some format in the binlog file.
> 
>> First of all, these groups are just superpositions of individual atomic
>> transactions. That is, this CAN be implemented on top of the current
>> model.
> 
> Yes, this is the intent.
> 
>> Secondly, transaction applying is done by the engine, so the engine or
>> the server HAS to have support for this, both on the master and on the
>> slave side. So why not keep the redundancy service API entirely free
>> of that? Consider this scheme:
>>
>> Database Server | Redundancy Service
>> (database data) | (redundancy information)
>>                 |
>>        Redundancy API
>>
>> The task of the redundancy service is to store and provide redundancy
>> information that can be used to restore the database to a desired
>> state. Keeping the information and using it are two different things.
>> The purpose of an API is to separate one part of the program from the
>> logic of another. So I'd keep the model and the API as simple and as
>> free from server details as possible.
>>
>> What it means here: the redundancy service stores atomic database
>> changes in a certain order and guarantees that it will return these
>> changes in the same order. This is sufficient to restore the database
>> to any state it had. It is up to the server in what order it applies
>> these changes and whether it wants to skip some states. (This assumes
>> that the changesets are opaque to the redundancy service and the server
>> can include whatever information it wants in them, including ordering
>> prefixes.)
> 
> Ok, this is an interesting distinction you make.
> 
> So in current MySQL/MariaDB, one place where transactions are applied
> to a replica is the slave SQL thread. Conceptually I've always thought
> of this as "part of replication code". You propose here that this
> should be a common module on the MariaDB server side of the API,
> rather than part of each redundancy service.

Yes.

> I guess this may make
> sense.

Well, it is of course a matter of debate, but not all of the
redundancy-related code has to be encompassed by the redundancy API. The
main purpose of an API is to hide implementation details, and it goes both
ways: we want to hide the redundancy details from the server, and likewise
we want to hide the server details from the redundancy service. That is
how flexibility and maintainability are achieved. And the thinner the API,
the better.

That is one of the reasons for identifying the model first - it is the
best way to see what this API should contain.
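
To illustrate how thin such an API could be, here is a minimal sketch in
Python. It is not a proposal for the actual interface: the class name, the
method names and the "G1:" prefixes are all made up for this example. The
only thing it shows is the contract described above: the service stores
opaque changesets and hands them back in the same order, and anything
about groups lives inside the changesets, where only the server looks.

```python
from typing import Iterator, List, Tuple


class InMemoryRedundancyService:
    """Hypothetical thin redundancy service: an ordered log of opaque changesets."""

    def __init__(self) -> None:
        self._log: List[bytes] = []

    def append(self, changeset: bytes) -> int:
        """Store a changeset without interpreting it; return its 1-based seqno."""
        self._log.append(bytes(changeset))
        return len(self._log)

    def read_from(self, seqno: int) -> Iterator[Tuple[int, bytes]]:
        """Yield (seqno, changeset) pairs in exactly the order they were appended."""
        for i in range(max(seqno, 1) - 1, len(self._log)):
            yield i + 1, self._log[i]


# The server may embed its own ordering information in the payload
# (here an invented "G<n>:" group prefix); the service never parses it.
service = InMemoryRedundancyService()
service.append(b"G1:txn-a")
service.append(b"G1:txn-b")
service.append(b"G2:txn-c")

replayed = [payload for _, payload in service.read_from(1)]
```

How the server then applies what it reads back (in parallel, skipping
states, and so on) is integration code on the server side of the API.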

To put it another way, there are APIs and there is integration code that
holds them together. Like, for example, the code that we exchanged with
Kristian.

> This opens up a new field of questions related to the user interface
> of all this. Typically, or "how things are today", a user will
> initiate replication/redundancy related events from the side of the
> redundancy service. Eg if I want to setup mysql statement based
> replication, there is a set of commands to do that. If I want to
> recover the database by replaying the binlog file, there is a set of
> binlog specific tools to do that. Each redundancy service solves some
> problems from its own specific approach, and provides a user interface
> for those tasks. So I guess at some point it will be interesting to
> see what the command interface to all this will look like and whether
> I use something specific to the redundancy service or some general
> MariaDB command set to make replication happen.

It depends not so much on where you draw the API line as on which aspects
of the model you want to expose to the user. Most probably - all of them.
Thus we'll need the ability to create a replication set, add plugins to
its stack (perhaps first create the stack) and configure individual plugin
instances. Setting variables is definitely not enough for that, so you'll
need either a special set of commands, something along the lines of GRANT,
or, considering that replication configuration tends to be highly
structured and you'll keep it in tables, a special (don't laugh yet)
storage engine where you can modify table contents using regular SQL and
the engine in turn makes the corresponding API calls. I think there could
be a number of benefits in such an arrangement, although I'm not sure
about performance.

> At least the application of replicated transactions certainly should
> not be part of each storage engine. From the engine point of view,
> applying a set of replicated transactions should be "just another
> transaction". For the engine it should not matter if a transaction
> comes from the application, mysqldump, or a redundancy service. (There
> may be small details: when the application does a transaction, we need
> a new global txn id, but when applying a replicated transaction, the
> id is already there.)

Certainly. I think this goes without question. What I meant back there was
that either the engine or the server should be capable of parallel
applying (out-of-order is interesting only if it is parallel, right?), and
that for the purposes of recovery it will no longer be enough for the
engine to just maintain the last committed transaction ID: it'll have to
keep the list of uncommitted transactions from the last group.
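
A rough sketch of the recovery state I have in mind, in Python. All names
here are hypothetical and nothing like this exists in the server today; it
only shows why a single "last committed ID" stops being sufficient once
transactions inside a group may commit in any order.

```python
from typing import List, Set


class GroupApplyState:
    """Hypothetical recovery state for applying transaction groups out of order."""

    def __init__(self) -> None:
        self.last_complete_group = 0        # all groups up to this one are fully committed
        self.current_group = 0              # group currently being applied
        self.uncommitted: Set[int] = set()  # txn ids of the current group not yet committed

    def begin_group(self, group_id: int, txn_ids: List[int]) -> None:
        # Group G2 must not start before every transaction of G1 is committed.
        if self.uncommitted:
            raise RuntimeError("previous group is not fully committed")
        self.current_group = group_id
        self.uncommitted = set(txn_ids)
        if not self.uncommitted:            # an empty group is complete immediately
            self.last_complete_group = group_id

    def commit(self, txn_id: int) -> None:
        # Transactions within the current group may commit in any order.
        self.uncommitted.remove(txn_id)
        if not self.uncommitted:
            self.last_complete_group = self.current_group


state = GroupApplyState()
state.begin_group(1, [10, 11, 12])
state.commit(12)                            # out of order within the group
state.commit(10)

# If the server crashed at this point, "last committed ID = 12" would
# wrongly suggest that 11 was applied; the uncommitted set says otherwise.
pending_after_crash = sorted(state.uncommitted)

state.commit(11)
state.begin_group(2, [13])
```

On recovery the slave would re-apply whatever is left in the uncommitted
set of the last group and only then move on to the next one.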

-- 
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011


