maria-developers team mailing list archive
Message #02943
Re: Ideas for improving MariaDB/MySQL replication
This is a really long thread so a summary elsewhere would be great for
people like me.
I think Alex mentioned that he needs the commit protocol to be changed
so that the binlog/commit-log/commit-service/redundancy-service
guarantees commit and the storage engine does not. If that is the
case, the storage engine can do async commits. As long as it recovers
to some point in time and tells the binlog what the point in time was
(must know XID), then the binlog can give it the transactions it lost
during crash recovery. Here 'binlog' is what guarantees commit and
could be something other than a file on the master. I want something
like this. It means that we don't need to use XA internally, which
currently costs 3 fsyncs per commit (2 shared, 1 not). We are changing
MySQL to really do group commit and that will change the cost to 3
shared fsyncs. But what I think you have described here is a huge
improvement.
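
To make the recovery idea concrete, here is a minimal sketch of an engine that commits asynchronously and is caught up from the commit log after a crash. Every type and function name in it (CommitLog, Engine, recover and so on) is an assumption made for illustration, not an existing MariaDB/MySQL interface, and durability is only simulated in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Every name below is hypothetical; none of this is MariaDB/MySQL code.
using Xid = std::uint64_t;

struct LoggedTrx {
    Xid xid;            // assigned in commit order, strictly increasing
    std::string key;
    std::string value;
};

// Stand-in for whatever guarantees commit (binlog, commit service, ...).
class CommitLog {
public:
    // A real implementation would make the entry durable before returning.
    void append(const LoggedTrx& trx) {
        assert(log_.empty() || log_.back().xid < trx.xid);
        log_.push_back(trx);
    }

    // Transactions logged after `last_applied`, in commit order.
    std::vector<LoggedTrx> after(Xid last_applied) const {
        std::vector<LoggedTrx> out;
        for (const LoggedTrx& trx : log_) {
            if (trx.xid > last_applied) out.push_back(trx);
        }
        return out;
    }

private:
    std::vector<LoggedTrx> log_;
};

// Stand-in for an engine that commits asynchronously: changes become
// durable only at checkpoint(), so a crash can lose recent transactions.
class Engine {
public:
    void apply(const LoggedTrx& trx) {
        rows_[trx.key] = trx.value;
        xid_ = trx.xid;
    }
    void checkpoint() { durable_rows_ = rows_; durable_xid_ = xid_; }
    void crash() { rows_ = durable_rows_; xid_ = durable_xid_; }

    Xid last_xid() const { return xid_; }
    std::string get(const std::string& key) const {
        auto it = rows_.find(key);
        return it == rows_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> rows_, durable_rows_;
    Xid xid_ = 0, durable_xid_ = 0;
};

// Crash recovery: the engine reports the XID it recovered to and the
// commit log hands back everything after that point.
void recover(Engine& engine, const CommitLog& log) {
    for (const LoggedTrx& trx : log.after(engine.last_xid())) {
        engine.apply(trx);
    }
}
```

If the engine checkpoints after XID 1 and crashes after XIDs 2 and 3 were logged, recover() replays 2 and 3 from the log, so the engine itself never has to fsync at commit time.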
As a further optimization, I want a callback that is called after the
binlog entries are written for a transaction and before the wait for
group commit on the fsync is done. That callback will be used to
release row locks (optionally) held by the transaction.
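
Roughly, the hook I have in mind would sit between the write and the fsync wait, as in the sketch below. The Binlog class, the hook type and release_row_locks() are all made up for illustration; the sketch only records the order of the steps and does no real I/O or locking.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Every name below is hypothetical; this only illustrates the ordering.
struct Transaction {
    std::vector<std::string> row_locks;
    bool release_locks_early = true;  // the "optionally" in the proposal
};

class Binlog {
public:
    using AfterWriteHook = std::function<void(Transaction&)>;

    Binlog(std::vector<std::string>& steps, AfterWriteHook hook)
        : steps_(steps), after_write_(std::move(hook)) {}

    void commit(Transaction& trx) {
        steps_.push_back("write_entries");   // entries written, not yet fsynced
        if (after_write_) after_write_(trx); // hook runs before the fsync wait
        steps_.push_back("group_fsync");     // stand-in for waiting on the shared fsync
    }

private:
    std::vector<std::string>& steps_;
    AfterWriteHook after_write_;
};

// Example hook body: drop the row locks unless the transaction opted out.
inline void release_row_locks(Transaction& trx) {
    if (trx.release_locks_early) trx.row_locks.clear();
}
```

The point of the ordering is that other transactions waiting on those rows can proceed while this one is still waiting for the group fsync.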
On Tue, Mar 30, 2010 at 11:40 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> Alex Yurchenko <alexey.yurchenko@xxxxxxxxxxxxx> writes:
>
>> On Mon, 29 Mar 2010 00:02:09 +0200, Kristian Nielsen
>> <knielsen@xxxxxxxxxxxxxxx> wrote:
>
>> The way I understood the above is that global mutex is taken in InnoDB
>> prepare() solely to synchronize binlog and InnoDB commits. Is that so? If
>
> Yes.
>
>> it is, then it is precisely the thing we want to achieve, but instead of
>> locking global mutex in Innodb prepare() we'll be doing it in
>> redundancy_service->pre_commit() as discussed earlier:
>>
>> innodb->prepare();
>>
>> if (redundancy_service->pre_commit() == SUCCESS) // locks commit_order mtx
>> {
>>     innodb->commit();
>>     redundancy_service->post_commit(); // unlocks commit_order mtx
>> }
>> ...
>
> Yes. This way will prevent group commit in InnoDB, as here innodb->commit()
> does fsync() under a global mutex.
>
>> This way global lock in innodb->prepare() can be naturally removed
>> without any additional provisions. Am I missing something?
>
> Agree that this removes the need for innodb to take its lock in prepare() and
> release in commit().
>
>> On the other hand, if we can reduce the amount of commit ordering
>> operations to the absolute minimum, as you suggest below, it would only
>> benefit performance. I'm just not sure about names. Essentially this means
>> splitting commit() into 2 parts: the one that absolutely must be run under
>> commit_order mutex protection and another that can be run outside of the
>> critical section. I guess in that setup all actual IO can easily go into
>> the 2nd part.
>
> Yes (I did not think long about the names, probably better names can be
> devised).
>
>>> lock(global_commit_order_mutex)
>>> fix_binlog_or_redundancy_service_commit_order()
>>> for (each storage engine)
>>>     engine->fix_commit_order()
>>> unlock(global_commit_order_mutex)
>
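
The pseudocode above could be fleshed out along the following lines, with only the ordering step under the mutex and the expensive flush left outside it. This is a sketch under assumptions: finish_commit(), order_commit() and the class layout are invented for the example, and the flush is simulated by a counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Every name below is hypothetical apart from fix_commit_order(),
// which is taken from the pseudocode in the thread.
class Engine {
public:
    // Cheap, in-memory step; only ever called under the commit-order mutex.
    void fix_commit_order(std::uint64_t seqno) { ordered_.push_back(seqno); }

    // Expensive step (a real engine would flush/fsync here); called outside
    // the mutex so that concurrent committers are free to share the I/O.
    void finish_commit(std::uint64_t /*seqno*/) { ++flushes_; }

    const std::vector<std::uint64_t>& ordered() const { return ordered_; }
    int flushes() const { return flushes_.load(); }

private:
    std::vector<std::uint64_t> ordered_;
    std::atomic<int> flushes_{0};
};

class RedundancyService {
public:
    explicit RedundancyService(std::vector<Engine*> engines)
        : engines_(std::move(engines)) {}

    // Commit one transaction and return its global sequence number.
    std::uint64_t commit() {
        const std::uint64_t seqno = order_commit();
        for (Engine* engine : engines_) engine->finish_commit(seqno);
        return seqno;
    }

private:
    // The critical section: assign the global order and tell every engine.
    std::uint64_t order_commit() {
        std::lock_guard<std::mutex> guard(commit_order_mutex_);
        const std::uint64_t seqno = ++last_seqno_;
        for (Engine* engine : engines_) engine->fix_commit_order(seqno);
        return seqno;
    }

    std::mutex commit_order_mutex_;
    std::uint64_t last_seqno_ = 0;
    std::vector<Engine*> engines_;
};
```

Keeping the mutex private to the service reflects the suggestion that the lock live inside the ordering call instead of around it.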
>> What I'd like to correct here is that ordering is needed at least in
>> redundancy service. You need global trx ID. And I believe storage engines
>> won't be able to do without it either - otherwise we'll need to deal with
>> holes in commit sequence during recovery.
>
> Yes.
>
>> Also, I'd suggest to move the
>> global_commit_order_mutex into what goes by
>> "fix_binlog_or_redundancy_service_commit_order()" (the name is misleading -
>> redundancy service determines the order, it does not have to fix it) in the
>> above pseudocode. Locking it outside may seriously reduce concurrency.
>
> Agree (in fact, though I did not say so explicitly, I thought of the entire
> pseudo code above as being in fact implemented inside the redundancy service
> plugin).
>
> - Kristian.
--
Mark Callaghan
mdcallag@xxxxxxxxx