maria-developers team mailing list archive

Re: Ideas for improving MariaDB/MySQL replication

 

MARK CALLAGHAN <mdcallag@xxxxxxxxx> writes:

> This is a really long thread so a summary elsewhere would be great for
> people like me.

I agree that the discussion has become quite long. I summarised the group
commit part of it on my blog:

    http://kristiannielsen.livejournal.com/12254.html
    http://kristiannielsen.livejournal.com/12408.html
    http://kristiannielsen.livejournal.com/12553.html

> I think Alex mentioned that he needs the commit protocol to be changed
> so that the binlog/commit-log/commit-service/redundancy-service
> guarantees commit and the storage engine does not. If that is the
> case, the storage engine can do async commits. As long as it recovers
> to some point in time and tells the binlog what the point in time was
> (must know XID), then the binlog can give it the transactions it lost
> during crash recovery. Here 'binlog' is what guarantees commit and
> could be something other than a file on the master. I want something
> like this. It means that we don't need to use XA internally which
> currently costs 3 fsyncs per commit (2 shared, 1 not). We are changing
> MySQL to really do group commit and that will change the cost to 3
> shared fsyncs. But what I think you have described here is a huge
> improvement.

Yes, it sounds quite promising.
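
To make the recovery idea concrete, here is a minimal sketch in Python of a
binlog that guarantees commit and an engine that only commits asynchronously.
All names (Binlog, Engine, recover) are made up for illustration and are not
MariaDB/MySQL APIs. The sketch also assumes that the XID the engine reports
after a crash is always present in the binlog.

```python
from dataclasses import dataclass, field


@dataclass
class Binlog:
    """Hypothetical durable commit log: appended == committed."""
    entries: list = field(default_factory=list)  # (xid, changes) in commit order

    def append(self, xid, changes):
        self.entries.append((xid, changes))

    def entries_after(self, xid):
        """Entries logged after `xid`; everything if the engine has no XID."""
        if xid is None:
            return list(self.entries)
        xids = [x for x, _ in self.entries]
        return self.entries[xids.index(xid) + 1:]


@dataclass
class Engine:
    """Hypothetical storage engine whose commits are not durable until flush()."""
    durable: dict = field(default_factory=dict)
    durable_xid: object = None
    volatile: dict = field(default_factory=dict)
    volatile_xid: object = None

    def commit_async(self, xid, changes):
        # Commit in memory only; no fsync here.
        self.volatile.update(changes)
        self.volatile_xid = xid

    def flush(self):
        # Background checkpoint: state and last XID become durable together.
        self.durable = dict(self.volatile)
        self.durable_xid = self.volatile_xid

    def crash(self):
        # Everything after the last flush is lost.
        self.volatile = dict(self.durable)
        self.volatile_xid = self.durable_xid


def recover(engine, binlog):
    """Engine reports its last durable XID; the binlog re-supplies the rest."""
    for xid, changes in binlog.entries_after(engine.durable_xid):
        engine.commit_async(xid, changes)
    engine.flush()
```

The point of the sketch is that only the binlog append needs to be durable at
commit time; the engine just has to recover to some consistent point and know
which XID that point corresponds to.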

> As a further optimization, I want a callback that is called after the
> binlog entries are written for a transaction and before the wait for
> group commit on the fsync is done. That callback will be used to
> release row locks (optionally) held by the transaction.

I think the point here is that the locks must not be released until the order
in the binlog has been determined, right? That way, any transaction order
enforced by the log will be the same on the slave. So the callback might be
called before or after the actual write to the binlog, but only after (not
before) the order of that write has been determined?

I think this could be handled by the xa_prepare_fast() and/or the
commit_fast() callbacks that I propose in the third article referenced above.
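
As a rough illustration of the ordering constraint (not of the actual
xa_prepare_fast()/commit_fast() interface, whose details are in the article),
here is a small Python sketch. The class names, the queue() method and the
on_order_fixed callback are all hypothetical. The callback fires once a
transaction's position in the binlog is fixed, which is before the shared
fsync.

```python
class RowLocks:
    """Hypothetical row lock table: one owning transaction per row."""

    def __init__(self):
        self.owner = {}

    def acquire(self, row, xid):
        # Fails if another transaction currently holds the row.
        if self.owner.get(row, xid) != xid:
            return False
        self.owner[row] = xid
        return True

    def release_all(self, xid):
        for row in [r for r, o in self.owner.items() if o == xid]:
            del self.owner[row]


class GroupCommitBinlog:
    """Hypothetical binlog that fixes order first and fsyncs in groups."""

    def __init__(self):
        self.pending = []   # order fixed, not yet fsynced
        self.durable = []   # XIDs in binlog order, after fsync
        self.fsyncs = 0

    def queue(self, xid, on_order_fixed):
        self.pending.append(xid)  # binlog position is determined here
        on_order_fixed(xid)       # earliest safe point to release row locks

    def group_fsync(self):
        # One fsync covers every transaction queued so far.
        if self.pending:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.fsyncs += 1
```

With this shape, a second transaction that was blocked on a row lock can only
be queued after the first one, so the two end up in the binlog in lock order
and can still share a single fsync.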

BTW, it was great to discuss these issues with you at the MySQL Conference!

 - Kristian.


