
maria-developers team mailing list archive

Re: A problem with implementing Group Commit with Binlog with MyRocks

 

Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx> writes:

> single thread. In MySQL, _both_ prepare and commits are so grouped from a
> single thread (though I think one thread can do group prepare in parallel
> with another doing group commit).

Ehm, this is not true, of course.
The prepare() calls are from multiple threads in parallel. Just the
flush_logs(hton, true) call is from a single thread for a whole group of
transactions.
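
In pseudocode, the flow described above is roughly this (just a sketch of
the described behaviour, not actual server code):

  /* Each user thread, in parallel: */
  hton->prepare(hton, thd, true);    /* make the trx prepared in the engine,
                                        not necessarily durable yet */

  /* Then a single (leader) thread, once for a whole group of
     prepared transactions: */
  hton->flush_logs(hton, true);      /* engine syncs its log once, making
                                        all the group's prepares durable */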

> This way, the extra lock can be avoided for storage engines that do not need
> group_prepare(). And storage engines have freedom to implement

And I do not think this will work either: all binlog commits must use the
same lock sequence, so that a later commit that does not take the new lock
cannot race ahead of an earlier one. It is important to use a separate lock
though, so one storage engine prepare fsync can happen in parallel with one
binlog write fsync.

It still seems useful if the upper layer could pass down a list of the
entire group of transactions being group committed (or prepared). I think
prepare_ordered() can just be removed; it ended up never being useful. And
maybe a group_commit_ordered(list_of_transactions) could be added as an
alternative to commit_ordered().
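
Roughly, in handlerton terms, something like the following (just a sketch;
the hook name and the list type are invented here for illustration):

  /* Sketch only: hypothetical hook, names/types invented for this mail. */
  struct thd_group_entry
  {
    THD *thd;                      /* one transaction in the group        */
    bool all;                      /* same meaning as in commit_ordered() */
    struct thd_group_entry *next;  /* next transaction, in binlog order   */
  };

  /* Called once per binlog group commit, from the leader thread, with the
     whole group in commit order, as an alternative to calling
     commit_ordered() once per transaction. */
  void (*group_commit_ordered)(handlerton *hton,
                               struct thd_group_entry *group);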

A new group_prepare_ordered(list_of_transactions) might help the performance
issue for rocksdb. It really should be made async though, like
group_prepare_ordered_start(cookie, list) and
group_prepare_ordered_complete(cookie) or whatever. With the MySQL "API", it
seems impossible for two participating storage engines to persist their
prepares in parallel, which isn't great for performance. The MySQL
flush_logs() during prepare really feels like a gross hack. It doesn't seem
right to run fsync()s single-threaded under a lock...
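
In handlerton terms it could look something like this (again just a sketch
with invented names, reusing the list type from the sketch above; the cookie
is whatever the engine needs to find its in-flight group prepare again):

  /* Sketch only: hypothetical async variant, names invented for this mail.
     Start persisting the group's prepares (e.g. queue a WAL sync), fill in
     *cookie and return without waiting, so the binlog write and other
     engines' prepares can proceed in parallel.  Returns 0 on success. */
  int (*group_prepare_ordered_start)(handlerton *hton, void **cookie,
                                     struct thd_group_entry *group);

  /* Wait, if needed, until the group prepare started above is durable.
     Returns 0 on success, non-zero on error. */
  int (*group_prepare_ordered_complete)(handlerton *hton, void *cookie);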

 - Kristian.

