← Back to team overview

maria-developers team mailing list archive

Re: A problem with implementing Group Commit with Binlog with MyRocks

 

Kristian,

> andrei.elkin@xxxxxxxxxx writes:
>
>>> 2. To make START TRANSACTION WITH CONSISTENT SNAPSHOT actually correctly
>>> synchronise snapshots between multiple storage engines (MySQL does not have
>>> this, I think).
>>
>> (Offtopic, but anyway what it is? Multi-engine transaction with this
>> specific ISOLATION level?)
>
>   https://mariadb.com/kb/en/the-mariadb-library/enhancements-for-start-transaction-with-consistent-snapshot/
>
> So it is like a REPEATABLE READ across engines, applications can get a
> consistent view of cross-engine transactions.
> It also allows to do a non-blocking mysqldump without FLUSH TABLES WITH READ
> LOCK.
>
>>> 3. To avoid having to do an extra fsync() for every commit, on top of the
>>> one for prepare and the one for binlog write (MySQL has this).
>
>>> MySQL handles (3) by stopping all transactions around binlog rotate and
>>> doing a flush_logs().
>>
>> Maybe I am missing something, but why binlog rotation? It is not a
>> common case.  Indeed MySQL BGC reduces the number of fsync() to two by
>> the (flush stage) leader.  As to rotation, it's a specific branch of
>>  MYSQL_BIN_LOG::ordered_commit()
>> where a rotator thread contends for the flush stage mutex, eventually gains it
>> (which may lead to few more groups binlogged into the old being rotated
>> file) and performs.
>
> It used to be that there was _three_ fsyncs for every commit. The _only_
> reason the fsync in commit was needed was to ensure that binlog crash
> recovery would still work after a binlog rotation. Which was kind of
> silly.

This one must be

https://github.com/mysql/mysql-server/commit/35adf21bb63a336c76efdad6c4610161f3fd733d

>
> So _something_ needed to be done aroung binlog rotation. To ensure that all
> transactions are durably committed in storage engines before they are no
> longer available to binlog crash recovery.
>
> If I understand correctly, MySQL ensures this by temporarily stopping binlog
> writes, calling flush_logs() in all (?) engines (with semantics that
> flush_logs() must make all prior commit()'s durable), and only then allowing
> new writes to the new binlog. I am not sure how MySQL ensures that all
> commit() calls complete before the flush_logs() call, maybe it takes both
> the LOCK_commit and LOCK_log mutexes around binlog rotation.

Almost: LOCK_log and LOCK_xids -  I've checked it out, 'cos really forgot it.
MySQL employs a sort of unlogging xid former (before BGC) technique.
The rotator first (phtread_cond-)waits for all flushed-to-binlog-xids got committed,
and then having the two mutex grant ha_flush_logs() is issued right
before the new log file is set.

> The result is
> that binlog crash recovery is always possible from only one binlog file.
>
> MariaDB instead extends binlog crash recovery to consider multiple binlog
> files, if necessary. Then nothing special is needed during binlog rotation.
> But some "garbage collection" is needed to eventually release old binlog
> files.
>
>>> MariaDB avoids this stall
>>
>> You must be having a use case which I can't see..
>
> I am not sure one is better than the other. MariaDB avoids flush_logs(),
> though in current storage engines it may not matter much. The MySQL approach
> is arguably simpler code, though it seems quite an abuse of flush_logs().

It must be only for good to have two well explored methods around.

> The MySQL approach was not public when the MariaDB approach was
> implemented.

True.

Thanks a lot for talking and explaining these fine bits!

Andrei


References