
maria-developers team mailing list archive

Re: Understanding binlog group commit (MDEV-232, MDEV-532, MDEV-25611, MDEV-18959)


On Sat, May 20, 2023 at 11:07 PM Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> I agree that a function parameter seems simpler. We're requesting to be
> notified when a specific commit is durable in the engine, better specify
> _which_ commit than for InnoDB to guess, as you say :-).

Given that this code is only executed when switching binlog files
(every 1 gigabyte or so of written binlog), or during RESET MASTER,
the slight performance degradation due to the "unnecessary but
sufficient" condition on InnoDB should not matter much.

> We need to be careful about lifetime. The transaction may no longer exist as
> a THD or trx_t (?) in memory. But innobase_commit_ordered() could return the
> corresponding LSN, as was suggested in another mail, and then
> commit_checkpoint_request() could pass that value.

Right. The trx_t objects are being allocated from a special memory
pool that facilitates fast reuse. The object of an old transaction
could have been reused for something else. So, it would be better to
let the storage engine somehow return its logical time. 64 bits for
that could be sufficient for all storage engines, and the special
value 0 could imply that the current logic (ask the storage engine to
write everything) will be used.
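
A minimal sketch of that idea, not the actual handlerton interface:
commit_ordered() hands back a 64-bit logical time (an LSN in InnoDB's
case), and the checkpoint request passes that value back, with 0
meaning "make everything written so far durable". The names ToyEngine,
engine_time_t, ENGINE_TIME_UNKNOWN, checkpoint_target() and
checkpoint_satisfied() are hypothetical and exist only for this
illustration.

```cpp
#include <cstdint>

// Hypothetical 64-bit logical time of a storage engine (for example an LSN).
using engine_time_t = std::uint64_t;

// Special value: the caller does not know which commit it is waiting for,
// so the engine falls back to "everything written so far".
constexpr engine_time_t ENGINE_TIME_UNKNOWN = 0;

// Toy stand-in for a storage engine; not InnoDB code.
struct ToyEngine {
  engine_time_t last_commit = 0;  // logical time of the latest commit
  engine_time_t durable = 0;      // everything up to this time is durable

  // Stand-in for commit_ordered(): returns the logical time of this commit,
  // so the caller need not keep a pointer to a transaction object that may
  // have been reused by the time it asks about durability.
  engine_time_t commit_ordered() { return ++last_commit; }

  // Stand-in for a log flush: make everything up to `up_to` durable.
  void flush(engine_time_t up_to) {
    if (up_to > last_commit) up_to = last_commit;
    if (up_to > durable) durable = up_to;
  }

  // Which logical time must be durable to satisfy the request?
  engine_time_t checkpoint_target(engine_time_t requested) const {
    return requested == ENGINE_TIME_UNKNOWN ? last_commit : requested;
  }

  // True once the requested commit (or, for 0, everything) is durable.
  bool checkpoint_satisfied(engine_time_t requested) const {
    return durable >= checkpoint_target(requested);
  }
};
```

With a specific value, the engine can report success as soon as that
one commit is durable; with 0 it has to wait for everything, which is
the "unnecessary but sufficient" condition.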

> Yes. The single-engine transaction is surely the important use case to optimise.
> It's nice if multi-engine transactions still work, but if they require
> multiple fsync still, I think that's perfectly fine, not something to
> allocate a lot of resources to optimise for.

It's nice that we agree here.

> I also had the idea to use fibers/coroutines as is mentioned in the MDEV
> description, but if that can be avoided, so much the better.

I too like https://quoteinvestigator.com/2011/05/13/einstein-simple/
or the KISS principle.

Example: If the buf_pool.mutex or fil_system.mutex is a bottleneck,
fix the bottlenecks (MDEV-15053, MDEV-23855) instead of introducing
complex things such as multiple buffer pool instances and multiple
page cleaner threads (removed in MDEV-15058), or introducing a
Fil_shard (MySQL 8.0). Or if the log_sys.mutex is a bottleneck, do not
introduce a "jungle of threads" that will write to multiple log files,
but just reduce the bottlenecks with increased use of std::atomic or
with a file format change that allows more clever locking
(MDEV-27774). Thread context switches can be expensive when system
calls such as mutex waits are involved. When no system calls are
involved, the risk shifts to race conditions in lock-free algorithms,
which are hard to diagnose (often invisible to tools like
https://rr-project.org). And even without system calls in inter-thread
communication, "cache line ping-pong" can quickly become expensive,
especially on NUMA systems.
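
As a rough illustration of the "cache line ping-pong" point (a sketch,
not code from the server): a single std::atomic counter updated by many
threads keeps moving its cache line between CPU cores, while one
counter per slot, each aligned to its own cache line, avoids that. The
64-byte line size and the names CACHE_LINE, PaddedCounter and
ShardedCounter are assumptions made for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t CACHE_LINE = 64;

// Each counter occupies a whole cache line, so that updates to one slot
// do not invalidate the line holding a neighbouring slot.
struct alignas(CACHE_LINE) PaddedCounter {
  std::atomic<std::uint64_t> value{0};
};

// Hypothetical sharded counter: writers touch only their own slot,
// readers pay the cost of summing all slots.
template <std::size_t N>
struct ShardedCounter {
  PaddedCounter shard[N];

  // `slot` would typically be derived from a thread identifier.
  void add(std::size_t slot, std::uint64_t n) {
    shard[slot % N].value.fetch_add(n, std::memory_order_relaxed);
  }

  // The sum is only approximate while writers are active.
  std::uint64_t sum() const {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < N; i++)
      total += shard[i].value.load(std::memory_order_relaxed);
    return total;
  }
};
```

This only helps for data that is written often and read rarely; a value
that every thread must read and write consistently still bounces
between caches no matter how it is padded.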

Marko Mäkelä, Lead Developer InnoDB
MariaDB plc
