
maria-discuss team mailing list archive

Re: InnoDB: background jobs & fsyncs

 

Thank you for your help.

On 15/09/2020 at 12:51, Marko Mäkelä wrote:
> First of all, is the version really 10.2.23, which was released in
> March 2019?

Yes, it is. No upgrade since it was installed in April 2019.

> A significant source of background activity is the purge of version
> history of old transactions. If SHOW ENGINE INNODB STATUS is reporting
> a nonzero "History list length", some purge activity will be needed.
> Before MariaDB 10.5, there was also a background merge of buffered
> changes to secondary indexes.

Good to know.
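
For reference, a minimal sketch of how the "History list length" value could be pulled out of the text returned by SHOW ENGINE INNODB STATUS, to watch whether purge still has work to do. The sample text below is a made-up excerpt for illustration only, and the function name is hypothetical; fetching the status text from the server is left out.

```python
import re
from typing import Optional


def history_list_length(status_text: str) -> Optional[int]:
    """Return the number on the 'History list length' line, or None if absent."""
    match = re.search(r"^History list length\s+(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up excerpt of the TRANSACTIONS section (illustration only, not real output).
sample = """\
------------
TRANSACTIONS
------------
Trx id counter 123456
History list length 2048
"""

length = history_list_length(sample)
if length is None:
    print("History list length not found in status text")
elif length > 0:
    print(f"Purge still has work to do: history list length = {length}")
else:
    print("History list is empty")
```

Polling this value in a loop until it reaches zero would only tell when purge is done; it does not make purge go any faster.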

I was hoping for an answer on how to tell InnoDB to complete its background tasks as soon as possible. Even if background activity has no impact on performance, not having to wait so long before making a clean backup of the datadir would still be useful. I guess there's no way, though. Should I open a task on JIRA?

> I would suggest checking with http://poormansprofiler.org/ or "sudo
> perf top" what is causing the I/O, and posting the stack traces.

I tried, but apparently without any useful results. NVMe seems too fast for background fsyncs to have an impact here. MariaDB reports a high number of fsyncs/s, but maybe some (most?) of them are somehow merged.

With blktrace, I could verify that eatmydata ( https://launchpad.net/libeatmydata ) does eliminate the fsyncs done by mysqld, but that does not speed anything up.

Same thing if I wait for background activity to finish: for the kind of foreground activity we have, it's not significantly faster.

> You might even try attaching GDB to the server and setting a breakpoint
> on fsync(), with breakpoint commands "finish" and "continue" so that you
> can collect interesting stack traces.

I didn't try.

> Also, I would recommend you to try a later major version. The InnoDB
> version in MariaDB 10.3 should scale better thanks to a lock-free hash
> table for maintaining the set of active transactions. MariaDB 10.5
> included further improvements. But, be aware that we are working on a
> 10.5 performance regression in page flushing, in MDEV-23399.

I only tried 10.3.24, and I get the same deadlocks.

After further analysis, we found that the deadlocks are due to a problem in our application. We have a scheduler that uses MariaDB to store tasks, and the performance profile of the new hardware is so unusual that some kinds of tasks accumulated to the point where the scheduler could no longer handle them.

Julien


