maria-discuss team mailing list archive

Exponentially growing Innodb_row_lock_time on “master” node of MariaDB 10.3 Galera cluster

 

Hello all,

I originally posted this on serverfault, but the question hasn't gained
much traction there yet.

I've been trying to investigate issues with an application failing, and I
have reason to believe that the culprit lies somewhere in the database
backend. To this end, I started collecting metrics from the backend MariaDB
Galera cluster (currently running MariaDB version 10.3.16), hoping that
the failures would be reflected in the metrics collected.

Indeed, about 12 hours before the application started failing
spectacularly, the values reported by the 'Master' node (i.e. the node to
which the application directs the writes) for Innodb_row_lock_time started
growing at a rate never recorded before. I'm not sure the failures are
linked to this metric, but it's the only trend I've noticed that
correlates with the failures. Here's a link to a graph showing this
over the week preceding the last failure:

Innodb_row_lock_time change per minute <https://i.stack.imgur.com/H4xwA.png>

Note that the graph displays change, not the current value of the metric.
MariaDB servers are polled every 90 seconds and datapoints in the graph
refer to change per minute. The big drop near the end of the graph
indicates the time when the MariaDB service was restarted on the 'Master'
node.
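
To make the "change per minute" figure concrete, here is a minimal sketch
of how such a value can be derived from a cumulative counter polled every
90 seconds. It is not the actual monitoring code: the function name and
the sample values are hypothetical. It also treats a decrease in the
counter as a reset (for example after a service restart), whereas the
graph above plots the raw drop.

```python
def per_minute_change(samples):
    """Convert cumulative counter samples into per-minute rates.

    samples: list of (timestamp_in_seconds, counter_value) tuples in
    chronological order. Innodb_row_lock_time is a cumulative counter
    reported in milliseconds, so the result is ms of lock wait per minute.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            # Skip duplicate or out-of-order polls.
            continue
        delta = v1 - v0
        if delta < 0:
            # Assumption: a decrease means the server restarted and the
            # counter began again from zero, so count only the new value.
            delta = v1
        rates.append((t1, delta * 60.0 / elapsed))
    return rates


# Hypothetical samples, polled every 90 seconds; the last one follows a restart.
samples = [(0, 1000), (90, 1300), (180, 1900), (270, 50)]
rates = per_minute_change(samples)
```

With these made-up samples, the first two intervals give 200 and 400 ms
of lock wait per minute, and the post-restart interval is computed from
the new counter value alone instead of showing up as a large negative
change.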

My question is how to investigate this symptom further and, if possible,
identify the culprit queries or operations. I also log InnoDB Monitor
output and slow queries to log files, but I haven't been able to find
anything out of the ordinary during the period when Innodb_row_lock_time
was growing rapidly (although I'm no DB expert).

Is there any other logging functionality I can enable to provide more
information on this? And if InnoDB Monitor Output should provide the
information needed, what should I be looking for exactly? What kind of
operation could lead to rows being locked for so long, considering the
sudden manifestation of the issue? Is there any way of knowing which rows
were locked and whether it was a read or write lock?

Lastly, is there any reason to think this could be attributed to a MariaDB
bug rather than to the application misbehaving in some way?

Thank you in advance,

George