← Back to team overview

maria-discuss team mailing list archive

Re: Galera replication - optimistic locking problems

 

Hi,

With Galera, you really want to write to a single node to avoid conflicts. Since Galera doesn't do any partitioning of the data, the IO will still happen on all nodes which leaves convenience as the only reason to write to multiple nodes. A common method of avoiding deadlocks with Galera is to put a proxy in front of it that understands the cluster. One of these is MariaDB MaxScale (a team which I'm a part of) which has advanced support for Galera clusters.

The Galera monitor in MaxScale uses the wsrep_local_index variable to pick a single node in the cluster that all MaxScales write to. This eliminates the possibility of conflicts due to the distributed nature of Galera but still allows you to write to any of the MaxScale instances.

Another feature of MaxScale that could help with applications that don't know to retry transactions is the transaction_replay <https://mariadb.com/kb/en/mariadb-maxscale-25-readwritesplit/#transaction_replay> feature of the readwritesplit router. This feature has a mode <https://mariadb.com/kb/en/mariadb-maxscale-25-readwritesplit/#transaction_replay_retry_on_deadlock> where it can retry an active transaction if it ends up in a deadlock. This allows transparent retrying of the transaction while still keeping all the consistency guarantees. There are of course some limitations to what can be successfully retried which means some edge cases might not be solved by this.

Using a proxy does have its downside, increased latency and additional maintenance burden of the added servers being the most obvious ones. One way you can avoid this is to place the proxy on the application server and have it behave as a sort of a connector.

Markus

On 12/15/20 11:53 PM, Antony Stone wrote:
Therefore I'd like to find some way to:

1. tell Galera to use pessimistic locking for replication if possible (I can
accept the performance penalty)

2. tell MariaDB to automatically retry the write when the error occurs
(although I can't think of any way that could be done, since I can't create a
transaction of any sort - the operation is entirely determined for me by
Asterisk)

3. find a High-Availability (which basically means no single point of failure)
front-end for the whole cluster of 4 MariaDB servers so that the problem does
not occur.


Questions:

1. How do other people deal with this problem?

2. Are any of my potential solutions above actually feasible?  (If so, how?)

3. Does anyone have any alternative ideas about how to connect an application
which doesn't understand retrying database writes (Asterisk) with a database
which doesn't guarantee to write the data you give it (MariaDB + Galera)?

--
Markus Mäkelä, Senior Software Engineer
MariaDB Corporation
t: +358 40 7740484


Follow ups

References