
Maxscale not changing to other viable masters

 

Hi,


I'm not sure this is the right mailing list to talk about MaxScale.
Please let me know if there is a better place.

My problem is that MaxScale is not promoting another server to master
when the previous master becomes inaccessible. More details:

I'm running MaxScale 2.4.5 and my MariaDB servers are a mixture of
10.3.21 and 10.3.22.

I have three MariaDB servers in a GTID-based, star-topology
replication setup: each server is a slave of the other two.

Let's call them S1, S2, and S3.
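For reference, the replication links are set up with MariaDB
multi-source replication, along these lines (shown for S1; S2 and S3
get the equivalent pair of commands, and the 'repl' user and its
password are placeholders):

  -- On S1: replicate from S2 and S3 over named replication
  -- connections, using GTID positions ('repl'/'secret' are
  -- placeholder credentials).
  CHANGE MASTER 'from_s2' TO
    MASTER_HOST='S2', MASTER_PORT=3306,
    MASTER_USER='repl', MASTER_PASSWORD='secret',
    MASTER_USE_GTID=slave_pos;
  CHANGE MASTER 'from_s3' TO
    MASTER_HOST='S3', MASTER_PORT=3306,
    MASTER_USER='repl', MASTER_PASSWORD='secret',
    MASTER_USE_GTID=slave_pos;
  START ALL SLAVES;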

When the whole cluster starts, MaxScale chooses S1 as its master (the
one to which MaxScale sends all data-changing statements:
INSERT/UPDATE/DELETE), as S1 is the first server listed in MaxScale's
config.
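For reference, the relevant part of the MaxScale configuration looks
essentially like this (addresses and monitor credentials are
placeholders; monitor_interval, backend_connect_timeout and failcount
are the defaults, written out explicitly):

  [S1]
  type=server
  address=s1.example.com
  port=3306
  protocol=MariaDBBackend

  # [S2] and [S3] are defined the same way.

  [Replication-Monitor]
  type=monitor
  module=mariadbmon
  servers=S1,S2,S3
  user=maxscale
  password=maxscale_pw
  monitor_interval=2000ms
  backend_connect_timeout=3s
  failcount=5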

When I stop S1, I expect MaxScale to choose another server as its new
master, but instead I get:

  2020-04-15 11:54:32   error  : Monitor was unable to connect to
server S1 : 'Can't connect to MySQL server on 'S1' (115)'
  2020-04-15 11:54:32   warning: [mariadbmon] 'S2' is a better master
candidate than the current master 'S1'. Master will change when 'S1'
is no longer a valid master.

but S2 is never promoted to MaxScale's master, even after the
((monitor_interval + backend_connect_timeout) * failcount) seconds
mentioned in https://mariadb.com/kb/en/mariadb-maxscale-24-mariadb-monitor/#failcount

As I have the default values for monitor_interval,
backend_connect_timeout and failcount (2 s, 3 s and 5 passes
respectively), I would expect a new master to be selected within 25
seconds. Unfortunately, MaxScale stays stuck in the above situation
until I restart the S1 server, at which point S1 resumes its role as
MaxScale's master.
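For the record, the 25-second figure comes from plugging the defaults
into that formula:

  (monitor_interval + backend_connect_timeout) * failcount
    = (2 s + 3 s) * 5
    = 25 s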

I wonder if my problem is related to item 2 in
https://mariadb.com/kb/en/mariadb-maxscale-24-mariadb-monitor/#master-selection
which reads:

2. It has been down for more than failcount monitor passes and has no
running slaves. Running slaves behind a downed relay count.

Does the fact that S1's slaves (S2 and S3) are still running make
MaxScale consider S1 valid, even though S1 is not accessible at all?

More importantly: how can I make MaxScale change its chosen master in
a star-topology replication setup when the current master stops?


Regards,

Rodrigo Severo

