maria-discuss team mailing list archive

Re: Galera Cluster: Cluster Blocked, when one node down?

 

Hi Benoit,

Indeed, a slow node can affect the rest of the cluster. That is why, as
Jamie pointed out, DNS round robin is not a viable way to distribute
load across a Galera cluster: it keeps sending clients to a node that
accepts connections but cannot serve them. Several alternatives exist:
- HAProxy with a Galera check script
- our own MariaDB MaxScale, which includes a Galera Monitor
- glbd (a small load-balancing daemon that comes with Galera)
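
To illustrate what such a health check typically looks at, here is a
minimal sketch in Python. It is not the code of any of the tools listed
above; the function names are made up for illustration, and it assumes
the caller has already read the node's wsrep status variables (for
example with SHOW GLOBAL STATUS LIKE 'wsrep_%') into a dict of strings.

```python
# Sketch only: decide whether a Galera node should receive client traffic.
# The helper names below are hypothetical; the status variable names
# (wsrep_cluster_status, wsrep_ready, wsrep_local_state) are Galera's own.

SYNCED = "4"  # wsrep_local_state value reported by a fully synced node


def node_is_usable(status):
    """Return True if the wsrep status variables describe a usable node.

    `status` maps status variable names to their string values, as read
    from SHOW GLOBAL STATUS LIKE 'wsrep_%'. Missing keys count as unhealthy.
    """
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state") == SYNCED
    )


def http_status_for(status):
    """Map the check result to an HTTP code for a load balancer probe."""
    return 200 if node_is_usable(status) else 503
```

A node that was thrown out of the cluster but still accepts TCP
connections would fail this check, so the load balancer stops routing
clients to it even though its MySQL port is still open.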

Regards,

On Mon, Dec 14, 2015 at 10:18 AM Jamie Gibbard <Jamie.Gibbard@xxxxxxxxxxxx>
wrote:

> You should consider using a better method than DNS round robin for
> connecting to your DB servers.
>
> Think about using an HAProxy load-balancing node with the clustercheck
> script (https://github.com/olafz/percona-clustercheck)
>
> This would ensure that a node is not only accessible on its MySQL port,
> but also ready for action!
>
>
>
>
> -----Original Message-----
> From: Maria-discuss [mailto:maria-discuss-bounces+jamie.gibbard=
> netnames.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Benoit Panizzon
> Sent: 14 December 2015 08:31
> To: MariaDB discuss
> Subject: [Maria-discuss] Galera Cluster: Cluster Blocked, when one node
> down?
>
> Hello
>
> We use MariaDB Galera Cluster for our email service platform.
>
> We decided to use Galera to create a high availability platform.
>
> After a year of operation, we are starting to realize that Galera
> failures seem to be the most common cause of the outages we have had.
>
> So I wonder if others operating Galera clusters also observe this
> situation:
>
> All our services using DB connections use a DNS round-robin name to
> connect to one of our three Galera instances.
>
> While testing this setup, we usually killed one instance or disconnected
> the node from the network to simulate an outage. In this situation,
> everything works as expected: the clients connect to the two remaining
> nodes and there is no service outage.
>
> When the node is restarted, it is re-synced quickly and service with
> three nodes is restored.
>
> We have now experienced a few Galera cluster failures, which seem to
> happen this way:
> One of the nodes comes under heavy load (DDoS attacks, memory leaks or
> similar), which makes the whole physical machine laggy for a short
> time. The affected MariaDB node is then thrown out of the cluster by
> the two other nodes, probably because it can no longer sync fast enough.
>
> But as the node is not completely 'down', it still accepts connections
> from the DB clients. It does not reply to them, however, and seems to
> remain in a 'db locked' state. Strangely, this then also affects the two
> remaining nodes, which also go into 'locked' mode and no longer reply to
> queries within the time the application expects. Of course, this causes
> more DB clients (IMAP, SMTP-Auth, etc.) to spawn and create DB
> connections, worsening the whole situation.
>
> The situation can seemingly only be resolved by shutting down the MariaDB
> node that got thrown out of the cluster. The situation then normalizes
> with the two remaining nodes, and the third one can be restarted.
>
> Is this expected behaviour? Is there a way to tell a MariaDB node that got
> excluded from the cluster to shut itself down completely, so that it does
> NOT accept any more connections from clients and block the whole service?
>
> Regards
>
> -Benoît Panizzon-
> --
> I m p r o W a r e   A G    -    Leiter Commerce Kunden
> ______________________________________________________
>
> Zurlindenstrasse 29             Tel  +41 61 826 93 00
> CH-4133 Pratteln                Fax  +41 61 826 93 01
> Schweiz                         Web  http://www.imp.ch
> ______________________________________________________
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
> NetNames, 25 Canada Square, Canary Wharf, London E14 5LQ, UK | Tel: +44
> 207 015 9200 | NetNames Limited, Registered in England and Wales, Company
> number: 3169594, VAT Number: GB 739633893
>
-- 
Guillaume Lefranc
Remote DBA Services Manager
MariaDB Corporation