maria-discuss team mailing list archive

Galera partitioning question

 

Greetings all,

I am running MariaDB 10.2.16 on CentOS in AWS and am seeing the cluster sporadically partition and rejoin with no apparent cause.

 * I have a single Galera cluster with nodes in 3 different AWS
   availability zones.
 * Monitoring the logs, I see this message: /Jul 29 05:33:53 server01
   mysqld: 2018-07-29  5:33:53 139633883080448 [Note] WSREP: (eabb848a,
   'tcp://0.0.0.0:4567') connection to peer 392b9516 with addr
   tcp://172.31.17.60:4567 timed out, no messages seen in PT3S/
 * I have tried forcing a 1500-byte MTU, as some other sources mentioned
   that jumbo frames could negatively impact Galera replication.
 * Running prolonged packet captures between nodes, I cannot find
   anything else wrong: network connectivity isn't interrupted and
   no service restarts occur.
 * These partition events happen multiple times per day.
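
To put numbers on how often each peer times out, a small Python sketch along these lines could tally the messages from the logs. The regular expression is derived only from the single log line quoted above, so other Galera versions or log formats may need adjustments, and the function name `count_peer_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Pattern for the peer-timeout note quoted above. It is based on that one
# sample line only (an assumption), and whitespace is matched loosely in
# places because the line is wrapped in this message.
TIMEOUT_RE = re.compile(
    r"WSREP: \((?P<local>[0-9a-f]+),\s*'(?P<listen>[^']+)'\)\s+"
    r"connection to peer (?P<peer>[0-9a-f]+) with addr\s+"
    r"(?P<addr>tcp://\S+) timed out, no messages seen in (?P<period>\S+)"
)


def count_peer_timeouts(lines):
    """Return a Counter mapping (peer id, peer address) to timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match["peer"], match["addr"])] += 1
    return counts


# The log line from this message, joined back into a single line.
# "PT3S" is an ISO 8601 duration, i.e. 3 seconds.
sample = (
    "Jul 29 05:33:53 server01 mysqld: 2018-07-29  5:33:53 139633883080448 "
    "[Note] WSREP: (eabb848a, 'tcp://0.0.0.0:4567') connection to peer "
    "392b9516 with addr tcp://172.31.17.60:4567 timed out, "
    "no messages seen in PT3S"
)

print(count_peer_timeouts([sample]))
```

Feeding it the lines of a full log file instead of the single sample would show whether the timeouts cluster on one peer or are spread across all of them.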

Has anyone seen this sporadic cluster disconnect and rejoin issue in a similar environment? I did not previously notice this behavior on 10.1.

Any help is much appreciated.

-Ryan