maria-discuss team mailing list archive

Galera partitioning question

To: maria-discuss@xxxxxxxxxxxxxxxxxxx
From: Ryan Delgrosso <ryandelgrosso@xxxxxxxxx>
Date: Sun, 29 Jul 2018 11:30:09 -0700
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Greetings all,

I am running MariaDB 10.2.16 on CentOS in AWS and am seeing a sporadic cluster partitioning and rejoining issue with no apparent cause.


 * I have nodes in 3 different AWS availability zones in a single
   Galera cluster.
 * Monitoring the logs, I see this message: /Jul 29 05:33:53 server01
   mysqld: 2018-07-29  5:33:53 139633883080448 [Note] WSREP: (eabb848a,
   'tcp://0.0.0.0:4567') connection to peer 392b9516 with addr
   tcp://172.31.17.60:4567 timed out, no messages seen in PT3S/
 * I have tried forcing a 1500-byte MTU, as some other sources mentioned
   that jumbo frames could negatively impact Galera replication.
 * Running prolonged packet captures between nodes, I cannot find
   anything else wrong: network connectivity isn't interrupted and
   no service restarts occur.
 * These partition events happen multiple times per day.
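
To put numbers on "multiple times per day", the timeout messages could be tallied per hour and per peer. The following is only a minimal sketch, assuming every relevant log line keeps the exact format quoted above; the function name count_timeouts and the regular expression are illustrative, not part of MariaDB or Galera.

```python
import re
from collections import Counter

# Matches the WSREP peer-timeout note in the format quoted in this message.
# Assumption: other Galera versions or log setups may word it differently.
TIMEOUT_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<hour>\d{1,2}):\d{2}:\d{2}\s+\d+\s+"
    r"\[Note\] WSREP: \((?P<local>\w+),.*?connection to peer (?P<peer>\w+) "
    r"with addr (?P<addr>\S+) timed out, no messages seen in (?P<period>\S+)"
)


def count_timeouts(lines):
    """Count peer-timeout messages per (date, hour, peer id, peer address)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (
                match["date"],
                int(match["hour"]),
                match["peer"],
                match["addr"],
            )
            counts[key] += 1
    return counts
```

Passing an open log file to count_timeouts and printing the sorted items shows whether the timeouts cluster around particular hours or a particular peer, which can then be compared against the packet captures.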

Has anyone seen this sporadic cluster disconnect and rejoin issue in a similar environment? I did not previously notice this behavior on 10.1.


Any help is much appreciated.

-Ryan