
maria-discuss team mailing list archive

Re: New Question: Simple cluster two nodes, error.

 

Hi,

1) There is no master and no slave here. There is a totally different thing: the Primary Component, which is the part of the cluster that is allowed to modify data. So you either start a new cluster or join a node to the primary component. If you join a node to a non-primary component, the node ends up in a non-primary component too, and that's what you get:

131211 20:05:05 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)

By default the node refuses to stay in a non-primary component and shuts down (so that you have a chance to connect it to the primary component).
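
To see which kind of component a running node is in, you can query the wsrep_cluster_status status variable, which reports Primary for a node in the primary component. The following is only a minimal sketch: the function names are hypothetical, and it assumes the mysql client can log in without extra arguments (for example through ~/.my.cnf).

```shell
#!/bin/bash
# Sketch: report whether the local node belongs to the primary component.
# Assumption: the mysql client can connect without arguments (e.g. ~/.my.cnf).

cluster_status() {
    # -N drops the header row, -B gives tab-separated output such as:
    #   wsrep_cluster_status<TAB>Primary
    mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_status'" | awk '{print $2}'
}

# Hypothetical wrapper: succeeds only when the node is in the primary
# component, i.e. when it is allowed to modify data.
node_is_writable() {
    [ "$(cluster_status)" = "Primary" ]
}

# Usage (not executed here):
#   node_is_writable && echo "writes allowed" || echo "node is non-primary"
```

A failover script could call node_is_writable before deciding that a node is healthy, instead of only checking that the mysqld process exists.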

2) If the cluster partitions (the connection between nodes is lost), the component with an ABSOLUTE majority of nodes becomes primary. In your case of 2 nodes there can be no absolute majority. When one node dies, the other finds itself in a so-called "split-brain" situation and becomes non-primary. That's what seems to have happened in your situation, and that's the reason to require 3 nodes.
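
The majority rule can be illustrated with a bit of arithmetic. This is just a sketch with a hypothetical helper name, and it assumes that every node carries the same weight of 1 and that the other nodes disappeared without leaving the cluster gracefully.

```shell
#!/bin/bash
# Hypothetical helper: does a component hold an absolute majority?
# Assumption: all nodes have equal weight (1 each).

has_majority() {
    local reachable=$1   # nodes in this component, counting the node itself
    local total=$2       # nodes in the last primary component
    # "Absolute majority" means strictly more than half of the nodes.
    # Comparing 2*reachable with total avoids fractional arithmetic.
    if [ $((2 * reachable)) -gt "$total" ]; then
        echo "primary"
    else
        echo "non-primary"
    fi
}

has_majority 1 2   # survivor in a 2-node cluster: exactly half, not a majority
has_majority 2 3   # two survivors in a 3-node cluster: a majority
```

With 2 nodes the survivor holds exactly half of the cluster, so the first call prints non-primary. With 3 nodes the two survivors hold more than half, so the second call prints primary.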

3) There are different ways to get out of this situation:
- restart the cluster from scratch.
- SET GLOBAL wsrep_provider_options='pc.bootstrap=1'
- assign a greater weight to one of the nodes (beforehand)
- force a node to ignore split-brain (beforehand)
All have different tradeoffs. I'd recommend the second option.
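
As a rough sketch, the last three options can be written as SQL statements generated from a shell script. The function names are hypothetical. The statement for the second option is the one given above; for the third and fourth options this sketch assumes the Galera provider options pc.weight and pc.ignore_sb are the ones meant, so check them against the documentation for your Galera version before relying on them.

```shell
#!/bin/bash
# Sketch: print the SQL for the recovery options. Nothing is executed against
# a server; pipe the output into the mysql client to apply a statement.

provider_option_sql() {
    # $1 = provider option name, $2 = value
    printf "SET GLOBAL wsrep_provider_options='%s=%s';\n" "$1" "$2"
}

# Second option: tell a non-primary node to form a new primary component.
bootstrap_sql() { provider_option_sql pc.bootstrap 1; }

# Third option (set beforehand): give one node a greater weight.
# Assumption: pc.weight is the option meant here.
weight_sql() { provider_option_sql pc.weight "$1"; }

# Fourth option (set beforehand): make a node ignore split-brain.
# Assumption: pc.ignore_sb is the option meant here.
ignore_sb_sql() { provider_option_sql pc.ignore_sb true; }

bootstrap_sql
```

Run the bootstrap statement on one node only, the one whose data you want to keep, for example with bootstrap_sql | mysql.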

Regards,
Alex

On 2013-12-11 22:30, AskMonty KB wrote:
Hello,

A new question has been asked in "MariaDB FAQ" by maximilianodipietro:
--------------------------------
Hi people, I have a two-node MariaDB cluster and I need some
help. I have this simple script to handle failover. The script
works like this: if the master node dies, the script kills the slave
and restarts it as master, and then restarts the master as slave when the
script detects that the process is down. Here's the code:

----------------------------------------------------------------------------------------------------------------------------------
#!/bin/bash

if [ $1 = "start" ]
then

PidMaria=$(ps -ef | grep wsrep_cluster | grep -v grep | awk {'print $2'})
Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk {'print $2'})
TipoPid=$(ssh ipnodo2 "ps -ef |grep mysql | grep -v grep | cut -d "=" -f 2 | cut -d "/" -f 3 | cut -d "-" -f 1 | cut -d "." -f 1")
TipoPidLocal=$(ps -ef |grep mysql | grep -v grep | cut -d "=" -f 2 | cut -d "/" -f 3 | cut -d "-" -f 1 | cut -d "." -f 1)

if [ -z $PidMaria ]
then
        #sleep 20
        Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk {'print $2'})
        if [ -z $Server2 ]
        then
                if [ -z $TipoPid ]
                then
                        if [ -z $(ssh ipnodo2 "ps -ef |grep mysql | grep -v grep " | awk {'print $2'}) ]
                        then
                                /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --user=mysql \
                                        --wsrep_sst_auth=root:root \
                                        --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                        fi
                else
                        /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --wsrep_sst_auth=root:root \
                                --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                fi
        else
                if [ -z $TipoPid ]
                then
                        /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 \
                                --user=mysql --wsrep_sst_auth=root:root \
                                --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                fi
        fi

else
        if [ -z $Server2 ]
        then
                if [ -z $TipoPidLocal ]
                then
                        echo "hola"
                else
                        kill -9 $PidMaria
                        /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// \
                                --wsrep_sst_auth=root:root --user=mysql \
                                --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                fi
        else
                if [ -z $PidMaria ]
                then
                        /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql \
                                --wsrep_sst_auth=root:root \
                                --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
                fi
        fi

fi
fi
--------------------------------------------------------------------------------------------------------------------------------





This is the error I get in the slave's output when I run the script by hand.

---------------------------------------------------------------------------------------------------------------------------
+ /usr/sbin/mysqld --wsrep_cluster_address=gcomm://IPMASTER
--user=mysql --wsrep_sst_auth=root:root
--wsrep_provider=/usr/lib64/galera/libgalera_smm.so
131211 20:04:35 [Note] WSREP: Read nil XID from storage engines,
skipping position init
131211 20:04:35 [Note] WSREP: wsrep_load(): loading provider library
'/usr/lib64/galera/libgalera_smm.so'
131211 20:04:35 [Note] WSREP: wsrep_load(): Galera 23.2.7(r157) by
Codership Oy <info@xxxxxxxxxxxxx> loaded succesfully.
131211 20:04:35 [Note] WSREP: Found saved state:
1503cc31-6281-11e3-abfc-5bf96ca010d8:-1
131211 20:04:35 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
131211 20:04:35 [Note] WSREP: Passing config to GCS: base_host =
IPLOCALHOST; base_port = 4567; cert.log_conflicts = no; gcache.dir =
/var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0;
gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M;
gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit
= 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500;
gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807;
gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO;
replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
131211 20:04:35 [Note] WSREP: Assign initial position for
certification: -1, protocol version: -1
131211 20:04:35 [Note] WSREP: wsrep_sst_grab()
131211 20:04:35 [Note] WSREP: Start replication
131211 20:04:35 [Note] WSREP: Setting initial position to
00000000-0000-0000-0000-000000000000:-1
131211 20:04:35 [Note] WSREP: protonet asio version 0
131211 20:04:35 [Note] WSREP: backend: asio
131211 20:04:35 [Note] WSREP: GMCast version 0
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6,
'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6,
'tcp://0.0.0.0:4567') multicast: , ttl: 1
131211 20:04:35 [Note] WSREP: EVS version 0
131211 20:04:35 [Note] WSREP: PC version 0
131211 20:04:35 [Note] WSREP: gcomm: connecting to group
'my_wsrep_cluster', peer '162.243.62.104:'
131211 20:04:35 [Note] WSREP: declaring
15031b4b-6281-11e3-9ad7-8a59b1f0cba0 stable
131211 20:04:35 [Note] WSREP:
view(view_id(NON_PRIM,15031b4b-6281-11e3-9ad7-8a59b1f0cba0,8) memb {
        15031b4b-6281-11e3-9ad7-8a59b1f0cba0,
        74e621f3-629f-11e3-a86f-0adb496b3ff6,
} joined {
} left {
} partitioned {
        18299990-6281-11e3-a268-975810126780,
        a6d47aea-6281-11e3-b007-16991d53e685,
        d52af178-6281-11e3-871d-03c1e15f1ac4,
})
131211 20:05:05 [ERROR] WSREP: failed to open gcomm backend
connection: 110: failed to reach primary view: 110 (Connection timed
out)
         at gcomm/src/pc.cpp:connect():139
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195:
Failed to open backend connection: -110 (Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1289: Failed
to open channel 'my_wsrep_cluster' at 'gcomm://IPMASTER': -110
(Connection timed out)
131211 20:05:05 [ERROR] WSREP: gcs connect failed: Connection timed out
131211 20:05:05 [ERROR] WSREP: wsrep::connect() failed: 6
131211 20:05:05 [ERROR] Aborting

131211 20:05:05 [Note] WSREP: Service disconnected.
131211 20:05:06 [Note] WSREP: Some threads may fail to exit.
131211 20:05:06 [Note] /usr/sbin/mysqld: Shutdown complete
------------------------------------------------------------------------------------------------



When the script starts the master from the crontab, the process is already
up and I can connect to MariaDB, but I can't write to any database,
only list the databases.


[root@xxx ~]# mysql -uxxx -pxxxx
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 24
Server version: 5.5.33a-MariaDB

Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database TESTEOKILL2;
ERROR 1047 (08S01): Unknown command
MariaDB [(none)]>


Any ideas? Thanks in advance!
--------------------------------

To view or answer this question please visit:
http://mariadb.com/kb/en/simple-cluster-two-nodes-error/

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

