maria-discuss team mailing list archive
Message #01210
Re: New Question: Simple cluster two nodes, error.
Hmm, I would have to check, but I think Galera Cluster needs 3+ nodes. I have
only tested with 3+ nodes here; I haven't tested with 2 nodes yet.
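
For what it's worth, here is a minimal sketch of the majority rule as I understand it: a group of nodes stays primary only if it holds strictly more than half of the nodes that were in the last primary component. It assumes equal node weights and an unclean node loss. has_quorum is a made-up helper for illustration, not part of Galera or of the script quoted below.

```shell
#!/bin/bash
# Hypothetical helper (not part of Galera or the quoted script):
# succeeds only if the reachable nodes are a strict majority of the
# nodes in the last primary component (equal node weights assumed).
has_quorum() {
    local reachable=$1 total=$2
    [ $(( reachable * 2 )) -gt "$total" ]
}

# Two-node cluster, one node lost without a clean shutdown: 1 of 2.
if has_quorum 1 2; then echo "1/2: primary"; else echo "1/2: non-primary"; fi
# Three-node cluster, one node lost: 2 of 3.
if has_quorum 2 3; then echo "2/3: primary"; else echo "2/3: non-primary"; fi
# The view in the quoted log: 2 members, 3 partitioned, so 2 of 5.
if has_quorum 2 5; then echo "2/5: primary"; else echo "2/5: non-primary"; fi

# On a running node the real state can be read from the status variables
# (credentials omitted here):
#   mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
#   mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
```

If that reading is right, the survivor of a two-node cluster goes non-primary when its peer crashes and refuses writes, which may be what the ERROR 1047 below is showing.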
2013/12/11 AskMonty KB <noreply@xxxxxxxxxxxx>:
> Hello,
>
> A new question has been asked in "MariaDB FAQ" by maximilianodipietro:
> --------------------------------
> Hi people, I have a two-node MariaDB cluster and I need some help. I have this simple script to handle failover. It works like this: if the master node dies, the script kills the slave and restarts it as master; then, when the script detects that the old master's process is down, it restarts that node as a slave. Here's the code:
>
> ----------------------------------------------------------------------------------------------------------------------------------
> #!/bin/bash
> # Failover helper: starts the local mysqld either as a new cluster
> # (gcomm://) or as a joiner of ipnodo2, depending on what is running.
>
> if [ "$1" = "start" ]
> then
>
>     # PID of the local mysqld started with wsrep options (empty if none).
>     PidMaria=$(ps -ef | grep wsrep_cluster | grep -v grep | awk '{print $2}')
>     # Second field of the last nmap output line mentioning 3306 (empty if none).
>     Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
>     # Fragment cut from the first '=' option on the mysqld command line
>     # (here the wsrep_cluster_address), on the remote and the local node.
>     TipoPid=$(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep | cut -d '=' -f 2 | cut -d '/' -f 3 | cut -d '-' -f 1 | cut -d '.' -f 1")
>     TipoPidLocal=$(ps -ef | grep mysql | grep -v grep | cut -d "=" -f 2 | cut -d "/" -f 3 | cut -d "-" -f 1 | cut -d "." -f 1)
>
>     if [ -z $PidMaria ]
>     then
>         # No local wsrep mysqld is running.
>         #sleep 20
>         Server2=$(nmap -v ipnodo2 | grep 3306 | tail -n1 | awk '{print $2}')
>         if [ -z $Server2 ]
>         then
>             if [ -z $TipoPid ]
>             then
>                 # No mysql process at all on ipnodo2: start a new cluster.
>                 if [ -z $(ssh ipnodo2 "ps -ef | grep mysql | grep -v grep" | awk '{print $2}') ]
>                 then
>                     /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
>                 fi
>             else
>                 # Otherwise join through ipnodo2.
>                 /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
>             fi
>         else
>             if [ -z $TipoPid ]
>             then
>                 /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
>             fi
>         fi
>
>     else
>         # A local wsrep mysqld is already running.
>         if [ -z $Server2 ]
>         then
>             if [ -z $TipoPidLocal ]
>             then
>                 echo "hola"
>             else
>                 # Kill the local node and restart it as a new cluster.
>                 kill -9 $PidMaria
>                 /usr/sbin/mysqld --wsrep_cluster_address=gcomm:// --wsrep_sst_auth=root:root --user=mysql --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
>             fi
>         else
>             # PidMaria is non-empty in this branch, so this test never succeeds.
>             if [ -z $PidMaria ]
>             then
>                 /usr/sbin/mysqld --wsrep_cluster_address=gcomm://ipnodo2 --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
>             fi
>         fi
>
>     fi
> fi
> --------------------------------------------------------------------------------------------------------------------------------
>
>
>
>
>
> This is the error I get in the slave's output when I run the script by hand.
>
> ---------------------------------------------------------------------------------------------------------------------------
> + /usr/sbin/mysqld --wsrep_cluster_address=gcomm://IPMASTER --user=mysql --wsrep_sst_auth=root:root --wsrep_provider=/usr/lib64/galera/libgalera_smm.so
> 131211 20:04:35 [Note] WSREP: Read nil XID from storage engines, skipping position init
> 131211 20:04:35 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
> 131211 20:04:35 [Note] WSREP: wsrep_load(): Galera 23.2.7(r157) by Codership Oy <info@xxxxxxxxxxxxx> loaded succesfully.
> 131211 20:04:35 [Note] WSREP: Found saved state: 1503cc31-6281-11e3-abfc-5bf96ca010d8:-1
> 131211 20:04:35 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
> 131211 20:04:35 [Note] WSREP: Passing config to GCS: base_host = IPLOCALHOST; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
> 131211 20:04:35 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
> 131211 20:04:35 [Note] WSREP: wsrep_sst_grab()
> 131211 20:04:35 [Note] WSREP: Start replication
> 131211 20:04:35 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
> 131211 20:04:35 [Note] WSREP: protonet asio version 0
> 131211 20:04:35 [Note] WSREP: backend: asio
> 131211 20:04:35 [Note] WSREP: GMCast version 0
> 131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
> 131211 20:04:35 [Note] WSREP: (74e621f3-629f-11e3-a86f-0adb496b3ff6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
> 131211 20:04:35 [Note] WSREP: EVS version 0
> 131211 20:04:35 [Note] WSREP: PC version 0
> 131211 20:04:35 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '162.243.62.104:'
> 131211 20:04:35 [Note] WSREP: declaring 15031b4b-6281-11e3-9ad7-8a59b1f0cba0 stable
> 131211 20:04:35 [Note] WSREP: view(view_id(NON_PRIM,15031b4b-6281-11e3-9ad7-8a59b1f0cba0,8) memb {
> 15031b4b-6281-11e3-9ad7-8a59b1f0cba0,
> 74e621f3-629f-11e3-a86f-0adb496b3ff6,
> } joined {
> } left {
> } partitioned {
> 18299990-6281-11e3-a268-975810126780,
> a6d47aea-6281-11e3-b007-16991d53e685,
> d52af178-6281-11e3-871d-03c1e15f1ac4,
> })
> 131211 20:05:05 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
> at gcomm/src/pc.cpp:connect():139
> 131211 20:05:05 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
> 131211 20:05:05 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1289: Failed to open channel 'my_wsrep_cluster' at 'gcomm://IPMASTER': -110 (Connection timed out)
> 131211 20:05:05 [ERROR] WSREP: gcs connect failed: Connection timed out
> 131211 20:05:05 [ERROR] WSREP: wsrep::connect() failed: 6
> 131211 20:05:05 [ERROR] Aborting
>
> 131211 20:05:05 [Note] WSREP: Service disconnected.
> 131211 20:05:06 [Note] WSREP: Some threads may fail to exit.
> 131211 20:05:06 [Note] /usr/sbin/mysqld: Shutdown complete
> ------------------------------------------------------------------------------------------------
>
>
>
> When the script starts the master from the crontab, the process is already up and I can connect to MariaDB, but I can't write to any database; I can only list the databases.
>
>
> [root@xxx ~]# mysql -uxxx -pxxxx
> Welcome to the MariaDB monitor. Commands end with ; or \g.
> Your MariaDB connection id is 24
> Server version: 5.5.33a-MariaDB
>
> Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
>
> Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
>
> MariaDB [(none)]> create database TESTEOKILL2;
> ERROR 1047 (08S01): Unknown command
> MariaDB [(none)]>
>
>
> Any ideas? Thanks in advance!
> --------------------------------
>
> To view or answer this question please visit: http://mariadb.com/kb/en/simple-cluster-two-nodes-error/
--
Roberto Spadim
SPAEmpresarial