← Back to team overview

maria-discuss team mailing list archive

Rolling upgrade to Galera 10.4 not working

 

Hello,

I'm trying to do a rolling upgrade of a MariaDB 10.3 Galera cluster to to
10.4, following the instructions here
<https://mariadb.com/kb/en/library/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/>.
(I had to adjust the wsrep_provider variable for Galera 4
to /usr/lib64/galera-4/libgalera_smm.so, not mentioned in the
instructions). After I start the mariadb service, one mysqld thread uses
100% of one core for about 45 minutes, then I see the following in the log
and mariadb restarts, repeating the same process:

Jun 20 16:24:44 rubidium mysqld: 2019-06-20 16:24:44 0 [Note] WSREP: Trying
to continue unpaused monitor
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: REPL
Protocols: 9 (4, 2)
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
####### My UUID: ddb4f4da-935f-11e9-bcff-8209857266f0
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server
mariadb4 connected to cluster at position
6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 with ID
ddb4f4da-935f-11e9-bcff-8209857266f0
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server
status change disconnected -> connected
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
wsrep_notify_cmd is not defined, skipping notification.
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: State
transfer required:
Jun 20 16:24:47 rubidium mysqld: Group state:
6a1fd0aa-4739-11e5-af88-42756e861245:7403466644
Jun 20 16:24:47 rubidium mysqld: Local state:
6a1fd0aa-4739-11e5-af88-42756e861245:7403368279
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: Server
status change connected -> joiner
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
wsrep_notify_cmd is not defined, skipping notification.
Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 0 [Note] WSREP:
Running: 'wsrep_sst_rsync --role 'joiner' --address '10.100.130.24'
--datadir '/var/lib/mysql/'   --parent '21283'  ''  '''
Jun 20 16:24:48 rubidium rsyncd[26235]: rsyncd version 3.1.2 starting,
listening on port 4444
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
Prepared SST request: rsync|10.100.130.24:4444/rsync_sst
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Cert
index reset to 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 (proto: 9),
state transfer needed: yes
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP:
Service thread queue flushed.
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
####### Assign initial position for certification:
6a1fd0aa-4739-11e5-af88-42756e861245:7403466644, protocol version: 4
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
####### IST uuid:6a1fd0aa-4739-11e5-af88-42756e861245 f: 7403368280, l:
7403466644, STRv: 2
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: IST
receiver addr using tcp://10.100.130.24:4568
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
Prepared IST receiver for 7403368280-7403466644, listening at: tcp://
10.100.130.24:4568
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP: Member
2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected 1.0
(mariadb2)(SYNCED) as donor.
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP:
Shifting PRIMARY -> JOINER (TO: 7403946566)
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: Slave
queue grew too long while trying to request state transfer 1 time(s).
Please make sure that there is at least one fully synced member in the
group. Application must be restarted.
Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP: State
transfer request failed unrecoverably: 35 (Resource deadlock avoided). Most
likely it is due to inability to communicate with the cluster primary
component. Restart required.

On the donor, I see:

2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor
2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state
transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor.
2019-06-20 16:24:48 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO:
7403946566)
2019-06-20 16:24:48 18 [Note] WSREP: IST request:
6a1fd0aa-4739-11e5-af88-42756e861245:7403368279-7403466644|tcp://
10.100.130.24:4568
2019-06-20 16:24:48 18 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
2019-06-20 16:24:48 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role
'donor' --address '10.100.130.24:4444/rsync_sst' --socket
'/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
'6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
--bypass'
2019-06-20 16:24:48 18 [Note] WSREP: sst_donor_thread signaled with 0
2019-06-20 16:24:48 0 [Note] WSREP: async IST sender starting to serve
tcp://10.100.130.24:4568 sending 7403368280-7403466644
2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567')
turning message relay requesting on, nonlive peers: tcp://10.100.130.24:4567
2019-06-20 16:24:48 0 [Note] WSREP: declaring 116c0b4c at tcp://
10.100.130.22:4567 stable
2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp://
10.100.130.24:4567)
2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567')
turning message relay requesting off
2019-06-20 16:24:48 0 [Note] WSREP: Node 116c0b4c state prim
2019-06-20 16:24:48 0 [Note] WSREP: view(view_id(PRIM,116c0b4c,297) memb {
        116c0b4c,0
        700b363f,0
} joined {
} left {
} partitioned {
        ddb4f4da,0
})
2019-06-20 16:24:48 0 [Note] WSREP: save pc into disk
2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp://
10.100.130.24:4567)
2019-06-20 16:24:48 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap
= no, my_idx = 1, memb_num = 2
2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: sent state msg:
286f5ce4-9367-11e9-acee-4e4dec61d7c1
2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg:
286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 0 (mariadb1)
2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg:
286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 1 (mariadb2)
2019-06-20 16:24:48 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 269,
        members    = 2/2 (joined/total),
        act_id     = 7403946566,
        last_appl. = 7403946444,
        protocols  = 0/9/3 (gcs/repl/appl),
        group UUID = 6a1fd0aa-4739-11e5-af88-42756e861245
2019-06-20 16:24:48 0 [Note] WSREP: Flow-control interval: [23, 23]
2019-06-20 16:24:48 0 [Note] WSREP: Trying to continue unpaused monitor
2019-06-20 16:24:48 14 [Note] WSREP: New cluster view: global state:
6a1fd0aa-4739-11e5-af88-42756e861245:7403946566, view# 270: Primary, number
of nodes: 2, my index: 1, protocol version 3
2019-06-20 16:24:48 14 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
2019-06-20 16:24:48 14 [Note] WSREP: REPL Protocols: 9 (4, 2)
2019-06-20 16:24:48 14 [Note] WSREP: Assign initial position for
certification: 7403946566, protocol version: 4
2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed.
2019-06-20 16:24:48 0 [ERROR] WSREP: async IST sender failed to serve tcp://
10.100.130.24:4568: ist send failed: asio.system:104', asio error 'read:
Connection reset by peer': 104 (Connection reset by peer)
         at galera/src/ist.cpp:send():790
2019-06-20 16:24:48 0 [Note] WSREP: async IST sender served
WSREP_SST: [INFO] Bypassing state dump. (20190620 16:24:48.149)
rsync: safe_read failed to read 1 bytes [sender]: Connection reset by peer
(104)
rsync error: error in rsync protocol data stream (code 12) at io.c(276)
[sender=3.1.2]
2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync
--role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket
'/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
'6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
--bypass
2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error:
wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst'
--socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     ''
--gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id
'0' --bypass: 12 (Cannot allocate memory)
2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync
--role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket
'/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
'6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
--bypass
2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer:
ddb4f4da-935f-11e9-bcff-8209857266f0
2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to
-1.-1 (left the group) failed: -12 (Cannot allocate memory)

What is going on?

Follow ups