← Back to team overview

maria-discuss team mailing list archive

Re: Rolling upgrade to Galera 10.4 not working

 

Anyone?

On Thu, Jun 20, 2019 at 5:59 PM Tiemen Ruiten <t.ruiten@xxxxxxxxxxx> wrote:

> Hello,
>
> I'm trying to do a rolling upgrade of a MariaDB 10.3 Galera cluster to to
> 10.4, following the instructions here
> <https://mariadb.com/kb/en/library/upgrading-from-mariadb-103-to-mariadb-104-with-galera-cluster/>.
> (I had to adjust the wsrep_provider variable for Galera 4
> to /usr/lib64/galera-4/libgalera_smm.so, not mentioned in the
> instructions). After I start the mariadb service, one mysqld thread uses
> 100% of one core for about 45 minutes, then I see the following in the log
> and mariadb restarts, repeating the same process:
>
> Jun 20 16:24:44 rubidium mysqld: 2019-06-20 16:24:44 0 [Note] WSREP:
> Trying to continue unpaused monitor
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: REPL
> Protocols: 9 (4, 2)
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> ####### My UUID: ddb4f4da-935f-11e9-bcff-8209857266f0
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> Server mariadb4 connected to cluster at position
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 with ID
> ddb4f4da-935f-11e9-bcff-8209857266f0
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> Server status change disconnected -> connected
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> wsrep_notify_cmd is not defined, skipping notification.
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP: State
> transfer required:
> Jun 20 16:24:47 rubidium mysqld: Group state:
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644
> Jun 20 16:24:47 rubidium mysqld: Local state:
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> Server status change connected -> joiner
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 3 [Note] WSREP:
> wsrep_notify_cmd is not defined, skipping notification.
> Jun 20 16:24:47 rubidium mysqld: 2019-06-20 16:24:47 0 [Note] WSREP:
> Running: 'wsrep_sst_rsync --role 'joiner' --address '10.100.130.24'
> --datadir '/var/lib/mysql/'   --parent '21283'  ''  '''
> Jun 20 16:24:48 rubidium rsyncd[26235]: rsyncd version 3.1.2 starting,
> listening on port 4444
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
> Prepared SST request: rsync|10.100.130.24:4444/rsync_sst
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: Cert
> index reset to 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644 (proto: 9),
> state transfer needed: yes
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP:
> Service thread queue flushed.
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
> ####### Assign initial position for certification:
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403466644, protocol version: 4
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
> ####### IST uuid:6a1fd0aa-4739-11e5-af88-42756e861245 f: 7403368280, l:
> 7403466644, STRv: 2
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP: IST
> receiver addr using tcp://10.100.130.24:4568
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [Note] WSREP:
> Prepared IST receiver for 7403368280-7403466644, listening at: tcp://
> 10.100.130.24:4568
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP:
> Member 2.0 (mariadb4) requested state transfer from 'mariadb2'. Selected
> 1.0 (mariadb2)(SYNCED) as donor.
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 0 [Note] WSREP:
> Shifting PRIMARY -> JOINER (TO: 7403946566)
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP:
> Slave queue grew too long while trying to request state transfer 1 time(s).
> Please make sure that there is at least one fully synced member in the
> group. Application must be restarted.
> Jun 20 16:24:48 rubidium mysqld: 2019-06-20 16:24:48 3 [ERROR] WSREP:
> State transfer request failed unrecoverably: 35 (Resource deadlock
> avoided). Most likely it is due to inability to communicate with the
> cluster primary component. Restart required.
>
> On the donor, I see:
>
> 2019-06-20 16:24:44 0 [Note] WSREP: Trying to continue unpaused monitor
> 2019-06-20 16:24:48 0 [Note] WSREP: Member 2.0 (mariadb4) requested state
> transfer from 'mariadb2'. Selected 1.0 (mariadb2)(SYNCED) as donor.
> 2019-06-20 16:24:48 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO:
> 7403946566)
> 2019-06-20 16:24:48 18 [Note] WSREP: IST request:
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403368279-7403466644|tcp://
> 10.100.130.24:4568
> 2019-06-20 16:24:48 18 [Note] WSREP: wsrep_notify_cmd is not defined,
> skipping notification.
> 2019-06-20 16:24:48 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role
> 'donor' --address '10.100.130.24:4444/rsync_sst' --socket
> '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
> '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
> --bypass'
> 2019-06-20 16:24:48 18 [Note] WSREP: sst_donor_thread signaled with 0
> 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender starting to serve
> tcp://10.100.130.24:4568 sending 7403368280-7403466644
> 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567')
> turning message relay requesting on, nonlive peers: tcp://
> 10.100.130.24:4567
> 2019-06-20 16:24:48 0 [Note] WSREP: declaring 116c0b4c at tcp://
> 10.100.130.22:4567 stable
> 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp://
> 10.100.130.24:4567)
> 2019-06-20 16:24:48 0 [Note] WSREP: (700b363f, 'tcp://0.0.0.0:4567')
> turning message relay requesting off
> 2019-06-20 16:24:48 0 [Note] WSREP: Node 116c0b4c state prim
> 2019-06-20 16:24:48 0 [Note] WSREP: view(view_id(PRIM,116c0b4c,297) memb {
>         116c0b4c,0
>         700b363f,0
> } joined {
> } left {
> } partitioned {
>         ddb4f4da,0
> })
> 2019-06-20 16:24:48 0 [Note] WSREP: save pc into disk
> 2019-06-20 16:24:48 0 [Note] WSREP: forgetting ddb4f4da (tcp://
> 10.100.130.24:4567)
> 2019-06-20 16:24:48 0 [Note] WSREP: New COMPONENT: primary = yes,
> bootstrap = no, my_idx = 1, memb_num = 2
> 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
> 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: sent state msg:
> 286f5ce4-9367-11e9-acee-4e4dec61d7c1
> 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg:
> 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 0 (mariadb1)
> 2019-06-20 16:24:48 0 [Note] WSREP: STATE EXCHANGE: got state msg:
> 286f5ce4-9367-11e9-acee-4e4dec61d7c1 from 1 (mariadb2)
> 2019-06-20 16:24:48 0 [Note] WSREP: Quorum results:
>         version    = 4,
>         component  = PRIMARY,
>         conf_id    = 269,
>         members    = 2/2 (joined/total),
>         act_id     = 7403946566,
>         last_appl. = 7403946444,
>         protocols  = 0/9/3 (gcs/repl/appl),
>         group UUID = 6a1fd0aa-4739-11e5-af88-42756e861245
> 2019-06-20 16:24:48 0 [Note] WSREP: Flow-control interval: [23, 23]
> 2019-06-20 16:24:48 0 [Note] WSREP: Trying to continue unpaused monitor
> 2019-06-20 16:24:48 14 [Note] WSREP: New cluster view: global state:
> 6a1fd0aa-4739-11e5-af88-42756e861245:7403946566, view# 270: Primary, number
> of nodes: 2, my index: 1, protocol version 3
> 2019-06-20 16:24:48 14 [Note] WSREP: wsrep_notify_cmd is not defined,
> skipping notification.
> 2019-06-20 16:24:48 14 [Note] WSREP: REPL Protocols: 9 (4, 2)
> 2019-06-20 16:24:48 14 [Note] WSREP: Assign initial position for
> certification: 7403946566, protocol version: 4
> 2019-06-20 16:24:48 0 [Note] WSREP: Service thread queue flushed.
> 2019-06-20 16:24:48 0 [ERROR] WSREP: async IST sender failed to serve
> tcp://10.100.130.24:4568: ist send failed: asio.system:104', asio error
> 'read: Connection reset by peer': 104 (Connection reset by peer)
>          at galera/src/ist.cpp:send():790
> 2019-06-20 16:24:48 0 [Note] WSREP: async IST sender served
> WSREP_SST: [INFO] Bypassing state dump. (20190620 16:24:48.149)
> rsync: safe_read failed to read 1 bytes [sender]: Connection reset by peer
> (104)
> rsync error: error in rsync protocol data stream (code 12) at io.c(276)
> [sender=3.1.2]
> 2019-06-20 16:24:48 0 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync
> --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket
> '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
> '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
> --bypass
> 2019-06-20 16:24:48 0 [ERROR] WSREP: Process completed with error:
> wsrep_sst_rsync --role 'donor' --address '10.100.130.24:4444/rsync_sst'
> --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     ''
> --gtid '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id
> '0' --bypass: 12 (Cannot allocate memory)
> 2019-06-20 16:24:48 0 [ERROR] WSREP: Command did not run: wsrep_sst_rsync
> --role 'donor' --address '10.100.130.24:4444/rsync_sst' --socket
> '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/'     '' --gtid
> '6a1fd0aa-4739-11e5-af88-42756e861245:7403368279' --gtid-domain-id '0'
> --bypass
> 2019-06-20 16:24:48 0 [Warning] WSREP: Could not find peer:
> ddb4f4da-935f-11e9-bcff-8209857266f0
> 2019-06-20 16:24:48 0 [Warning] WSREP: 1.0 (mariadb2): State transfer to
> -1.-1 (left the group) failed: -12 (Cannot allocate memory)
>
> What is going on?
>
>
>
>
>

References