← Back to team overview

maria-discuss team mailing list archive

backup from any cluster member can't be restarted and restored

 

This has biting me for a bit, and i have tried a litany of things.  Here is
the issue:
VPC 1: (prod)
Mariadb 10.4.12 - galera in a 3 node cluster, cluster functioning perfectly

VPC 2: (staging)
Mariadb 10.4.12 galera in a 3 node cluster also running smoothly.

-- backup script (clipped for only the essentials) --
set -e
DIR="/home/jeff/restore"
rm -rf $DIR
mkdir $DIR
AUTH='--defaults-extra-file=/root/db.cnf -u root'
sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = ON;"
sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = OFF";
sudo mariabackup ${AUTH} --backup --binlog-info=ON --target-dir="${DIR}"
2>&1 | grep -vE "Copying|done|log scanned"
sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = ON;"
sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = OFF";

-- end script --

-- restore process --
sudo service mariadb stop (or kill -9 PID only when the server won't start)
sudo rm -rf /var/lib/mysl/*
sudo mariabackup --prepare --target-dir=restore
sudo mariabackup --copy-back --target-dir=restore
sudo chown -R mysql:mysql /var/lib/mysql/*
sudo service mariadb start
-- end restore --

I can run this on any of the script and backup process on any of the
*staging* nodes and restore them to a freshly built server within that
VPC.  I can also take the backup from *staging* and load in into a newly
build machine on the *prod* VPC.

What i can not do is run the script on any of the prod members and restore
it to a newly built server, regardless of location.  (really the locations
don't matter at all)

The following is the log from a failing restore from a proddb backup and
restore:
2020-04-15 22:11:10 0 [Note] InnoDB: Using Linux native AIO
2020-04-15 22:11:10 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic
builtins
2020-04-15 22:11:10 0 [Note] InnoDB: Uses event mutexes
2020-04-15 22:11:10 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-04-15 22:11:10 0 [Note] InnoDB: Number of pools: 1
2020-04-15 22:11:10 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-04-15 22:11:10 0 [Note] mysqld: O_TMPFILE is not supported on /tmp
(disabling future attempts)
2020-04-15 22:11:10 0 [Note] InnoDB: Initializing buffer pool, total size =
2G, instances = 8, chunk size = 128M
2020-04-15 22:11:10 0 [Note] InnoDB: Completed initialization of buffer pool
2020-04-15 22:11:10 0 [Note] InnoDB: If the mysqld execution user is
authorized, page cleaner thread priority can be changed. See the man page
of setpriority().
2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile101 size
to 50331648 bytes
2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to
50331648 bytes
2020-04-15 22:11:10 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to
./ib_logfile0
2020-04-15 22:11:10 0 [Note] InnoDB: New log files created, LSN=42006460184
2020-04-15 22:11:10 0 [Note] InnoDB: 128 out of 128 rollback segments are
active.
2020-04-15 22:11:10 0 [Note] InnoDB: Creating shared tablespace for
temporary tables
2020-04-15 22:11:10 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB.
Physically writing the file full; Please wait ...
2020-04-15 22:11:10 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-04-15 22:11:10 0 [Note] InnoDB: Waiting for purge to start
2020-04-15 22:11:10 0 [Note] InnoDB: 10.4.12 started; log sequence number
42006460428; transaction id 23858673
2020-04-15 22:11:10 0 [Note] InnoDB: Loading buffer pool(s) from
/var/lib/mysql/ib_buffer_pool
2020-04-15 22:11:10 0 [Note] Plugin 'FEEDBACK' is disabled.
2020-04-15 22:11:11 0 [Note] InnoDB: Buffer pool(s) load completed at
200415 22:11:11
-- hangs here for about an hour before timing out --


Logfile from a successful start

2020-04-15 23:13:15 0 [Note] InnoDB: Using Linux native AIO
2020-04-15 23:13:15 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic
builtins
2020-04-15 23:13:15 0 [Note] InnoDB: Uses event mutexes
2020-04-15 23:13:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-04-15 23:13:15 0 [Note] InnoDB: Number of pools: 1
2020-04-15 23:13:15 0 [Note] InnoDB: Using SSE2 crc32 instructions
2020-04-15 23:13:15 0 [Note] mysqld: O_TMPFILE is not supported on /tmp
(disabling future attempts)
2020-04-15 23:13:15 0 [Note] InnoDB: Initializing buffer pool, total size =
2G, instances = 8, chunk size = 128M
2020-04-15 23:13:16 0 [Note] InnoDB: Completed initialization of buffer pool
2020-04-15 23:13:16 0 [Note] InnoDB: If the mysqld execution user is
authorized, page cleaner thread priority can be changed. See the man page
of setpriority().
2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile101 size
to 50331648 bytes
2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to
50331648 bytes
2020-04-15 23:13:16 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to
./ib_logfile0
2020-04-15 23:13:16 0 [Note] InnoDB: New log files created, LSN=43323668550
2020-04-15 23:13:16 0 [Note] InnoDB: 128 out of 128 rollback segments are
active.
2020-04-15 23:13:16 0 [Note] InnoDB: Creating shared tablespace for
temporary tables
2020-04-15 23:13:16 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB.
Physically writing the file full; Please wait ...
2020-04-15 23:13:16 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-04-15 23:13:16 0 [Note] InnoDB: Waiting for purge to start
2020-04-15 23:13:16 0 [Note] InnoDB: 10.4.12 started; log sequence number
43323669004; transaction id 22422468
2020-04-15 23:13:16 0 [Note] InnoDB: Loading buffer pool(s) from
/var/lib/mysql/ib_buffer_pool
2020-04-15 23:13:16 0 [Note] Plugin 'FEEDBACK' is disabled.
2020-04-15 23:13:16 0 [Note] Server socket created on IP: '0.0.0.0'.
2020-04-15 23:13:16 0 [Note] Reading of all Master_info entries succeeded
2020-04-15 23:13:16 0 [Note] Added new Master_info '' to hash table
2020-04-15 23:13:16 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.4.12-MariaDB-1:10.4.12+maria~bionic'  socket:
'/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at
200415 23:13:22

------------------------
The main difference between the two is that:
- the server socket is logged
- master_info is logged
- and it binds to the port/socket

Before they *both* end with
2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at
200415 23:13:22

-- Other Notes --
- Config files are the same except for ips, both on newly built servers as
the cluster as they are managed through salt.  I ran a diff -r against prod
vs staging and only saw

There has to be some corruption in the mariabackup output, but still
reported OK.  Since i can put a staging backup on a prod server, but i can
not put a prod backup on a staging server.

If you've gotten this far, thank you.  If you have any ideas on what i
could try, i'd be grateful.  aria_chk and all other table checks return
correctly.  I'm not sure what makes a server write out the master_info, so
possibly that is a pointer.

I'm happy to answer any questions, i've tried way to many things to post
here as i didn't want to write until i had enough of dumb stuff figured
out.  I could rebuild the prod cluster but am really looking to understand
this.  I don't really want to build my own with debug info on b/c why would
they be different between the two vpcs.

Follow ups