maria-discuss team mailing list archive
-
maria-discuss team
-
Mailing list archive
-
Message #05744
Re: backup from any cluster member can't be restarted and restored
That second note is horribly misleading (note the bold):
-- orig --
- I was able to take a full physical backup that runs every morning from
the VPC 1 (prod cluster) directly from S3 and restore it to the server
where i can not restore backups of any of the prod nodes. After this
successfully started, i ran mysqlcheck --all-databases (online) againh mst
this newly restored database and there was no corruption.
-- should be --
- I was able to take the full physical backup *from a slave whose master is
a prod cluster node,* which runs every morning, from S3 to the server where
i can not restore backups of any of the prod nodes. After this
successfully started, i ran mysqlcheck --all-databases (online) againh mst
this newly restored database and there was no corruption.
Sorry and Thanks
On Mon, Apr 27, 2020 at 2:38 PM Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
> Hello again peoples. I'm still at it, but have taken some time for higher
> priority projects and now have some follow up data, not sure if i have any
> questions, we'll see.
>
> in the staging VPC, referred above/below as VPC 2. I can restore any
> combinations of tables right down to only selecting mysql and
> performance_schema. In the prod VPC (VPC 1). I can not install any
> combination of databases. So like was thought before, i may have
> corruption in one of the tables, but....
>
> - i have checked all tables using innochecksum (offline), aria_chk
> (offline) and there are no indications of corruption
> - I was able to take a full physical backup that runs every morning from
> the VPC 1 (prod cluster) directly from S3 and restore it to the server
> where i can not restore backups of any of the prod nodes. After this
> successfully started, i ran mysqlcheck --all-databases (online) againh mst
> this newly restored database and there was no corruption.
> - if i take the production node, that I'm using for backup testing, out of
> the cluster and then run mysqlcheck --all-databases and put it back on the
> cluster, no corrupt tables are reported.
>
> I would prefer not to build an entire new cluster, but realize i could
> have done so by now. b/c i would really like to know why and with the
> backups and snapshots that i can use the slave for along with 3 cluster
> nodes, my data is safe and being written to daily.
>
> All of these databases where upgraded from 10.3 to 10.4 (in the last two
> months) according to the instructions on mariadb.com. I ran staging
> first, let it hang a couple weeks and then upgraded production and both are
> at the same exact release. Ver 10.4.12-MariaDB-1:10.4.12+maria~bionic-log
> for debian-linux-gnu on x86_64 (mariadb.org binary distribution)
>
> I've considered compiling a debug enabled version of mariadb, but that
> seems it would only be useful if it were on the galera nodes, and i'm not
> too keen on doing that.
>
> Any and all thoughts are welcome, i didn't want to repeat any information
> so for new readers you'll have to read from the start of the thread and i
> do indeed appreciate it.
>
> Jeff
>
>
>
>
>
>
>
> On Sat, Apr 18, 2020 at 1:22 PM Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>
>> Hi Karl, someone else just saw that as well. it was indeed a type in the
>> email. To answer your questions; the plugins are the same sql_error_log
>> and socket_plugin (which is no longer needed in 10.4 and was removed after
>> the upgrade). Also since i can restore a new node from a staging backup,
>> in either VPC.
>>
>> Databases are a good question, no i have not. Perhaps i should try a
>> complete set of combinations, which may be a better use of time than
>> brining the cluster up to 5 and then seeing if i can take a physical backup
>> of one of the new nodes.
>>
>> On Sat, Apr 18, 2020 at 1:11 PM Karl Erik Levik <karl.levik@xxxxxxxxx>
>> wrote:
>>
>>> Well, what I mean is, you're missing the "q" in "mysql", so I was
>>> thinking perhaps that command doesn't actually do what you think it does?
>>>
>>> If that's not the issue, then I guess you have to look again at your
>>> prod nodes and figure out what, if anything, is special/unusual about them.
>>> Do you have any special plugins enabled? Any special storage engines used
>>> that you don't have on staging? (Wouldn't think so since only InnoDB is
>>> properly supported by Galera ...) Do you have any crud in your datadir that
>>> might create problems for mariabackup?
>>>
>>> Have you tried doing a partial backup/restore instead, i.e. only some of
>>> the databases?
>>>
>>> Cheers,
>>> Karl
>>>
>>> On Sat, 18 Apr 2020 at 17:25, Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>>>
>>>> Thanks Karl, that is just there so everything under mysql is deleted
>>>> and not the directory itself as mariabackup needs the datadir clean. And
>>>> i'm using the same "script" on both VPCs.
>>>>
>>>> Appreciate the read!
>>>>
>>>> On Sat, Apr 18, 2020 at 6:01 AM Karl Erik Levik <karl.levik@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi Jeff,
>>>>>
>>>>> Probably not helpful, but I dunno if this is a typo just in your email
>>>>> or also in the script?
>>>>>
>>>>> > sudo rm -rf /var/lib/mysl/*
>>>>>
>>>>> Cheers,
>>>>> Karl
>>>>>
>>>>> On Thu, 16 Apr 2020 at 00:47, Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>>>>>
>>>>>> This has biting me for a bit, and i have tried a litany of things.
>>>>>> Here is the issue:
>>>>>> VPC 1: (prod)
>>>>>> Mariadb 10.4.12 - galera in a 3 node cluster, cluster functioning
>>>>>> perfectly
>>>>>>
>>>>>> VPC 2: (staging)
>>>>>> Mariadb 10.4.12 galera in a 3 node cluster also running smoothly.
>>>>>>
>>>>>> -- backup script (clipped for only the essentials) --
>>>>>> set -e
>>>>>> DIR="/home/jeff/restore"
>>>>>> rm -rf $DIR
>>>>>> mkdir $DIR
>>>>>> AUTH='--defaults-extra-file=/root/db.cnf -u root'
>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = ON;"
>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = OFF";
>>>>>> sudo mariabackup ${AUTH} --backup --binlog-info=ON
>>>>>> --target-dir="${DIR}" 2>&1 | grep -vE "Copying|done|log scanned"
>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = ON;"
>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = OFF";
>>>>>>
>>>>>> -- end script --
>>>>>>
>>>>>> -- restore process --
>>>>>> sudo service mariadb stop (or kill -9 PID only when the server won't
>>>>>> start)
>>>>>> sudo rm -rf /var/lib/mysl/*
>>>>>> sudo mariabackup --prepare --target-dir=restore
>>>>>> sudo mariabackup --copy-back --target-dir=restore
>>>>>> sudo chown -R mysql:mysql /var/lib/mysql/*
>>>>>> sudo service mariadb start
>>>>>> -- end restore --
>>>>>>
>>>>>> I can run this on any of the script and backup process on any of the
>>>>>> *staging* nodes and restore them to a freshly built server within
>>>>>> that VPC. I can also take the backup from *staging* and load in
>>>>>> into a newly build machine on the *prod* VPC.
>>>>>>
>>>>>> What i can not do is run the script on any of the prod members and
>>>>>> restore it to a newly built server, regardless of location. (really the
>>>>>> locations don't matter at all)
>>>>>>
>>>>>> The following is the log from a failing restore from a proddb backup
>>>>>> and restore:
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Using Linux native AIO
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Mutexes and rw_locks use GCC
>>>>>> atomic builtins
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Uses event mutexes
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Number of pools: 1
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Using SSE2 crc32 instructions
>>>>>> 2020-04-15 22:11:10 0 [Note] mysqld: O_TMPFILE is not supported on
>>>>>> /tmp (disabling future attempts)
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Initializing buffer pool, total
>>>>>> size = 2G, instances = 8, chunk size = 128M
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Completed initialization of
>>>>>> buffer pool
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: If the mysqld execution user is
>>>>>> authorized, page cleaner thread priority can be changed. See the man page
>>>>>> of setpriority().
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile101
>>>>>> size to 50331648 bytes
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile1
>>>>>> size to 50331648 bytes
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Renaming log file
>>>>>> ./ib_logfile101 to ./ib_logfile0
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: New log files created,
>>>>>> LSN=42006460184
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: 128 out of 128 rollback segments
>>>>>> are active.
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Creating shared tablespace for
>>>>>> temporary tables
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting file './ibtmp1' size to
>>>>>> 12 MB. Physically writing the file full; Please wait ...
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: File './ibtmp1' size is now 12
>>>>>> MB.
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Waiting for purge to start
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: 10.4.12 started; log sequence
>>>>>> number 42006460428; transaction id 23858673
>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Loading buffer pool(s) from
>>>>>> /var/lib/mysql/ib_buffer_pool
>>>>>> 2020-04-15 22:11:10 0 [Note] Plugin 'FEEDBACK' is disabled.
>>>>>> 2020-04-15 22:11:11 0 [Note] InnoDB: Buffer pool(s) load completed at
>>>>>> 200415 22:11:11
>>>>>> -- hangs here for about an hour before timing out --
>>>>>>
>>>>>>
>>>>>> Logfile from a successful start
>>>>>>
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Using Linux native AIO
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Mutexes and rw_locks use GCC
>>>>>> atomic builtins
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Uses event mutexes
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Number of pools: 1
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Using SSE2 crc32 instructions
>>>>>> 2020-04-15 23:13:15 0 [Note] mysqld: O_TMPFILE is not supported on
>>>>>> /tmp (disabling future attempts)
>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Initializing buffer pool, total
>>>>>> size = 2G, instances = 8, chunk size = 128M
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Completed initialization of
>>>>>> buffer pool
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: If the mysqld execution user is
>>>>>> authorized, page cleaner thread priority can be changed. See the man page
>>>>>> of setpriority().
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile101
>>>>>> size to 50331648 bytes
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile1
>>>>>> size to 50331648 bytes
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Renaming log file
>>>>>> ./ib_logfile101 to ./ib_logfile0
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: New log files created,
>>>>>> LSN=43323668550
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: 128 out of 128 rollback segments
>>>>>> are active.
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Creating shared tablespace for
>>>>>> temporary tables
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting file './ibtmp1' size to
>>>>>> 12 MB. Physically writing the file full; Please wait ...
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: File './ibtmp1' size is now 12
>>>>>> MB.
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Waiting for purge to start
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: 10.4.12 started; log sequence
>>>>>> number 43323669004; transaction id 22422468
>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Loading buffer pool(s) from
>>>>>> /var/lib/mysql/ib_buffer_pool
>>>>>> 2020-04-15 23:13:16 0 [Note] Plugin 'FEEDBACK' is disabled.
>>>>>> 2020-04-15 23:13:16 0 [Note] Server socket created on IP: '0.0.0.0'.
>>>>>> 2020-04-15 23:13:16 0 [Note] Reading of all Master_info entries
>>>>>> succeeded
>>>>>> 2020-04-15 23:13:16 0 [Note] Added new Master_info '' to hash table
>>>>>> 2020-04-15 23:13:16 0 [Note] /usr/sbin/mysqld: ready for connections.
>>>>>> Version: '10.4.12-MariaDB-1:10.4.12+maria~bionic' socket:
>>>>>> '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary
>>>>>> distribution
>>>>>> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at
>>>>>> 200415 23:13:22
>>>>>>
>>>>>> ------------------------
>>>>>> The main difference between the two is that:
>>>>>> - the server socket is logged
>>>>>> - master_info is logged
>>>>>> - and it binds to the port/socket
>>>>>>
>>>>>> Before they *both* end with
>>>>>> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed at
>>>>>> 200415 23:13:22
>>>>>>
>>>>>> -- Other Notes --
>>>>>> - Config files are the same except for ips, both on newly built
>>>>>> servers as the cluster as they are managed through salt. I ran a diff -r
>>>>>> against prod vs staging and only saw
>>>>>>
>>>>>> There has to be some corruption in the mariabackup output, but still
>>>>>> reported OK. Since i can put a staging backup on a prod server, but i can
>>>>>> not put a prod backup on a staging server.
>>>>>>
>>>>>> If you've gotten this far, thank you. If you have any ideas on what
>>>>>> i could try, i'd be grateful. aria_chk and all other table checks return
>>>>>> correctly. I'm not sure what makes a server write out the master_info, so
>>>>>> possibly that is a pointer.
>>>>>>
>>>>>> I'm happy to answer any questions, i've tried way to many things to
>>>>>> post here as i didn't want to write until i had enough of dumb stuff
>>>>>> figured out. I could rebuild the prod cluster but am really looking to
>>>>>> understand this. I don't really want to build my own with debug info on
>>>>>> b/c why would they be different between the two vpcs.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>>> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>> More help : https://help.launchpad.net/ListHelp
>>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>> More help : https://help.launchpad.net/ListHelp
>>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~maria-discuss
>>> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>> More help : https://help.launchpad.net/ListHelp
>>>
>>
Follow ups
References