maria-discuss team mailing list archive

Thread
Date
Re: backup from any cluster member can't be restarted and restored

To: MariaDB discuss <maria-discuss@xxxxxxxxxxxxxxxxxxx>
From: Jeff Dyke <jeff.dyke@xxxxxxxxx>
Date: Tue, 26 May 2020 18:44:22 -0400
In-reply-to: <CAHmnZdboxnEV09QDF-nw8FHp+Bzo+3s5r-h+-E4R8KaKExuQug@mail.gmail.com>
I want to resurrect this.  b/c i have found what i see to be the broken bit
of code/configuration.  If i don't include the following the commented out
portion everything starts up.
-- Starts --
/etc/mysql/conf.conf.d/100_sqlerr.log
[mysqld]
  #plugin_load_add = sql_errlog
  sql_error_log=ON
  sql_error_log_rotate=1
  sql_error_log_size_limit=100000000

so if i remove the comment for "plugin_log_add" to run the server never
starts.  I will produce a reproduce case, but can someone explain why this
would not start if i added a plugin that was already added.

This journey sucked, but i learned a lot, even though i've been  using
mysql since 3.X  Is there different syntax for 10.4.  Should i just add it
to start up args? and then add the variables after it starts.  This worked
in 10.3.

Thanks all!


On Mon, Apr 27, 2020 at 3:32 PM Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:

> That second note is horribly misleading (note the bold):
> -- orig --
> - I was able to take a full physical backup that runs every morning from
> the VPC 1 (prod cluster) directly from S3 and restore it to the server
> where i can not restore backups of any of the prod nodes.  After this
> successfully started, i ran mysqlcheck --all-databases (online) againh mst
> this newly restored database and there was no corruption.
> -- should be --
> - I was able to take the full physical backup *from a slave whose master
> is a prod cluster node,* which runs every morning, from S3 to the server
> where i can not restore backups of any of the prod nodes.  After this
> successfully started, i ran mysqlcheck --all-databases (online) againh mst
> this newly restored database and there was no corruption.
>
> Sorry and Thanks
>
> On Mon, Apr 27, 2020 at 2:38 PM Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>
>> Hello again peoples.  I'm still at it, but have taken some time for
>> higher priority projects and now have some follow up data, not sure if i
>> have any questions, we'll see.
>>
>> in the staging VPC, referred above/below as VPC 2.  I can restore any
>> combinations of tables right down to only selecting mysql and
>> performance_schema.  In the prod VPC (VPC 1).  I can not install any
>> combination of databases.  So like was thought before, i may have
>> corruption in one of the tables, but....
>>
>> - i have checked all tables using innochecksum (offline), aria_chk
>> (offline) and there are no indications of corruption
>> - I was able to take a full physical backup that runs every morning from
>> the VPC 1 (prod cluster) directly from S3 and restore it to the server
>> where i can not restore backups of any of the prod nodes.  After this
>> successfully started, i ran mysqlcheck --all-databases (online) againh mst
>> this newly restored database and there was no corruption.
>> - if i take the production node, that I'm using for backup testing, out
>> of the cluster and then run mysqlcheck --all-databases and put it back on
>> the cluster, no corrupt tables are reported.
>>
>> I would prefer not to build an entire new cluster, but realize i could
>> have done so by now.  b/c i would really like to know why and with the
>> backups and snapshots that i can use the slave for along with 3 cluster
>> nodes, my data is safe and being written to daily.
>>
>> All of these databases where upgraded from 10.3 to 10.4 (in the last two
>> months) according to the instructions on mariadb.com.  I ran staging
>> first, let it hang a couple weeks and then upgraded production and both are
>> at the same exact release.   Ver 10.4.12-MariaDB-1:10.4.12+maria~bionic-log
>> for debian-linux-gnu on x86_64 (mariadb.org binary distribution)
>>
>> I've considered compiling a debug enabled version of mariadb, but that
>> seems it would only be useful if it were on the galera nodes, and i'm not
>> too keen on doing that.
>>
>> Any and all thoughts are welcome, i didn't want to repeat any information
>> so for new readers you'll have to read from the start of the thread and i
>> do indeed appreciate it.
>>
>> Jeff
>>
>>
>>
>>
>>
>>
>>
>> On Sat, Apr 18, 2020 at 1:22 PM Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>>
>>> Hi Karl, someone else just saw that as well.  it was indeed a type in
>>> the email.  To answer your questions; the plugins are the same
>>> sql_error_log and socket_plugin (which is no longer needed in 10.4 and was
>>> removed after the upgrade).  Also since i can restore a new node from a
>>> staging backup, in either VPC.
>>>
>>> Databases are a good question, no i have not.  Perhaps i should try a
>>> complete set of combinations, which may be a better use of time than
>>> brining the cluster up to 5 and then seeing if i can take a physical backup
>>> of one of the new nodes.
>>>
>>> On Sat, Apr 18, 2020 at 1:11 PM Karl Erik Levik <karl.levik@xxxxxxxxx>
>>> wrote:
>>>
>>>> Well, what I mean is, you're missing the "q" in "mysql", so I was
>>>> thinking perhaps that command doesn't actually do what you think it does?
>>>>
>>>> If that's not the issue, then I guess you have to look again at your
>>>> prod nodes and figure out what, if anything, is special/unusual about them.
>>>> Do you have any special plugins enabled? Any special storage engines used
>>>> that you don't have on staging? (Wouldn't think so since only InnoDB is
>>>> properly supported by Galera ...) Do you have any crud in your datadir that
>>>> might create problems for mariabackup?
>>>>
>>>> Have you tried doing a partial backup/restore instead, i.e. only some
>>>> of the databases?
>>>>
>>>> Cheers,
>>>> Karl
>>>>
>>>> On Sat, 18 Apr 2020 at 17:25, Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>>>>
>>>>> Thanks Karl, that is just there so everything under mysql is deleted
>>>>> and not the directory itself as mariabackup needs the datadir clean.  And
>>>>> i'm using the same "script" on both VPCs.
>>>>>
>>>>> Appreciate the read!
>>>>>
>>>>> On Sat, Apr 18, 2020 at 6:01 AM Karl Erik Levik <karl.levik@xxxxxxxxx>
>>>>> wrote:
>>>>>
>>>>>> Hi Jeff,
>>>>>>
>>>>>> Probably not helpful, but I dunno if this is a typo just in your
>>>>>> email or also in the script?
>>>>>>
>>>>>> > sudo rm -rf /var/lib/mysl/*
>>>>>>
>>>>>> Cheers,
>>>>>> Karl
>>>>>>
>>>>>> On Thu, 16 Apr 2020 at 00:47, Jeff Dyke <jeff.dyke@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> This has biting me for a bit, and i have tried a litany of things.
>>>>>>> Here is the issue:
>>>>>>> VPC 1: (prod)
>>>>>>> Mariadb 10.4.12 - galera in a 3 node cluster, cluster functioning
>>>>>>> perfectly
>>>>>>>
>>>>>>> VPC 2: (staging)
>>>>>>> Mariadb 10.4.12 galera in a 3 node cluster also running smoothly.
>>>>>>>
>>>>>>> -- backup script (clipped for only the essentials) --
>>>>>>> set -e
>>>>>>> DIR="/home/jeff/restore"
>>>>>>> rm -rf $DIR
>>>>>>> mkdir $DIR
>>>>>>> AUTH='--defaults-extra-file=/root/db.cnf -u root'
>>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = ON;"
>>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = OFF";
>>>>>>> sudo mariabackup ${AUTH} --backup --binlog-info=ON
>>>>>>> --target-dir="${DIR}" 2>&1 | grep -vE "Copying|done|log scanned"
>>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_on = ON;"
>>>>>>> sudo mysql ${AUTH} -e "SET GLOBAL wsrep_desync = OFF";
>>>>>>>
>>>>>>> -- end script --
>>>>>>>
>>>>>>> -- restore process --
>>>>>>> sudo service mariadb stop (or kill -9 PID only when the server won't
>>>>>>> start)
>>>>>>> sudo rm -rf /var/lib/mysl/*
>>>>>>> sudo mariabackup --prepare --target-dir=restore
>>>>>>> sudo mariabackup --copy-back --target-dir=restore
>>>>>>> sudo chown -R mysql:mysql /var/lib/mysql/*
>>>>>>> sudo service mariadb start
>>>>>>> -- end restore --
>>>>>>>
>>>>>>> I can run this on any of the script and backup process on any of the
>>>>>>> *staging* nodes and restore them to a freshly built server within
>>>>>>> that VPC.  I can also take the backup from *staging* and load in
>>>>>>> into a newly build machine on the *prod* VPC.
>>>>>>>
>>>>>>> What i can not do is run the script on any of the prod members and
>>>>>>> restore it to a newly built server, regardless of location.  (really the
>>>>>>> locations don't matter at all)
>>>>>>>
>>>>>>> The following is the log from a failing restore from a proddb backup
>>>>>>> and restore:
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Using Linux native AIO
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Mutexes and rw_locks use GCC
>>>>>>> atomic builtins
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Uses event mutexes
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Compressed tables use zlib
>>>>>>> 1.2.11
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Number of pools: 1
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Using SSE2 crc32 instructions
>>>>>>> 2020-04-15 22:11:10 0 [Note] mysqld: O_TMPFILE is not supported on
>>>>>>> /tmp (disabling future attempts)
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Initializing buffer pool, total
>>>>>>> size = 2G, instances = 8, chunk size = 128M
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Completed initialization of
>>>>>>> buffer pool
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: If the mysqld execution user is
>>>>>>> authorized, page cleaner thread priority can be changed. See the man page
>>>>>>> of setpriority().
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file
>>>>>>> ./ib_logfile101 size to 50331648 bytes
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting log file ./ib_logfile1
>>>>>>> size to 50331648 bytes
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Renaming log file
>>>>>>> ./ib_logfile101 to ./ib_logfile0
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: New log files created,
>>>>>>> LSN=42006460184
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: 128 out of 128 rollback
>>>>>>> segments are active.
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Creating shared tablespace for
>>>>>>> temporary tables
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Setting file './ibtmp1' size to
>>>>>>> 12 MB. Physically writing the file full; Please wait ...
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: File './ibtmp1' size is now 12
>>>>>>> MB.
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Waiting for purge to start
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: 10.4.12 started; log sequence
>>>>>>> number 42006460428; transaction id 23858673
>>>>>>> 2020-04-15 22:11:10 0 [Note] InnoDB: Loading buffer pool(s) from
>>>>>>> /var/lib/mysql/ib_buffer_pool
>>>>>>> 2020-04-15 22:11:10 0 [Note] Plugin 'FEEDBACK' is disabled.
>>>>>>> 2020-04-15 22:11:11 0 [Note] InnoDB: Buffer pool(s) load completed
>>>>>>> at 200415 22:11:11
>>>>>>> -- hangs here for about an hour before timing out --
>>>>>>>
>>>>>>>
>>>>>>> Logfile from a successful start
>>>>>>>
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Using Linux native AIO
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Mutexes and rw_locks use GCC
>>>>>>> atomic builtins
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Uses event mutexes
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Compressed tables use zlib
>>>>>>> 1.2.11
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Number of pools: 1
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Using SSE2 crc32 instructions
>>>>>>> 2020-04-15 23:13:15 0 [Note] mysqld: O_TMPFILE is not supported on
>>>>>>> /tmp (disabling future attempts)
>>>>>>> 2020-04-15 23:13:15 0 [Note] InnoDB: Initializing buffer pool, total
>>>>>>> size = 2G, instances = 8, chunk size = 128M
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Completed initialization of
>>>>>>> buffer pool
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: If the mysqld execution user is
>>>>>>> authorized, page cleaner thread priority can be changed. See the man page
>>>>>>> of setpriority().
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file
>>>>>>> ./ib_logfile101 size to 50331648 bytes
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting log file ./ib_logfile1
>>>>>>> size to 50331648 bytes
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Renaming log file
>>>>>>> ./ib_logfile101 to ./ib_logfile0
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: New log files created,
>>>>>>> LSN=43323668550
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: 128 out of 128 rollback
>>>>>>> segments are active.
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Creating shared tablespace for
>>>>>>> temporary tables
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Setting file './ibtmp1' size to
>>>>>>> 12 MB. Physically writing the file full; Please wait ...
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: File './ibtmp1' size is now 12
>>>>>>> MB.
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Waiting for purge to start
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: 10.4.12 started; log sequence
>>>>>>> number 43323669004; transaction id 22422468
>>>>>>> 2020-04-15 23:13:16 0 [Note] InnoDB: Loading buffer pool(s) from
>>>>>>> /var/lib/mysql/ib_buffer_pool
>>>>>>> 2020-04-15 23:13:16 0 [Note] Plugin 'FEEDBACK' is disabled.
>>>>>>> 2020-04-15 23:13:16 0 [Note] Server socket created on IP: '0.0.0.0'.
>>>>>>> 2020-04-15 23:13:16 0 [Note] Reading of all Master_info entries
>>>>>>> succeeded
>>>>>>> 2020-04-15 23:13:16 0 [Note] Added new Master_info '' to hash table
>>>>>>> 2020-04-15 23:13:16 0 [Note] /usr/sbin/mysqld: ready for connections.
>>>>>>> Version: '10.4.12-MariaDB-1:10.4.12+maria~bionic'  socket:
>>>>>>> '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary
>>>>>>> distribution
>>>>>>> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed
>>>>>>> at 200415 23:13:22
>>>>>>>
>>>>>>> ------------------------
>>>>>>> The main difference between the two is that:
>>>>>>> - the server socket is logged
>>>>>>> - master_info is logged
>>>>>>> - and it binds to the port/socket
>>>>>>>
>>>>>>> Before they *both* end with
>>>>>>> 2020-04-15 23:13:22 0 [Note] InnoDB: Buffer pool(s) load completed
>>>>>>> at 200415 23:13:22
>>>>>>>
>>>>>>> -- Other Notes --
>>>>>>> - Config files are the same except for ips, both on newly built
>>>>>>> servers as the cluster as they are managed through salt.  I ran a diff -r
>>>>>>> against prod vs staging and only saw
>>>>>>>
>>>>>>> There has to be some corruption in the mariabackup output, but still
>>>>>>> reported OK.  Since i can put a staging backup on a prod server, but i can
>>>>>>> not put a prod backup on a staging server.
>>>>>>>
>>>>>>> If you've gotten this far, thank you.  If you have any ideas on what
>>>>>>> i could try, i'd be grateful.  aria_chk and all other table checks return
>>>>>>> correctly.  I'm not sure what makes a server write out the master_info, so
>>>>>>> possibly that is a pointer.
>>>>>>>
>>>>>>> I'm happy to answer any questions, i've tried way to many things to
>>>>>>> post here as i didn't want to write until i had enough of dumb stuff
>>>>>>> figured out.  I could rebuild the prod cluster but am really looking to
>>>>>>> understand this.  I don't really want to build my own with debug info on
>>>>>>> b/c why would they be different between the two vpcs.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>
Follow ups

Re: backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-05-26
References

backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-04-15
Re: backup from any cluster member can't be restarted and restored
From: Karl Erik Levik, 2020-04-18
Re: backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-04-18
Re: backup from any cluster member can't be restarted and restored
From: Karl Erik Levik, 2020-04-18
Re: backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-04-18
Re: backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-04-27
Re: backup from any cluster member can't be restarted and restored
From: Jeff Dyke, 2020-04-27