maria-discuss team mailing list archive

Re: MySQL crash after 'partial' version upgrade

 

William Edwards wrote on 2021-07-23 19:16:
Hello,

The weather probably has an effect on me, because I'm not seeing the
cause for the issue below right away. Hopefully someone can use the
cluebat. Also, I apologize if we're not supposed to paste snippets of
this size in email, but I couldn't find a mailing list policy on this.

Some additional information after further debugging:

- The upgrade started at 16:06:18. MariaDB shut down gracefully at 16:06:23 but, while the upgrade was still in progress, started again exactly 5 seconds later at 16:06:28. This makes me suspect a watchdog started MariaDB at the wrong time. However, MariaDB shut down gracefully, so systemd's 'Restart=on-abort' shouldn't have done anything.
- The server starts up fine with a new datadir created by 'mysql_install_db'. This confirms the suspicion that the data was corrupted because of *something* that happened during the upgrade.


I was using this repo:

`deb [arch=amd64,i386,ppc64el] http://mariadb.mirror.pcextreme.nl/repo/10.3/debian stretch main`

This repo was no longer being updated, so I replaced it with this repo:

`deb [arch=amd64,arm64,ppc64el] http://ams2.mirrors.digitalocean.com/mariadb/repo/10.3/debian buster main`

That made these updates available:

```
galera-3/unknown 25.3.33-buster amd64 [upgradable from: 25.3.32-stretch]
libmariadb-dev-compat/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
libmariadb-dev/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
libmariadb3/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
mariadb-client-10.3/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
mariadb-client-core-10.3/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
mariadb-common/unknown,unknown,unknown 1:10.3.30+maria~buster all [upgradable from: 1:10.3.28+maria~stretch]
mariadb-server-10.3/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
mariadb-server-core-10.3/unknown 1:10.3.30+maria~buster amd64 [upgradable from: 1:10.3.28+maria~stretch]
mysql-common/unknown,unknown,unknown 1:10.3.30+maria~buster all [upgradable from: 1:10.3.28+maria~stretch]
```

Ansible then wrongly attempted installing the meta package
'mariadb-server', which was not installed yet ('mariadb-server-10.3'
was installed directly). This meta package depends on
'mariadb-server-10.3 (>= 1:10.3.30+maria~buster)' (we were running
10.3.28, and an update was now available), so apt did this:

```
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  gir1.2-packagekitglib-1.0 libappstream4 libglib2.0-bin libgstreamer1.0-0
  libpackagekit-glib2-18 libstemmer0d packagekit packagekit-tools
  python3-distro-info python3-software-properties
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  mariadb-client-10.3 mariadb-client-core-10.3 mariadb-common
  mariadb-server-10.3 mariadb-server-core-10.3
Suggested packages:
  mailx mariadb-test netcat-openbsd tinyca
The following NEW packages will be installed:
  mariadb-server
The following packages will be upgraded:
  mariadb-client-10.3 mariadb-client-core-10.3 mariadb-common
  mariadb-server-10.3 mariadb-server-core-10.3
5 upgraded, 1 newly installed, 0 to remove and 32 not upgraded.
Need to get 12.0 MB of archives.
[...]
dpkg: error processing package mariadb-server-10.3 (--configure):
 installed mariadb-server-10.3 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of mariadb-server:
 mariadb-server depends on mariadb-server-10.3 (>= 1:10.3.30+maria~buster); however:
  Package mariadb-server-10.3 is not configured yet.

dpkg: error processing package mariadb-server (--configure):
 dependency problems - leaving unconfigured
Processing triggers for man-db (2.8.5-2) ...
Processing triggers for systemd (241-7~deb10u8) ...
Errors were encountered while processing:
 mariadb-server-10.3
 mariadb-server
```
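To illustrate why apt pulled the upgrade in: the meta package's versioned dependency was not satisfied by the installed 10.3.28. Below is a minimal Python sketch of that check. It is not Debian's real version-comparison algorithm (which treats '~', '+' and letters specially); it only compares the epoch and the dotted numeric upstream part, which happens to be enough for these two version strings. The function names are made up for illustration.

```python
import re

def parse_version(version):
    """Split a Debian-style version into (epoch, numeric upstream parts).

    Simplification: everything after the leading dotted number (such as
    '+maria~buster') is ignored, so this is NOT full Debian ordering.
    """
    epoch, _, rest = version.rpartition(":")
    match = re.match(r"\d+(?:\.\d+)*", rest)
    if match is None:
        raise ValueError(f"no numeric upstream version in {version!r}")
    numbers = tuple(int(part) for part in match.group(0).split("."))
    return (int(epoch) if epoch else 0, numbers)

def satisfies_minimum(installed, required):
    """Return True if 'installed' is at least 'required' (simplified)."""
    return parse_version(installed) >= parse_version(required)

installed = "1:10.3.28+maria~stretch"  # version on the node before the upgrade
required = "1:10.3.30+maria~buster"    # from the meta package's dependency

print(satisfies_minimum(installed, required))
```

With the installed version below the required one, installing 'mariadb-server' forces 'mariadb-server-10.3' to be upgraded in the same transaction, which matches the apt output above.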

From 16:06:44 (when the apt upgrade failed) until 16:09:32 (when I
stopped MariaDB), SST kept failing due to
https://jira.mariadb.org/browse/MDEV-26172 (most likely not related to
this issue).

After stopping and starting MariaDB (probably loading new libraries),
I started seeing this:

```
210723 16:09:33 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.3.30-MariaDB-1:10.3.30+maria~buster-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=102
thread_count=4
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
355304 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f0d8c000c08
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f0d9effcce8 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x561a7200ddee]
/usr/sbin/mysqld(handle_fatal_signal+0x54d)[0x561a71b2087d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f0df182b730]
/usr/sbin/mysqld(+0x9a6f22)[0x561a71d10f22]
/usr/sbin/mysqld(+0xa80632)[0x561a71dea632]
/usr/sbin/mysqld(+0xa03852)[0x561a71d6d852]
/usr/sbin/mysqld(+0xa039f5)[0x561a71d6d9f5]
/usr/sbin/mysqld(+0x9fe0f1)[0x561a71d680f1]
/usr/sbin/mysqld(+0x9fe1e2)[0x561a71d681e2]
/usr/sbin/mysqld(+0xa00aba)[0x561a71d6aaba]
/usr/sbin/mysqld(+0xa01129)[0x561a71d6b129]
/usr/sbin/mysqld(+0x9c2d88)[0x561a71d2cd88]
/usr/sbin/mysqld(+0xa2777c)[0x561a71d9177c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f0df1820fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f0df174f4cf]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 3
Status: NOT_KILLED

Optimizer switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on

The manual page at
https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
contains
information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.
Query:

Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15657                15657                processes
Max open files            32133                32133                files
Max locked memory         67108864             67108864             bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15657                15657                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
```

Thanks to systemd, this repeated for 2 minutes, until the messages
below appeared. That is when I gave up trying to fix the node and
replaced it in the Galera cluster:

```
2021-07-23 16:11:07 0 [Note] Starting crash recovery...
2021-07-23 16:11:07 0 [Note] Crash recovery finished.
2021-07-23 16:11:07 6 [ERROR] mysqld: Table './mysql/user' is marked
as crashed and should be repaired
2021-07-23 16:11:07 6 [Warning] Checking table:   './mysql/user'
2021-07-23 16:11:07 6 [ERROR] mysql.user: 1 client is using or hasn't
closed the table properly
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f0f17430fa3]
2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11326, page
number=3] log sequence number 30633696699 is in the future! Current
system log sequence number 30424127374.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Your database may be corrupt or
you may have copied the InnoDB tablespace but not the InnoDB log
files. Please refer to
https://mariadb.com/kb/en/library/innodb-recovery-modes/ for
information about forcing recovery.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11330, page
number=3] log sequence number 30634051832 is in the future! Current
system log sequence number 30424127374.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Your database may be corrupt or
you may have copied the InnoDB tablespace but not the InnoDB log
files. Please refer to
https://mariadb.com/kb/en/library/innodb-recovery-modes/ for
information about forcing recovery.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11332, page
number=3] log sequence number 30634054073 is in the future! Current
system log sequence number 30424127374.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Your database may be corrupt or
you may have copied the InnoDB tablespace but not the InnoDB log
files. Please refer to
https://mariadb.com/kb/en/library/innodb-recovery-modes/ for
information about forcing recovery.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11333, page
number=3] log sequence number 30633696016 is in the future! Current
system log sequence number 30424127374.
2021-07-23 16:11:08 0 [ERROR] InnoDB: Your database may be corrupt or
you may have copied the InnoDB tablespace but not the InnoDB log
files. Please refer to
https://mariadb.com/kb/en/library/innodb-recovery-modes/ for
information about forcing recovery.
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f0f1735f4cf]
```
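To put a number on "in the future", here is a small Python sketch that extracts the page LSN and the system LSN from messages like the ones above and prints the difference. The two sample strings are copied from the log with the line wrapping undone; the regular expression and the function name are illustrative assumptions, not anything MariaDB ships.

```python
import re

# Two of the InnoDB messages from the log above, with line wrapping undone.
log_lines = [
    "2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11326, page number=3] "
    "log sequence number 30633696699 is in the future! "
    "Current system log sequence number 30424127374.",
    "2021-07-23 16:11:08 0 [ERROR] InnoDB: Page [page id: space=11330, page number=3] "
    "log sequence number 30634051832 is in the future! "
    "Current system log sequence number 30424127374.",
]

pattern = re.compile(
    r"space=(?P<space>\d+), page number=(?P<page>\d+)\] "
    r"log sequence number (?P<page_lsn>\d+) is in the future! "
    r"Current system log sequence number (?P<system_lsn>\d+)"
)

def lsn_gaps(lines):
    """Return (space, page, page_lsn - system_lsn) for each matching line."""
    gaps = []
    for line in lines:
        match = pattern.search(line)
        if match:
            gap = int(match["page_lsn"]) - int(match["system_lsn"])
            gaps.append((int(match["space"]), int(match["page"]), gap))
    return gaps

for space, page, gap in lsn_gaps(log_lines):
    print(f"space={space} page={page}: page LSN is {gap} ahead of the system LSN")
```

For these two pages the gap is roughly 210 million. All four reported pages are ahead of the same system LSN (30424127374) by a similar amount, which suggests the data files are consistently newer than the redo log rather than randomly damaged.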

These are the packages that apt installed/upgraded:

```
The following additional packages will be installed:
  mariadb-client-10.3 mariadb-client-core-10.3 mariadb-common
  mariadb-server-10.3 mariadb-server-core-10.3
Suggested packages:
  mailx mariadb-test netcat-openbsd tinyca
The following NEW packages will be installed:
  mariadb-server
The following packages will be upgraded:
  mariadb-client-10.3 mariadb-client-core-10.3 mariadb-common
  mariadb-server-10.3 mariadb-server-core-10.3
```

That means that these packages, which were available for upgrade, were
not upgraded:

- mysql-common
- libmariadb3
- libmariadb-dev
- libmariadb-dev-compat
- galera-3

However, none of the upgraded packages depend on newer versions of the
packages above, so I would have expected this to just work, even though
installing the meta package was a mistake (assuming the problem was,
e.g., an outdated library).

I have two other machines in the same situation, where I simply did
'apt upgrade' after replacing the repo, and all seems well.

In conclusion: replacing a repo with one that provides an update from
10.3.28 to 10.3.30, then installing the 'mariadb-server' meta package,
which does not pull in all available updates (like 'libmariadb3'),
caused MySQL to crash (badly), whereas installing all available updates
at the same time ('apt upgrade') did not cause this issue on the other
machines. I don't necessarily want to understand how to avoid this
specific situation, because I should've done a full upgrade in any
case, but I am wondering what I am missing that caused this issue.

--
With kind regards,

William Edwards


