maria-discuss team mailing list archive

mysqld-wait-ready -> Re: start mariadb randomly fails or answers with wrong response to systemd

 

well, i should not have taken the most recent "mysqld-wait-ready"
from Fedora, which has some stuff relying on mysqld_safe, which is
completely useless below a supervisor like systemd

the main question is why is MySQL/MariaDB implemented in a way
that mysqld responds with status code *before* it is ready to
accept connections at all

such dirty hacks as below should not be needed to provide a
clean boot in case of services depending on mysql connections

the current state is they are ordered after mysqld.service
but that does not help much without the hack below

[root@arrakis:~]$ cat /usr/libexec/mysqld-wait-ready
#!/bin/sh

# This script waits for mysqld to be ready to accept connections
# (which can be many seconds or even minutes after launch, if there's
# a lot of crash-recovery work to do).
# Running this as ExecStartPost is useful so that services declared as
# "After mysqld" won't be started until the database is really ready.

# Service file passes us the daemon's PID
daemon_pid="$1"

# extract value of a MySQL option from config files
# Usage: get_mysql_option SECTION VARNAME DEFAULT
# result is returned in $result
# We use my_print_defaults which prints all options from multiple files,
# with the more specific ones later; hence take the last match.
get_mysql_option(){
 result=`/usr/bin/my_print_defaults "$1" | sed -n "s/^--$2=//p" | tail -n 1`
 if [ -z "$result" ]; then
  # not found, use default
  result="$3"
 fi
}

# Defaults here had better match what mysqld_safe will default to
get_mysql_option mysqld datadir "/var/lib/mysql"
datadir="$result"
get_mysql_option mysqld socket "$datadir/mysql.sock"
socketfile="$result"

# Wait for the server to come up or for the mysqld process to disappear
ret=0
while /bin/true; do
 RESPONSE=`/usr/bin/mysqladmin --socket="$socketfile" --user=UNKNOWN_MYSQL_USER ping 2>&1`
 mret=$?
 if [ $mret -eq 0 ]; then
  break
 fi
 # exit codes 1, 11 (EXIT_CANNOT_CONNECT_TO_SERVICE) are expected,
 # anything else suggests a configuration error
 if [ $mret -ne 1 -a $mret -ne 11 ]; then
  ret=1
  break
 fi
 # "Access denied" also means the server is alive
 echo "$RESPONSE" | grep -q "Access denied for user" && break

 # Check process still exists
 if ! /bin/kill -0 $daemon_pid 2>/dev/null; then
  ret=1
  break
 fi
 sleep 1
done

exit $ret
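
the decision the loop above makes after each "mysqladmin ping" can be sketched as a small function on its own. this is only a minimal sketch: the name classify_ping and the sample response strings are assumptions for illustration and are not part of the Fedora script.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Fedora script): decide what the wait
# loop should do after one "mysqladmin ping" attempt.
# Usage: classify_ping EXIT_CODE RESPONSE_TEXT
# Prints one of: ready, retry, error
classify_ping() {
    code="$1"
    response="$2"
    # exit code 0: the server answered the ping
    if [ "$code" -eq 0 ]; then
        echo ready
        return
    fi
    # anything other than 1 or 11 suggests a configuration error
    if [ "$code" -ne 1 ] && [ "$code" -ne 11 ]; then
        echo error
        return
    fi
    # "Access denied" means the server is up and rejecting the dummy user
    case "$response" in
        *"Access denied for user"*) echo ready ;;
        *) echo retry ;;
    esac
}

# Illustrative calls; the response strings are examples only
classify_ping 0 "mysqld is alive"
classify_ping 1 "Access denied for user 'UNKNOWN_MYSQL_USER'@'localhost'"
classify_ping 1 "Can't connect to local MySQL server through socket"
classify_ping 2 "unknown option"
```

"retry" corresponds to the "sleep 1" branch of the loop, where the script additionally checks with "kill -0" that the daemon process still exists before trying again.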

On 07.01.2014 20:46, Reindl Harald wrote:
> i have for at least 5 years a backup server with two mysql-instances
> 
> * one replication instance listening on port 3307
> * one "live" service
> * a script once an hour stops both instances and rsyncs replication->live
> * after that it starts both instances again
> * looking at the mysqld logs i am pretty sure there is nothing wrong and
>   only the non-zero response causes systemd to restart the daemon one time
> 
> well, with MariaDB on Fedora 19 systemd cries about failing service
> at the same time there is nothing in the mysql-errorlog (both instances using the same)
> 
> the systemd-units are configured to restart the daemon again if it fails,
> which replaces the mysqld_safe scripts and has been done this way unchanged since
> summer 2011 with MySQL 5.5.x
> 
> has anybody an idea what causes the "mysqld.service" instance to respond with
> status 1 instead of 0, causing systemd to consider the service as failed while
> there is nothing uncommon in the mysqld-logs?
> 
> the "Error reading relay log event: slave SQL thread was killed" as well as
> "Error reading packet from server" are not relevant in that context because
> they are from the not failing "replication.service" and normal due to the shutdown
> of the slave (well, i dislike "error" in that context, but that's OT)
> 
> interesting is that i have 3 machine-pairs with 100% identical logic
> two of them are failing this way nearly every hour (backup schedule 60 minutes)
> while the one with 5000 tables is running perfectly all the time
> 
> any information i could provide in addition to the logs and configs below?
> ___________________________________________________________________________
> 
> Jan  7 20:21:04 localhost backups: stop replication service
> Jan  7 20:21:04 localhost systemd[1]: Stopping MySQL Replication...
> Jan  7 20:21:04 localhost systemd[1]: Stopped MySQL Replication.
> Jan  7 20:21:04 localhost backups: stop live database
> Jan  7 20:21:04 localhost systemd[1]: Stopping MariaDB Database...
> Jan  7 20:21:04 localhost systemd[1]: Stopped MariaDB Database.
> Jan  7 20:21:04 localhost backups: backup replication data to '/data/mysql_data/'
> Jan  7 20:21:04 localhost backups: backup replication data finished
> Jan  7 20:21:04 localhost backups: start MySQL-Replication-Service
> Jan  7 20:21:04 localhost systemd[1]: Starting MySQL Replication...
> Jan  7 20:21:04 localhost systemd[1]: Started MySQL Replication.
> Jan  7 20:21:04 localhost backups: start MySQL-Live-Service
> Jan  7 20:21:04 localhost systemd[1]: Starting MariaDB Database...
> Jan  7 20:21:04 localhost systemd[1]: mysqld.service: control process exited, code=exited status=1
> Jan  7 20:21:04 localhost systemd[1]: Failed to start MariaDB Database.
> Jan  7 20:21:04 localhost systemd[1]: Unit mysqld.service entered failed state.
> Jan  7 20:21:04 localhost backups: database backup finished
> Jan  7 20:21:05 localhost systemd[1]: mysqld.service holdoff time over, scheduling restart.
> Jan  7 20:21:05 localhost systemd[1]: Stopping MariaDB Database...
> Jan  7 20:21:05 localhost systemd[1]: Starting MariaDB Database...
> Jan  7 20:21:06 localhost systemd[1]: Started MariaDB Database.
> ___________________________________________________________________________
> 
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Normal shutdown
> 140107 20:21:04 [Note] Event Scheduler: Purging the queue. 0 events
> 140107 20:21:04 [Note] Error reading relay log event: slave SQL thread was killed
> 140107 20:21:04 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
> 140107 20:21:04 [Note] Slave I/O thread killed while reading event
> 140107 20:21:04 [Note] Slave I/O thread exiting, read up to log 'bin.000002', position 245
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Shutdown complete
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Normal shutdown
> 140107 20:21:04 [Note] Event Scheduler: Purging the queue. 0 events
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Shutdown complete
> 140107 20:21:04 [Note] Plugin 'InnoDB' is disabled.
> 140107 20:21:04 [Note] Plugin 'FEDERATED' is disabled.
> 140107 20:21:04 [Note] Plugin 'FEEDBACK' is disabled.
> 140107 20:21:04 [Note] Plugin 'InnoDB' is disabled.
> 140107 20:21:04 [Note] Plugin 'FEDERATED' is disabled.
> 140107 20:21:04 [Note] Plugin 'FEEDBACK' is disabled.
> 140107 20:21:04 [Note] Event Scheduler: Loaded 0 events
> 140107 20:21:04 [Note] /usr/libexec/mysqld: ready for connections.
> Version: '5.5.34-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 0  thelounge
> 140107 20:21:04 [Note] Event Scheduler: Loaded 0 events
> 140107 20:21:04 [Note] /usr/libexec/mysqld: ready for connections.
> Version: '5.5.34-MariaDB'  socket: '/var/lib/mysql/mysql_replication.sock'  port: 0  thelounge
> 140107 20:21:04 [Note] Slave SQL thread initialized, starting replication in log 'bin.000002' at position 245, relay log './mysql-relay-bin.000016' position: 523
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Normal shutdown
> 140107 20:21:04 [Note] Event Scheduler: Purging the queue. 0 events
> 140107 20:21:04 [Note] /usr/libexec/mysqld: Shutdown complete
> 140107 20:21:04 [Note] Slave I/O thread: connected to master 'replication@masterserver:3306',replication started in log 'bin.000002' at position 245
> 140107 20:21:05 [Note] Plugin 'InnoDB' is disabled.
> 140107 20:21:05 [Note] Plugin 'FEDERATED' is disabled.
> 140107 20:21:05 [Note] Plugin 'FEEDBACK' is disabled.
> 140107 20:21:05 [Note] Event Scheduler: Loaded 0 events
> 140107 20:21:05 [Note] /usr/libexec/mysqld: ready for connections.
> Version: '5.5.34-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 0  thelounge
> ___________________________________________________________________________
> 
> /usr/lib/systemd/system/mysqld.service
> [Unit]
> Description=MariaDB Database
> Before=postfix.service dovecot.service dbmail-imapd.service dbmail-lmtpd.service dbmail-pop3d.service dbmail-postfix-policyd.service
> 
> [Service]
> Type=simple
> PIDFile=/var/run/mysqld/mysqld.pid
> ExecStart=/usr/libexec/mysqld --defaults-file=/etc/my.cnf --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --open-files-limit=750000 --basedir=/usr --user=mysql
> ExecStartPost=/usr/libexec/mysqld-wait-ready $MAINPID
> Restart=always
> RestartSec=1
> TimeoutSec=300
> LimitNOFILE=infinity
> LimitMEMLOCK=infinity
> OOMScoreAdjust=-1000
> SysVStartPriority=64
> PrivateTmp=true
> CapabilityBoundingSet=~CAP_SYS_PTRACE
> ReadOnlyDirectories=/etc
> ReadOnlyDirectories=/usr
> InaccessibleDirectories=/boot
> InaccessibleDirectories=/home
> InaccessibleDirectories=/root
> Restart=always
> InaccessibleDirectories=/usr/local/scripts
> InaccessibleDirectories=/var/lib/rpm
> InaccessibleDirectories=/var/lib/yum
> InaccessibleDirectories=/var/spool
> 
> [Install]
> WantedBy=multi-user.target
> ___________________________________________________________________________
> 
> /usr/lib/systemd/system/replication.service
> [Unit]
> Description=MySQL Replication
> 
> [Service]
> Type=simple
> PIDFile=/var/run/mysqld/mysqld_replication.pid
> ExecStart=/usr/libexec/mysqld --defaults-file=/etc/my-replication.cnf --pid-file=/var/run/mysqld/mysqld_replication.pid --socket=/var/lib/mysql/mysql_replication.sock --open-files-limit=30000 --basedir=/usr --user=mysql
> Restart=always
> RestartSec=1
> TimeoutSec=300
> LimitNOFILE=infinity
> LimitMEMLOCK=infinity
> OOMScoreAdjust=-1000
> SysVStartPriority=64
> PrivateTmp=true
> CapabilityBoundingSet=~CAP_SYS_PTRACE
> ReadOnlyDirectories=/etc
> ReadOnlyDirectories=/usr
> InaccessibleDirectories=/boot
> InaccessibleDirectories=/home
> InaccessibleDirectories=/root
> InaccessibleDirectories=/usr/local/scripts
> InaccessibleDirectories=/var/lib/rpm
> InaccessibleDirectories=/var/lib/yum
> InaccessibleDirectories=/var/spool
> 
> [Install]
> WantedBy=multi-user.target
