maria-discuss team mailing list archive

MariaDB init.d Script Probable Bug in wait_for_ready()

 

Hi,

Context:
-----------

Version:             MariaDB 10.0
Repo:                 http://yum.mariadb.org/10.0/centos7-amd64
OS:                    CentOS Linux release 7.2.1511
Kernel:               3.10.0-229.7.2.el7.x86_64
Cloud Provider:  Windows Azure

I run a three-node MariaDB-Galera cluster on production for my company.
With the custom config file at:

*/etc/my.cnf.d/server.cnf*


My data disk is an LVM2 logical volume spanning two physical volumes,
mounted at the path:

*/datadrive/galera/mysql*


The logical volume only takes up 50% of the volume group as I leave the
empty space for snapshot operations.

This is reflected in my configuration as:

*[mysqld]*
*datadir = /datadrive/galera/mysql*
*socket = /datadrive/galera/mysql/mysql.sock*


Issue:
--------
Coming to the issue at hand;

When starting the primary component through:

*$ sudo service mysql bootstrap*

or,

*$ sudo service mysql start --wsrep-new-cluster*


(These are essentially the same, since bootstrap defers to *start --wsrep-new-cluster*.)

The bootstrap sequence correctly calls mysqld_safe, as below:

*/usr/bin/mysqld_safe --datadir=/datadrive/galera/mysql
--pid-file=/datadrive/galera/mysql/ciq-test-db01.pid --wsrep-new-cluster*


However, it then gets stuck in the *wait_for_ready()* function of
*/etc/init.d/mysql*.

The function tries to ping the server with mysqladmin:

*if $bindir/mysqladmin ping >/dev/null 2>&1; then*
*  log_success_msg*
*  return 0*
*elif kill -0 $! 2>/dev/null ; then*
*  :  # mysqld_safe is still running*
*else*
*  # mysqld_safe is no longer running, abort the wait loop*
*  break*
*fi*


It gets stuck here because the test:

*if $bindir/mysqladmin ping >/dev/null 2>&1; then*


always fails, so the counter keeps increasing until it reaches 900 and the
function errors out with:

*log_failure_msg*

*return 1*
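
The wait loop described above behaves roughly like the sketch below. This is a simplified illustration, not the shipped init script: the function name wait_for_probe and its three parameters are hypothetical, and the real loop additionally checks whether mysqld_safe is still running.

```shell
# Simplified sketch of a probe-with-timeout loop (hypothetical helper,
# not the code from /etc/init.d/mysql).
#   $1: probe command, e.g. "mysqladmin ping"
#   $2: maximum number of attempts
#   $3: delay between attempts in seconds (defaults to 1)
wait_for_probe() {
  probe=$1
  max_attempts=$2
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$max_attempts" ]; do
    # $probe is left unquoted on purpose so that its arguments are split
    if $probe >/dev/null 2>&1; then
      return 0    # the probe succeeded, the server is ready
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1        # every attempt failed, report a timeout
}
```

Called as wait_for_probe "mysqladmin ping" 900, a probe that can never reach the socket would use up all 900 attempts before returning 1, which is consistent with the long hang seen here.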


Test:
-------

I ran that command from the console, without the output redirection so that
any error would be visible:

*$ mysqladmin ping*


and got the following error:

*mysqladmin: connect to server at 'localhost' failed*
*error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2 "No such file or directory")'*
*Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!*

Obviously, */var/lib/mysql* does not exist in my setup, and the socket
file is at */datadrive/galera/mysql/mysql.sock*.

However, the socket can be passed explicitly to *mysqladmin*:

*$ mysqladmin --socket=/datadrive/galera/mysql/mysql.sock ping*


This worked as expected and I got the desired output:

*mysqld is alive*


Solution:
-----------

What I surmised, with my novice abilities, was that the init script does
not pass the variables it reads from */etc/my.cnf.d/server.cnf* to
*mysqladmin*.

So, for now, I have done a hack job by changing the test in the init
script's *wait_for_ready()* function to:

*if $bindir/mysqladmin --socket=/datadrive/galera/mysql/mysql.sock ping >/dev/null 2>&1; then*
*  log_success_msg*
*  return 0*


And voila,

*$ sudo service mysql start*
*$ sudo service mysql start --wsrep-new-cluster*
*$ sudo service mysql bootstrap*


All of the above now work. Evidently, every variable from
*/etc/my.cnf.d/server.cnf* that *mysqladmin* needs to set up its
environment correctly should be passed to it.
However, I think this could, and should, be done in the distributed
init script itself rather than through a user-side hack.
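
As a rough idea of what a script-side fix could look like, the sketch below pulls the socket path out of a list of server options and builds the ping command from it. The helper name extract_socket is hypothetical, and feeding it from my_print_defaults is an assumption about how the init script might obtain its options; the sample input is hard-coded here so the snippet runs on its own.

```shell
# Hypothetical helper: read option lines on stdin and print the value of
# the last --socket=... option, or nothing if none is present.
extract_socket() {
  sed -n 's/^--socket=//p' | tail -n 1
}

# Assumption: in a real init script the input might come from something like
#   my_print_defaults mysqld | extract_socket
# The lines below stand in for that output.
socket=$(printf '%s\n' \
  '--datadir=/datadrive/galera/mysql' \
  '--socket=/datadrive/galera/mysql/mysql.sock' | extract_socket)

# Only add --socket when a value was found, so the default still applies.
ping_args=""
if [ -n "$socket" ]; then
  ping_args="--socket=$socket"
fi

# Print the command instead of running it, so no server is needed here.
echo "mysqladmin $ping_args ping"
```

With this approach the path would stay in the configuration file only, and the script would not need a hard-coded socket location.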

While I have tried to be thorough in my inspection of the issue, I may have
missed something, either very basic or inherently complex, in the process
of initializing the mysql service. Please point me in the right direction
if that is the case.

P.S. This is my first post to the list, so I may have stated things unacceptably.
P.P.S. Merry Christmas to you all.

Thanks and regards,

Joy Bhattacherjee
Mob: +91-9011235028
