maria-developers team mailing list archive

Re: d8e3cfacebc: MDEV-14616: WSREP has not yet prepared node for application use error

 

Hi, Jan!

On Feb 13, jan wrote:
> revision-id: d8e3cfacebc47849532ef3e7bbf18f9da53bab88 (mariadb-10.1.31-20-gd8e3cfacebc)
> parent(s): 474f3df092cc73342b2da03aa03ba5c66930f596
> author: Jan Lindström
> committer: Jan Lindström
> timestamp: 2018-02-13 10:04:48 +0200
> message:
> 
> MDEV-14616: WSREP has not yet prepared node for application use error
> 
> Contains following Galera fixes:
> 
> MW-405 Make sure wsrep is ready in wait_until_connected_again.inc

What is "MW-405"? Can I see the original bug report somewhere?

>        wait_until_connected_again issues 'SHOW STATUS' query repeatedly
>        until mysqld replies without errors. However, SHOW STATUS is
>        treated specially by wsrep in that it is allowed to proceed
>        even if wsrep is not yet in ready state. As a consequence,
>        after returning from wait_until_connected_again, wsrep may
>        not be ready yet and subsequent queries may fail with error
>        "1047 WSREP has not yet prepared node for application use".
>        To avoid those errors, the patch includes
>        wait_wsrep_ready.inc at the end of the wait_until_connected_again.inc

I don't quite understand the issue. The old code sourced
wait_wsrep_ready.inc right after wait_until_connected_again.inc (in
restart_mysqld.inc and start_mysqld.inc). Now wait_wsrep_ready.inc is
sourced from inside wait_until_connected_again.inc.

This seems to imply that the bug was "there were some cases where
wait_wsrep_ready.inc was not included after
wait_until_connected_again.inc".

There are, of course, two ways to fix such a bug. Either include
wait_wsrep_ready.inc after wait_until_connected_again.inc in the places
where it is missing, or include wait_wsrep_ready.inc in
wait_until_connected_again.inc itself.

The latter fix implies that there is never a case where one wants to
wait until the server accepts connections, but not until wsrep is ready.
Is that really so? For the purpose of testing, such a state could be
useful, couldn't it?
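
To make the question concrete, here is a minimal sketch of the
alternative I have in mind, written in Python rather than mysqltest so
that it is easy to run. Everything in it is hypothetical: FakeServer,
the wait_for_wsrep flag and the returned strings are made up for
illustration and exist neither in mysql-test nor in this patch. The
point is only that the wsrep wait can stay the default while a test can
still opt out of it.

```python
class FakeServer:
    """Hypothetical stand-in for a mysqld node; not part of mysql-test."""

    def __init__(self, polls_until_connected, polls_until_ready):
        self.polls = 0
        self.polls_until_connected = polls_until_connected
        # None models a node that never becomes ready (e.g. non-primary view).
        self.polls_until_ready = polls_until_ready

    def show_status_ok(self):
        # Models SHOW STATUS, which succeeds even if wsrep is not ready.
        self.polls += 1
        return self.polls >= self.polls_until_connected

    def wsrep_ready(self):
        self.polls += 1
        return (self.polls_until_ready is not None
                and self.polls >= self.polls_until_ready)


def wait_until_connected(server, max_polls=5000, wait_for_wsrep=True):
    """Wait for the connection first, then (optionally) for wsrep readiness."""
    for _ in range(max_polls):
        if server.show_status_ok():
            break
    else:
        return "timeout: not connected"

    if not wait_for_wsrep:
        # Caller explicitly wants a connected, possibly not-yet-ready node.
        return "connected"

    for _ in range(max_polls):
        if server.wsrep_ready():
            return "ready"
    return "timeout: wsrep not ready"
```

With something like this, a helper in the style of kill_galera.inc could
ask for wait_for_wsrep=False instead of having to avoid the common
include altogether.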
 
>        kill_galera.inc can no longer rely on wait_until_connected_again.inc.
>        This is because wait_until_connected_again now tries to make sure
>        that the server it is connected eventually transition to ready state.
>        Whereas some tests may need to kill galera while the server is in a
>        non-primary view.

Which means there are cases where a test needs the server to be
connected while wsrep is not yet ready, right?

> MW-408 Fix 'WSREP error while trying to determine node state'
> 
>        mysql-test-run.pl sporadically reports 'WSREP error while
>        trying to determine node state' right after starting servers
>        for test execution. This happens because we try to execute a
>        SELECT statement that queries the current value of status variable
>        wsrep_ready. If this statement fails, the above message is reported.
>        The failure is due to fact that wsrep may return error
>        ER_LOCK_WAIT_TIMEOUT (on any SELECT statement) if it is not ready
>        and wsrep_sync_wait enabled for SELECTs. The fix is to disable
>        wsrep_sync_wait for the session that issues those SELECT statements.

1. Could you put this fix in a separate commit, please?

2. Sorry, I don't understand. Where do "we try to execute a SELECT
statement that queries the current value of status variable wsrep_ready"?
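
The only such SELECT I can see in this diff is in the new
wait_wsrep_ready() sub that the patch itself adds to mysql-test-run.pl.
As I read it, its loop behaves roughly like the sketch below (Python,
for brevity). The run_query callable and the injectable sleep are
assumptions of mine, standing in for run_query_output() and
mtr_milli_sleep(); the two messages are copied from the patch.

```python
import time


def wait_wsrep_ready(run_query, start_timeout_s, sleep_ms=100, sleep=time.sleep):
    """Poll until the query output starts with "ON" or the timeout expires.

    run_query is a hypothetical callable returning (ok, output); it is
    assumed to issue the wsrep_ready SELECT with wsrep_sync_wait set to 0.
    """
    loops = (start_timeout_s * 1000) // sleep_ms
    for _ in range(loops):
        ok, output = run_query()
        if not ok:
            # A failed query is not retried; the wait gives up at once.
            return (False, "WSREP error while trying to determine node state")
        if output.startswith("ON"):
            return (True, "")
        sleep(sleep_ms / 1000)
    return (False, "WSREP did not transition to state READY")
```

Note that a failed query aborts the wait immediately instead of being
retried, which would explain why, per the commit message, an
ER_LOCK_WAIT_TIMEOUT on that SELECT surfaces as the "determine node
state" message.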

> ---
>  mysql-test/include/restart_mysqld.inc             |   3 -
>  mysql-test/include/start_mysqld.inc               |   3 -
>  mysql-test/include/wait_until_connected_again.inc |  11 ++-
>  mysql-test/mysql-test-run.pl                      | 104 ++++++++++++++++++++++
>  mysql-test/suite/galera/include/kill_galera.inc   |   1 -
>  5 files changed, 111 insertions(+), 11 deletions(-)
> 
> diff --git a/mysql-test/include/restart_mysqld.inc b/mysql-test/include/restart_mysqld.inc
> index a0447280ff5..940e081c431 100644
> --- a/mysql-test/include/restart_mysqld.inc
> +++ b/mysql-test/include/restart_mysqld.inc
> @@ -50,9 +50,6 @@ if (!$restart_parameters)
>  # Call script that will poll the server waiting for it to be back online again
>  --source include/wait_until_connected_again.inc
>  
> -# Wait for wsrep
> ---source include/wait_wsrep_ready.inc
> -
>  # Turn off reconnect again
>  --disable_reconnect
>  
> diff --git a/mysql-test/include/start_mysqld.inc b/mysql-test/include/start_mysqld.inc
> index 04dff714d49..e31f26aad8c 100644
> --- a/mysql-test/include/start_mysqld.inc
> +++ b/mysql-test/include/start_mysqld.inc
> @@ -16,9 +16,6 @@ if (!$restart_parameters)
>  # Call script that will poll the server waiting for it to be back online again
>  --source include/wait_until_connected_again.inc
>  
> -# Wait for wsrep
> ---source include/wait_wsrep_ready.inc
> -
>  # Turn off reconnect again
>  --disable_reconnect
>  
> diff --git a/mysql-test/include/wait_until_connected_again.inc b/mysql-test/include/wait_until_connected_again.inc
> index 6f64ef45440..4958d276ebd 100644
> --- a/mysql-test/include/wait_until_connected_again.inc
> +++ b/mysql-test/include/wait_until_connected_again.inc
> @@ -11,10 +11,7 @@ let $counter= 5000;
>  let $mysql_errno= 9999;
>  while ($mysql_errno)
>  {
> -  # Strangely enough, the server might return "Too many connections"
> -  # while being shutdown, thus 1040 is an "allowed" error
> -  # See BUG#36228
> -  --error 0,1040,1053,2002,2003,2005,2006,2013,1927
> +  --error 0,ER_SERVER_SHUTDOWN,ER_CONNECTION_KILLED,2002,2003,2006,2013
>    show status;
>  
>    dec $counter;
> @@ -26,3 +23,9 @@ while ($mysql_errno)
>  }
>  --enable_query_log
>  --enable_result_log
> +
> +# WSREP: SHOW STATUS queries are allowed even if wsrep
> +#        is not ready. Make sure wsrep is ready before
> +#        returning from this script
> +
> +--source include/wait_wsrep_ready.inc
> diff --git a/mysql-test/mysql-test-run.pl b/mysql-test/mysql-test-run.pl
> index eaec51b82b4..324f2e98ebd 100755
> --- a/mysql-test/mysql-test-run.pl
> +++ b/mysql-test/mysql-test-run.pl
> @@ -2883,6 +2883,49 @@ sub mysql_server_wait {
>                                        $warn_seconds);
>  }
>  
> +sub have_wsrep() {
> +  my $wsrep_on= $mysqld_variables{'wsrep-on'};
> +  return defined $wsrep_on
> +}
> +
> +
> +sub check_wsrep_support() {
> +  if (have_wsrep())
> +  {
> +    mtr_report(" - binaries built with wsrep patch");
> +
> +    # ADD scripts to $PATH to that wsrep_sst_* can be found
> +    my ($path) = grep { -f "$_/wsrep_sst_rsync"; } "$::bindir/scripts", $::path_client_bindir;
> +    mtr_error("No SST scripts") unless $path;
> +    $ENV{PATH}="$path:$ENV{PATH}";
> +
> +    # Check whether WSREP_PROVIDER environment variable is set.
> +    if (defined $ENV{'WSREP_PROVIDER'}) {
> +      if ((mtr_file_exists($ENV{'WSREP_PROVIDER'}) eq "")  &&
> +          ($ENV{'WSREP_PROVIDER'} ne "none")) {
> +        mtr_error("WSREP_PROVIDER env set to an invalid path");
> +      }
> +      # WSREP_PROVIDER is valid; set to a valid path or "none").
> +      mtr_verbose("WSREP_PROVIDER env set to $ENV{'WSREP_PROVIDER'}");
> +    } else {
> +      # WSREP_PROVIDER env not defined. Lets try to locate the wsrep provider
> +      # library.
> +      my $file_wsrep_provider=
> +        mtr_file_exists("/usr/lib/galera/libgalera_smm.so",
> +                        "/usr/lib64/galera/libgalera_smm.so");
> +
> +      if ($file_wsrep_provider ne "") {
> +        # wsrep provider library found !
> +        mtr_verbose("wsrep provider library found : $file_wsrep_provider");
> +        $ENV{'WSREP_PROVIDER'}= $file_wsrep_provider;
> +      } else {
> +        mtr_verbose("Could not find wsrep provider library, setting it to 'none'");
> +        $ENV{'WSREP_PROVIDER'}= "none";
> +      }
> +    }
> +  }
> +}
> +
>  sub create_config_file_for_extern {
>    my %opts=
>      (
> @@ -3341,6 +3384,62 @@ sub run_query {
>    return $res
>  }
>  
> +sub run_query_output {
> +  my ($mysqld, $query, $outfile)= @_;
> +
> +  my $args;
> +  mtr_init_args(\$args);
> +  mtr_add_arg($args, "--defaults-file=%s", $path_config_file);
> +  mtr_add_arg($args, "--defaults-group-suffix=%s", $mysqld->after('mysqld'));
> +
> +  mtr_add_arg($args, "--silent");
> +  mtr_add_arg($args, "--execute=%s", $query);
> +
> +  my $res= My::SafeProcess->run
> +    (
> +     name          => "run_query_output -> ".$mysqld->name(),
> +     path          => $exe_mysql,
> +     args          => \$args,
> +     output        => $outfile,
> +     error         => $outfile
> +    );
> +
> +  return $res
> +}
> +
> +sub wait_wsrep_ready($$) {
> +  my ($tinfo, $mysqld)= @_;
> +
> +  my $sleeptime= 100; # Milliseconds
> +  my $loops= ($opt_start_timeout * 1000) / $sleeptime;
> +
> +  my $name= $mysqld->name();
> +  my $outfile= "$opt_vardir/tmp/$name.wsrep_ready";
> +  my $query= "SET SESSION wsrep_sync_wait = 0;
> +              SELECT VARIABLE_VALUE
> +              FROM INFORMATION_SCHEMA.GLOBAL_STATUS
> +              WHERE VARIABLE_NAME = 'wsrep_ready'";
> +
> +  for (my $loop= 1; $loop <= $loops; $loop++)
> +  {
> +    if (run_query_output($mysqld, $query, $outfile) != 0)
> +    {
> +      $tinfo->{logfile}= "WSREP error while trying to determine node state";
> +      return 0;
> +    }
> +
> +    if (mtr_grab_file($outfile) =~ /^ON/)
> +    {
> +      unlink($outfile);
> +      return 1;
> +    }
> +
> +    mtr_milli_sleep($sleeptime);
> +  }
> +
> +  $tinfo->{logfile}= "WSREP did not transition to state READY";
> +  return 0;
> +}
>  
>  sub do_before_run_mysqltest($)
>  {
> @@ -5380,6 +5479,11 @@ sub start_servers($) {
>        $tinfo->{comment}= "Failed to start ".$_->name() . "\n";
>        return 1;
>      }
> +
> +    if (have_wsrep() && !wait_wsrep_ready($tinfo, $_))
> +    {
> +      return 1;
> +    }
>    }
>    return 0;
>  }
> diff --git a/mysql-test/suite/galera/include/kill_galera.inc b/mysql-test/suite/galera/include/kill_galera.inc
> index c61bad8e19d..f95dccf2185 100644
> --- a/mysql-test/suite/galera/include/kill_galera.inc
> +++ b/mysql-test/suite/galera/include/kill_galera.inc
> @@ -18,4 +18,3 @@
>          exit(0);
>  EOF
>  
> ---source include/wait_until_disconnected.inc
Regards,
Sergei
Chief Architect MariaDB
and security@xxxxxxxxxxx