← Back to team overview

maria-developers team mailing list archive

Re: Deadlock with STOP SLAVE and LOCK_active_mi


Hi, Kristian!

On Apr 15, Kristian Nielsen wrote:
> So while looking at MDEV-9573, I found this code in
> mysql_execute_command() for STOP SLAVE:
>     mysql_mutex_lock(&LOCK_active_mi);
>     if ((mi= (master_info_index->
>               get_master_info(&lex_mi->connection_name,
>                               Sql_condition::WARN_LEVEL_ERROR))))
>       if (!stop_slave(thd, mi, 1/* net report*/))
>         my_ok(thd);
>     mysql_mutex_unlock(&LOCK_active_mi);
> So basically, this code is holding LOCK_active_mi for the entire
> duration of the STOP SLAVE operation.
> This seems completely broken. Anything in a slave thread that tries to
> take LOCK_active_mi will then deadlock with the STOP SLAVE operation.
> It is simple enough to trigger, testcase (with sleep-injecting patch)
> at the end of the email. This uses INFORMATION_SCHEMA.SESSSION_STATUS;
> I could imagine accessing a number of system variables could trigger
> it as well.
> From a quick check, it looks like this has been like this forever, eg.
> 5.1 has similar code.
> Any suggestions for how this is supposed to work? Or is it just broken
> by design, but saved because normally slave threads do not need to
> access SHOW STATUS or system variables?

These locks were added in this commit:

  commit f808030
  Author: unknown <guilhem@xxxxxxxxx>
  Date:   Thu Mar 11 16:23:35 2004 +0100

      Fix for BUG#2921 "Replication problem on mutex lock in mySQL-4.0.18":
      re-using unused LOCK_active_mi to serialize all administrative
      commands related to replication:
      (replication autostart at server startup), end_slave() (replication
      autostop at server shutdown), LOAD DATA FROM MASTER.
      This protects us against a handful of deadlocks (like BUG#2921
      when two START SLAVE, but when two STOP SLAVE too).
      Removing unused variables.

So, apparently LOCK_active_mi was unused at that time and Guilhem took
it to serialize replication administrative commands. Now this lock is
used. The fix, I suppose, should be to create a new lock and use it
here. Something like LOCK_serialize_replication_admin_commands.

Chief Architect MariaDB
and security@xxxxxxxxxxx

Follow ups