← Back to team overview

maria-developers team mailing list archive

Re: Deadlock with STOP SLAVE and LOCK_active_mi

 

Michael Widenius <michael.widenius@xxxxxxxxx> writes:

>> 2. To protect access to master_info_index, eg. to prevent a Master_info from
>> disappearing while it is accessed.
>
> When reading about this, I think it would be better to have a counter
> in Master_info if someone is using it and only delete if if the
> counter is zero. Would be trivial to add an increment of the counter
> in get_master_info().
>
> The counter could either be protected by the LOCK_active_mi or one of
> the mutexes in Master_info like data_lock.

Sounds reasonable. Will you do a patch? I guess this is code from the
multisource feature, so you are the one familiar with it (before
multi-source, the Master_info could not be created and deleted dynamically,
could it?)

>> Both LOCK_active_mi and the new LOCK_serialize_replication_admin_commands
>> must by themselves prevent master_info_index from having elements added or
>> removed. This is needed for start_all_slaves() and stop_all_slaves() to work
>> correctly, I think.
>
> Why both?
> Isn't it enough to first take LOCK_serialize_replication_admin_commands and
> then take LOCK_active_mi if one needs to access master_info_index ?

In stop_all_slaves(), there is a loop over the master_info_hash, within
which slaves are stopped one at a time. So LOCK_active_mi cannot be held
across the loop because of the deadlock discussed in this thread. So it
seems that LOCK_serialize_replication_admin_commands must prevent elements
being added or removed, to protect this loop.

But LOCK_active_mi also needs to protect elements from going away, so that
they can be safely accessed in the many different places in the code that do
this.

Of course there are a number of different ways this could be solved - just
pointing out that it is something that need to be handled one way or the
other.

>> Actually, there would still be the deadlock problem after introducing the
>> new lock, because it is possible for a slave thread to run CHANGE MASTER!
>> (and I think START SLAVE / RESET SLAVE as well). But that is probably a bug,
>> does not seem good that this is possible. I guess these need a check to
>> throw an ER_LOCK_OR_ACTIVE_TRANSACTION error, like STOP SLAVE.
>
> Wouldn't the new lock LOCK_serialize_replication_admin_commands fix that?

How? It's exactly the same deadlock, no matter what we call the lock. STOP
SLAVE holds lock, waits for sql thread to stop. SQL thread requests lock
(eg. in CHANGE MASTER), deadlocking.

But really, there is no reason a slave thread should be allowed to run
replication adminitrative commands, it should just be an error so that the
slave threads will never try to take
LOCK_serialize_replication_admin_commands. Maybe that is what you meant.

> I checked the function mysql_show_binlog_events() but could not find
> any access to
> Master_info after mysql_mutex_unlock(&LOCK_active_mi) was released.

Hm, no, it doesn't seem so, I guess I misread the code. So it is only
MASTER_POS_WAIT() that has the problem, then.

 - Kristian.


Follow ups

References