← Back to team overview

maria-developers team mailing list archive

Re: Problem with debug_sync, THD::enter_cond(), and kill ...

 

Looking at the code it looks like most places (including THD::awake)
assume that thd->mysys_var->current_mutex and
thd->mysys_var->current_cond can be changed only when
thd->mysys_var->mutex is locked. debug_sync.cc is breaking this
assumption. I've found that Item_func_sleep::val_int(),
THD::enter_cond() and THD::exit_cond() break this assumption too. I
believe all of those places should be fixed to lock
thd->mysys_var->mutex before any attempts to read or write to
thd->mysys_var->current_mutex and thd->mysys_var->current_cond.

Pavel

On Wed, Feb 26, 2014 at 8:29 AM, Kristian Nielsen
<knielsen@xxxxxxxxxxxxxxx> wrote:
> I got a crash in my parallel replication test case, and this one turned out to
> be caused by a fundamental  problem in debug_sync. I am not sure how to solve
> it, so I wanted to explain the issue in case others have some ideas.
>
> The crash happens in THD::awake() when a thread is killed:
>
>     mysql_mutex_lock(&mysys_var->mutex);
>     ...
>         int ret= mysql_mutex_trylock(mysys_var->current_mutex);
>         mysql_cond_broadcast(mysys_var->current_cond);
>         if (!ret)
>           mysql_mutex_unlock(mysys_var->current_mutex);
>
> The problem is that mysys_var->current_mutex changed between the trylock() and
> the unlock(), so we unlock a different mutex than the one we locked - ouch!
>
> The mutex changed because of this code in debug_sync.cc:
>
>       if (thd->mysys_var)
>       {
>         old_mutex= thd->mysys_var->current_mutex;
>         old_cond= thd->mysys_var->current_cond;
>         thd->mysys_var->current_mutex= &debug_sync_global.ds_mutex;
>         thd->mysys_var->current_cond= &debug_sync_global.ds_cond;
>       }
>
> There is no mutex protection here.
>
> So this means that it is not safe to use debug_sync inside
> enter_cond()/exit_cond() if kill will be used on the thread at that point.
> Since I have a lot of tests for parallel replication where I test exactly for
> correct error handling when killing threads at various points in the parallel
> processing, it is not too surprising that I would eventually be hit by this :)
>
> I am going to avoid using debug_sync inside enter_cond()/exit_cond() for now.
> But any ideas for how to solve this properly? This is particularly nasty to be
> hit by, as it is not at all obvious that this use of debug_sync should be a
> problem. And it is not exactly easy to guess from the failure what the real
> problem is - assuming one can even reproduce the failure, the window of
> opportunity in the race is after all rather quite small.
>
>  - Kristian.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-developers
> Post to     : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help   : https://help.launchpad.net/ListHelp


Follow ups

References