maria-developers team mailing list archive
Mailing list archive
Re: Problem with debug_sync, THD::enter_cond(), and kill ...
Looking at the code it looks like most places (including THD::awake)
assume that thd->mysys_var->current_mutex and
thd->mysys_var->current_cond can be changed only when
thd->mysys_var->mutex is locked. debug_sync.cc is breaking this
assumption. I've found that Item_func_sleep::val_int(),
THD::enter_cond() and THD::exit_cond() break this assumption too. I
believe all of those places should be fixed to lock
thd->mysys_var->mutex before any attempts to read or write to
thd->mysys_var->current_mutex and thd->mysys_var->current_cond.
On Wed, Feb 26, 2014 at 8:29 AM, Kristian Nielsen
> I got a crash in my parallel replication test case, and this one turned out to
> be caused by a fundamental problem in debug_sync. I am not sure how to solve
> it, so I wanted to explain the issue in case others have some ideas.
> The crash happens in THD::awake() when a thread is killed:
> int ret= mysql_mutex_trylock(mysys_var->current_mutex);
> if (!ret)
> The problem is that mysys_var->current_mutex changed between the trylock() and
> the unlock(), so we unlock a different mutex than the one we locked - ouch!
> The mutex changed because of this code in debug_sync.cc:
> if (thd->mysys_var)
> old_mutex= thd->mysys_var->current_mutex;
> old_cond= thd->mysys_var->current_cond;
> thd->mysys_var->current_mutex= &debug_sync_global.ds_mutex;
> thd->mysys_var->current_cond= &debug_sync_global.ds_cond;
> There is no mutex protection here.
> So this means that it is not safe to use debug_sync inside
> enter_cond()/exit_cond() if kill will be used on the thread at that point.
> Since I have a lot of tests for parallel replication where I test exactly for
> correct error handling when killing threads at various points in the parallel
> processing, it is not too surprising that I would eventually be hit by this :)
> I am going to avoid using debug_sync inside enter_cond()/exit_cond() for now.
> But any ideas for how to solve this properly? This is particularly nasty to be
> hit by, as it is not at all obvious that this use of debug_sync should be a
> problem. And it is not exactly easy to guess from the failure what the real
> problem is - assuming one can even reproduce the failure, the window of
> opportunity in the race is after all rather quite small.
> - Kristian.
> Mailing list: https://launchpad.net/~maria-developers
> Post to : maria-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-developers
> More help : https://help.launchpad.net/ListHelp