← Back to team overview

maria-developers team mailing list archive

Possible race condition or NULL pointer triggered by OQGraph

 

Hi all

(Cross-posted to oqgraph-developers, maria-developers)

I am trying to track down a segfault apparently triggered by concurrent
execution of queries through OQGraph - for context, see:

https://mariadb.atlassian.net/browse/MDEV-6282

I am however a bit confused as to what is going on, because at least some of
the information leads me to suspect that there could be more going on here
than just a simple race condition.

--Immediate Cause--

On the surface, a segfault appears to be caused in a method in sql_class.h,
Statement::check_limit_rows_examined() dereferencing a NULL pointer, `lex`.
The method check_limit_rows_examined() was called from sql_class.h method
handler::increment_statistics() .

--Analysis--

A quick inspection reveals that a NULL value for `lex` is actually perfectly
valid some of the time, although I am nowhere near familiar enough with the
code to really understand when or why.  Specifically, `lex` is a member of the
base class Statement, which is initialised to either point to THD::main_lex
via a constructor , or intentionally to NULL from construction of an object
Prepared_statement() (sql_prepare.cc)

To my mind, if there is going to be a possibility of a pointer being NULL,
then all uses of that pointer should be checking for that.

I am hoping that the bug reporter can prove or otherwise my hypothesis with
his particular scenario, although I am suspicious that there may be additional
issues to be dealt with; it is quite possible that lex being NULL is a red
herring, being caused by a race condition memory overrun elsewhere. However, I
think it is still a good idea to NULL check pointers that could be set to NULL
intentionally.

I can submit a pull request if the following patch is agreeable:

    diff --git a/sql/sql_class.h b/sql/sql_class.h
    index 5898d9e..957acf3 100644
    --- a/sql/sql_class.h
    +++ b/sql/sql_class.h
    @@ -2475,8 +2475,10 @@ public:
       */
       void check_limit_rows_examined()
       {
    -    if (++accessed_rows_and_keys > lex->limit_rows_examined_cnt)
    +    ++accessed_rows_and_keys;
    +    if (lex && accessed_rows_and_keys > lex->limit_rows_examined_cnt) {
           killed= ABORT_QUERY;
    +    }
       }



--Aside--

Immediately prior to the check_limit_rows_examined() call is a call to macro
status_var_increment().  Comments around the macro indicate it is often likely
to be called in a mutli-threaded context, but the data is such that it is
possible to disable protection with the only side effect being an occasionally
'inaccurate' result. Is it possible however, that check_limit_rows_examined()
is not thread-safe and should have some protection?

cheers,
Andrew
--A




Follow ups