maria-developers team mailing list archive

Thread
Date
Re: QC patch for review

To: "Oleksandr \"Sanja\" Byelkin" <sanja@xxxxxxxxxxxx>,Davi.Arnaut@xxxxxxx
From: Michael Widenius <monty@xxxxxxxxxxxx>
Date: Fri, 28 Aug 2009 00:35:06 +0300
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx, internals@xxxxxxxxxxxxxxx
In-reply-to: <4A40DC79.7010705@askmonty.org>
Reply-to: monty@xxxxxxxxxxxx
Hi!

This is regarding a bug fix for query cache:
- Bug#39249  Maria: query cache returns out-of-date results
- Bug #41098 Query Cache returns wrong result with concurrent insert

>>>>> "s" == Oleksandr Byelkin <Oleksandr> writes:

s> Hi!
s> Here it is QC patch for inserts (as we discussed) and also there is DBUG
s> problem workaround in the test.

s> +++ sql/handler.h	2009-06-11 11:25:58 +0000
s> @@ -247,6 +247,11 @@
s>  #define HA_CACHE_TBL_NOCACHE     1
s>  #define HA_CACHE_TBL_ASKTRANSACT 2
s>  #define HA_CACHE_TBL_TRANSACT    4
s> +/**
s> +  Non transactional table but insert results visible for other threads
s> +  only on unlock
s> +*/
s> +#define HA_CACHE_TBL_NTRNS_INS2LOCK 8

The above is true for all non transactional engines. To clarify, there
are two types of non transactional engines when it comes to insert:

1) The table is locked at start of insert statement and can't be accessed by
   other threads until unlock.
2) The table is locked at start of insert statement, but other
   threads can read only the original rows until unlock, after which
   all rows are visible (myisam/maria with concurrent insert on)
 
In  both cases, the new rows are visible after unlock, in which case
we shouldn't need the above flag.

As we discussed in Majorca, I think the logic for this patch is wrong.
The current patch works (as I can see) on the following idea:

- mark non-transacitonal tables that are part of an insert
- later in sql_base.cc::close_thread_table() call
  query_cache_invalidate4() to invalidate the table if there was a
  possible 'concurrent insert'.

The reason the above doesn't work is that another thread may
have done a lock on the table before the unlock happens in the
concurrent-inserted thread and then call 'query_cache_store_query()'
after the invalidate is done. In this case the other thread has
and old view of the table and anything it stores in the cache will
be 'old'.

The new code only makes wrong things 'less unlikely' to happen; It
doesn't remove the original bug.

The reason for the problem with concurrent insert problem in the query
cache are two:

- The query cache should get invalidated at once when table changes
  state (ie, when the new table content is visible for another
  thread).  This should be done under a mutex to ensure that there is
  no race condition for this table.
- When validating of select with a table can be put into the query
  cache we should ensure that it's content is of the latest version.

With the two above axioms in place, the query cache will be safe for
concurrent inserts.  There are some other benefits of changing to use
this model:

- We can safely use locked tables with the query cache.
  - We can put the query into the table cache even under lock tables,
    as long as all the tables are marked as 'cacheable'.
  - We can use the data from the query cache, as long as all locked
    tables are marked as 'cacheable'.

- We could (later) merge the code of transactional and not
  transactional tables as they act the same.

- We can provide caching for versioned tables (like Maria) more easily.

Here is a last comment about the problem with this patch:

s> === modified file 'sql/sql_base.cc'
s> --- sql/sql_base.cc	2009-05-19 09:28:05 +0000
s> +++ sql/sql_base.cc	2009-06-11 11:25:58 +0000
s> @@ -1373,6 +1373,15 @@ bool close_thread_table(THD *thd, TABLE 
s>                          table->s->table_name.str, (long) table));
 
s>    *table_ptr=table->next;
s> +
s> +  /* Invalidate if it has mark about changing in insert
s> +    (not all tables has such marks */

Fix comment style and add missing brace

+  if (table->changed_in_insert)
+  {
+    table->changed_in_insert= FALSE;
+    query_cache_invalidate4(thd, table, FALSE, FALSE);
+  }

I have some problems with the above code:

- The real problem is that if someone accesses the table between
  the invalidate() and unlock, they see the old version of the table
  and not the real rows.
  The above change adds an extra invalidate in case of concurrent
  inserts.  However, another thread may already have done the
  following before this call:

  - Do a query just after invalidate, in which case the table is seen
    as 'old'
  - Do another query after unlock, in which case table is seen as
    'new'.

- Another problem is that another thread that did a lock table
  before the invalidate, it will still see the old data from the table.
  If this thread does a query of the table and it goes into the query
  cache, then the query cache will be filled with wrong data.
  (Note that even if we don't put tables in the cache under lock
   tables, the problem is still the same for tables without lock
   tables. It's just easier to simulate the problem if we test things
  with lock tables)
 
- A future problem is that for nontransactional tables that could do
  concurrent deletes/updates the above wouldn't work.

So this code only makes the problem less likely, it doesn't fix it.

----------------------

Add to MyISAM a function:

virtual my_bool register_query_cache_table(...)
{
  /* check if this thread is seeing all rows in the file */
  if (file->s->state.data_file_length !=
      file->state->data_file_length)
  {
    /*
       Some thread (or even this thread) has inserted more rows into the
       table since it was locked; Don't use the query cache.
    */      
    return 0;
  }
 return 1;
}


We also need to invalidate tables when they change. This should be
done by adding a callback in mi_locking.c:mi_update_status()
to invalidate the query cache for any entries for this table.
(This only needs to be done inside 'if (info->state == &info->save_state)'


In Maira we need to to the same changes in _ma_update_status().

In Maria, we can also decide to handle transactional/versioned tables with the 
query cache:

virtual my_bool register_query_cache_table(....)
{

....

  if (share->now_transactional && share->have_versioning)
    return (file->trn != file->s->state.last_change_trn);
...
}


For the above to work, we need to set last_change_trn in
_ma_trnman_end_trans_hook() when we store the state for a changed
table (at same place where we update 'records' and 'checksum')

Regards,
Monty

PS: Don't forget to also fix the MERGE tables and check that the Maria
    bug is truly fixed.