← Back to team overview

maria-developers team mailing list archive

Re: implementing index condition pushdown in MariaDB 5.5

 

Hello Sergei,

Thanks for the feedback. What makes this feature a little difficult
for me to grasp is the requirement that the engine is responsible for
doing some of the condition checking, and that MySQL is doing none of
the checking.

Here are the handler cursor API functions:

index_first
index_last
index_next
index_prev
index_next_same
index_read

If we return NULL for idx_cond_push, meaning we will do all of the
condition checking, does that mean ALL of these functions need to
check the condition before returning something to MySQL? Or just a
subset?

Also, suppose we always return the full condition to MySQL, meaning
the engine is responsible for none of the condition checking, but then
the engine does condition checking anyway. Is that ok? Here is an
example of what I mean. Suppose we have a key (a,b), and a query of
select * from table where a between 5 and 10 and b=5; Would it be ok
for our engine to return the full condition to MySQL, but then in
subsequent calls to index_next and index_prev, choose to not return
rows where b != 5?

Thanks
-Zardosht

On Wed, Feb 13, 2013 at 5:07 AM, Sergei Petrunia <psergey@xxxxxxxxxxxx> wrote:
> Hi Zardosht,
>
> On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
>> Is there any documentation for what a storage engine needs to do to
>> implement index condition pushdown in MariaDB 5.5? I see some related
>> things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN,
>> and idx_cond_push, but I don't understand how these all interact with
>> each other.
>>
> There is no documentation that I am aware of. Basic considerations are
>
> == Interface ==
> h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise,
> SQL layer will not attempt to do condition pushdown.
>
> The primary functions to overload are:
>
>   Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg)
>   void handler::cancel_pushed_idx_cond()
>
> SQL layer will extract a part of WHERE condition that refers to an index (it
> may also have references to fields in other tables) and pass it as parameter
> to idx_cond_push().
>
> idx_cond_push() returns the "remainder", i.e. a part of the passed condition
> that it is not able to check. In particular
>
> - returning NULL means that the condition was completely pushed down, and
>   storage engine will only return records for which idx_cond_arg evaluates to
>   TRUE.
>
> - returning idx_cond_arg back means that the engine was unable to push the
>   index condition. SQL layer will need to check idx_cond_arg on its own.
>
> cancel_pushed_idx_cond() cancels the effect of idx_cond_push().
> idx_cond_push() is called before index_init() call, and pushed condition should
> survive  index_init()/index_end() calls, as long as the used index is the same
> as keyno_arg argument of idx_cond_push().
>
> The SQL layer will not attempt to have multiple pushed index conditions for
> multiple indexes.
>
> This is it about the interface.
>
> == Implementation ==
> ICP implementations we have at the moment are all similar, and share some code.
> It is useful but not mandatory.
>
> handler_index_cond_check() is an utility function that one may call from the
> storage engine (after having unpacked index columns to their places in
> table->record[0]) to check the index condition and do some related things:
>
> 1. increment ICP Handler_xxx counters
> 2. Check if the query was killed (useful, when the query is showeling
>    through lots of records that don't match the index condition. It's nice
>    to be able to KILL the query)
> 3. Check if we've ran out of range.
>
> #3 is critical. Suppose you're doing a range scan on a range,
>
>    t.key BETWEEN 'bar' and 'foo'.
>
> MariaDB (and MySQL) will not remove the above predicate from the WHERE clause.
> The predicate uses index columns, so it will be pushed down as index condition.
>
> MySQL will scan the range with these calls:
>
>  h->read_range_first('bar', 'foo')
>  while (h->read_range_next() != HA_ERR...)
>  {
>    emit a row;
>  }
>
> Most storage engines do not implement read_range_next(), they rely on
> implementation in handler::read_range_next().  Which calls h->index_next() and
> then returns EOF if the returned record is out of range we're scanning.
>
> But what if we make h->index_next() to perform index condition pushdown checks?
>
> The following can happen:
> h->index_next() sees index record 'fop' (which is greater than 'foo'). It
> checks index condition of  "t.key BETWEEN 'bar' and 'foo'", finds it to be
> FALSE, and proceeds to read the next index entry.  The next index entry is
> greater (or equal) than 'fop', so it will proceed further until the end of the
> index.
>
> To prevent this, handler_index_cond_check() checks whether the retrieved index
> record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if
> that is the case).
>
> These were the primary points, let me know if there is something not clear
> yet.
>
> BR
>  Sergei
> --
> Sergei Petrunia, Software Developer
> Monty Program AB, http://askmonty.org
> Blog: http://s.petrunia.net/blog


Follow ups

References