maria-developers team mailing list archive
Mailing list archive
Re: implementing index condition pushdown in MariaDB 5.5
Also, can somebody please explain how handler_index_cond_check checks
index conditions? The key function seems to be item->val_int. How does
this get each value and check check conditions?
On Thu, Feb 14, 2013 at 11:31 AM, Zardosht Kasheff <zardosht@xxxxxxxxx> wrote:
> Hello Sergei,
> Thanks for the feedback. What makes this feature a little difficult
> for me to grasp is the requirement that the engine is responsible for
> doing some of the condition checking, and that MySQL is doing none of
> the checking.
> Here are the handler cursor API functions:
> If we return NULL for idx_cond_push, meaning we will do all of the
> condition checking, does that mean ALL of these functions need to
> check the condition before returning something to MySQL? Or just a
> Also, suppose we always return the full condition to MySQL, meaning
> the engine is responsible for none of the condition checking, but then
> the engine does condition checking anyway. Is that ok? Here is an
> example of what I mean. Suppose we have a key (a,b), and a query of
> select * from table where a between 5 and 10 and b=5; Would it be ok
> for our engine to return the full condition to MySQL, but then in
> subsequent calls to index_next and index_prev, choose to not return
> rows where b != 5?
> On Wed, Feb 13, 2013 at 5:07 AM, Sergei Petrunia <psergey@xxxxxxxxxxxx> wrote:
>> Hi Zardosht,
>> On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
>>> Is there any documentation for what a storage engine needs to do to
>>> implement index condition pushdown in MariaDB 5.5? I see some related
>>> things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN,
>>> and idx_cond_push, but I don't understand how these all interact with
>>> each other.
>> There is no documentation that I am aware of. Basic considerations are
>> == Interface ==
>> h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise,
>> SQL layer will not attempt to do condition pushdown.
>> The primary functions to overload are:
>> Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg)
>> void handler::cancel_pushed_idx_cond()
>> SQL layer will extract a part of WHERE condition that refers to an index (it
>> may also have references to fields in other tables) and pass it as parameter
>> to idx_cond_push().
>> idx_cond_push() returns the "remainder", i.e. a part of the passed condition
>> that it is not able to check. In particular
>> - returning NULL means that the condition was completely pushed down, and
>> storage engine will only return records for which idx_cond_arg evaluates to
>> - returning idx_cond_arg back means that the engine was unable to push the
>> index condition. SQL layer will need to check idx_cond_arg on its own.
>> cancel_pushed_idx_cond() cancels the effect of idx_cond_push().
>> idx_cond_push() is called before index_init() call, and pushed condition should
>> survive index_init()/index_end() calls, as long as the used index is the same
>> as keyno_arg argument of idx_cond_push().
>> The SQL layer will not attempt to have multiple pushed index conditions for
>> multiple indexes.
>> This is it about the interface.
>> == Implementation ==
>> ICP implementations we have at the moment are all similar, and share some code.
>> It is useful but not mandatory.
>> handler_index_cond_check() is an utility function that one may call from the
>> storage engine (after having unpacked index columns to their places in
>> table->record) to check the index condition and do some related things:
>> 1. increment ICP Handler_xxx counters
>> 2. Check if the query was killed (useful, when the query is showeling
>> through lots of records that don't match the index condition. It's nice
>> to be able to KILL the query)
>> 3. Check if we've ran out of range.
>> #3 is critical. Suppose you're doing a range scan on a range,
>> t.key BETWEEN 'bar' and 'foo'.
>> MariaDB (and MySQL) will not remove the above predicate from the WHERE clause.
>> The predicate uses index columns, so it will be pushed down as index condition.
>> MySQL will scan the range with these calls:
>> h->read_range_first('bar', 'foo')
>> while (h->read_range_next() != HA_ERR...)
>> emit a row;
>> Most storage engines do not implement read_range_next(), they rely on
>> implementation in handler::read_range_next(). Which calls h->index_next() and
>> then returns EOF if the returned record is out of range we're scanning.
>> But what if we make h->index_next() to perform index condition pushdown checks?
>> The following can happen:
>> h->index_next() sees index record 'fop' (which is greater than 'foo'). It
>> checks index condition of "t.key BETWEEN 'bar' and 'foo'", finds it to be
>> FALSE, and proceeds to read the next index entry. The next index entry is
>> greater (or equal) than 'fop', so it will proceed further until the end of the
>> To prevent this, handler_index_cond_check() checks whether the retrieved index
>> record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if
>> that is the case).
>> These were the primary points, let me know if there is something not clear
>> Sergei Petrunia, Software Developer
>> Monty Program AB, http://askmonty.org
>> Blog: http://s.petrunia.net/blog