← Back to team overview

maria-developers team mailing list archive

Re: implementing index condition pushdown in MariaDB 5.5


Hi Zardosht,

On Tue, Feb 12, 2013 at 11:29:12PM -0500, Zardosht Kasheff wrote:
> Is there any documentation for what a storage engine needs to do to
> implement index condition pushdown in MariaDB 5.5? I see some related
> things, such as handler_index_cond_check, HA_DO_INDEX_COND_PUSHDOWN,
> and idx_cond_push, but I don't understand how these all interact with
> each other.
There is no documentation that I am aware of. Basic considerations are

== Interface ==
h->index_flags() must return HA_DO_INDEX_COND_PUSHDOWN flag. Otherwise,
SQL layer will not attempt to do condition pushdown.

The primary functions to overload are:

  Item *handler::idx_cond_push(uint keyno_arg, Item* idx_cond_arg)
  void handler::cancel_pushed_idx_cond()

SQL layer will extract a part of WHERE condition that refers to an index (it
may also have references to fields in other tables) and pass it as parameter
to idx_cond_push().

idx_cond_push() returns the "remainder", i.e. a part of the passed condition
that it is not able to check. In particular

- returning NULL means that the condition was completely pushed down, and 
  storage engine will only return records for which idx_cond_arg evaluates to

- returning idx_cond_arg back means that the engine was unable to push the
  index condition. SQL layer will need to check idx_cond_arg on its own.

cancel_pushed_idx_cond() cancels the effect of idx_cond_push().
idx_cond_push() is called before index_init() call, and pushed condition should
survive  index_init()/index_end() calls, as long as the used index is the same
as keyno_arg argument of idx_cond_push().

The SQL layer will not attempt to have multiple pushed index conditions for
multiple indexes.

This is it about the interface.

== Implementation ==
ICP implementations we have at the moment are all similar, and share some code.
It is useful but not mandatory.

handler_index_cond_check() is an utility function that one may call from the
storage engine (after having unpacked index columns to their places in 
table->record[0]) to check the index condition and do some related things:

1. increment ICP Handler_xxx counters
2. Check if the query was killed (useful, when the query is showeling 
   through lots of records that don't match the index condition. It's nice
   to be able to KILL the query)
3. Check if we've ran out of range.

#3 is critical. Suppose you're doing a range scan on a range, 
   t.key BETWEEN 'bar' and 'foo'.

MariaDB (and MySQL) will not remove the above predicate from the WHERE clause.
The predicate uses index columns, so it will be pushed down as index condition.

MySQL will scan the range with these calls:

 h->read_range_first('bar', 'foo')
 while (h->read_range_next() != HA_ERR...)
   emit a row;

Most storage engines do not implement read_range_next(), they rely on
implementation in handler::read_range_next().  Which calls h->index_next() and
then returns EOF if the returned record is out of range we're scanning.

But what if we make h->index_next() to perform index condition pushdown checks?

The following can happen:
h->index_next() sees index record 'fop' (which is greater than 'foo'). It
checks index condition of  "t.key BETWEEN 'bar' and 'foo'", finds it to be
FALSE, and proceeds to read the next index entry.  The next index entry is
greater (or equal) than 'fop', so it will proceed further until the end of the

To prevent this, handler_index_cond_check() checks whether the retrieved index
record is out of range that is being scanned (and returns ICP_OUT_OF_RANGE if
that is the case).

These were the primary points, let me know if there is something not clear

Sergei Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog

Follow ups