← Back to team overview

maria-developers team mailing list archive

Performance problem: need index-based access when have only non-sargable conditions



I was asked about this performance problem: consider a query 

  select * from tbl where tbl.key like '%foo%'
Table records are big, much larger than the length of the key. LIKE condition
is very selective, however its pattern starts with '%', so we can't construct
range access (and they are really searching for matches in word
suffixes/infixes, so fulltext index isn't an option).

The desired execution strategy is:

- Scan the index on tbl.key and collect index tuples that match the LIKE 
- For those index tuples, retrieve full records.

One can simulate it by rewriting the query as a self join:

  select * from tbl A, tbl B 
    B.primary_key = A.primary_key AND B.key LIKE '%foo%';

However, in this particular case they have a problem doing such rewrites
because the queries are generated by some ORM tool.

Proposed solution from our side
It is not difficult to add an execution strategy like this

    read index tuple;
    if (condition satisified)
      read full record
(with optional rowid-ordered retrieval for full records). 

The difficult part is to find a way to decided whether this strategy should be
used.  We have no meaningful selectivity estimates for 'column LIKE '%foo%'.

The only way we could achieve this is by having the user manually specify
condition selectivities. I can see one way to achieve this without getting into
painful syntax modifications: 

  select * from tbl where selectivity(tbl.key like '%foo%', 0.05)

here 'selectivity(X,y)' will evaluate to value of X, but will also inform the
optimizer that P(X is true)=y.

Any thoughts about this?

Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog

Follow ups