← Back to team overview

maria-developers team mailing list archive

Performance problem: need index-based access when have only non-sargable conditions

 

Hi!

I was asked about this performance problem: consider a query 

  select * from tbl where tbl.key like '%foo%'
  
Table records are big, much larger than the length of the key. LIKE condition
is very selective, however its pattern starts with '%', so we can't construct
range access (and they are really searching for matches in word
suffixes/infixes, so fulltext index isn't an option).

The desired execution strategy is:

- Scan the index on tbl.key and collect index tuples that match the LIKE 
  condition.
- For those index tuples, retrieve full records.

One can simulate it by rewriting the query as a self join:

  select * from tbl A, tbl B 
  where 
    B.primary_key = A.primary_key AND B.key LIKE '%foo%';

However, in this particular case they have a problem doing such rewrites
because the queries are generated by some ORM tool.

Proposed solution from our side
-------------------------------
It is not difficult to add an execution strategy like this

  get_next_record()
  {
    read index tuple;
    if (condition satisified)
    {
      read full record
    }
  }
(with optional rowid-ordered retrieval for full records). 

The difficult part is to find a way to decided whether this strategy should be
used.  We have no meaningful selectivity estimates for 'column LIKE '%foo%'.

The only way we could achieve this is by having the user manually specify
condition selectivities. I can see one way to achieve this without getting into
painful syntax modifications: 

  select * from tbl where selectivity(tbl.key like '%foo%', 0.05)

here 'selectivity(X,y)' will evaluate to value of X, but will also inform the
optimizer that P(X is true)=y.

Any thoughts about this?

BR
 Sergey
-- 
Sergey Petrunia, Software Developer
Monty Program AB, http://askmonty.org
Blog: http://s.petrunia.net/blog



Follow ups