← Back to team overview

maria-developers team mailing list archive

Performance of index intersection

 

Hi!

I've been using MySQL/MariaDB for two decades but have more recently been
working with Elasticsearch.  I knew to expect an inverted index to speed up
querying full text fields, but I've been surprised (and a bit annoyed) at
how fast ES can query structured data.  (In my case, I'm largely looking
for intersections of a number of varchar fields with lowish cardinality,
e.g. WHERE country = 'US' AND client = 'Microsoft' AND status =
'Completed'.)

Elasticsearch seems to have several things going on, but I think a core
aspect, to use RDMS terminology, is that each column is indexed, and index
unions/intersections are used if the WHERE clause references multiple
columns.

I've heard that MySQL/MariaDB has the ability to merge indexes, but I've
rarely observed it in person.  Googling for it yields a bunch of
StackOverflow posts complaining how slow it is, with responses agreeing and
explaining how to disable it.

If I'm reading the MySQL/MariaDB code correctly, it looks like MariaDB will
intersect indexes by looping through each index, reading the rowids of all
matching keys and then, at the end (or once the buffer is full), checking
whether each rowid is present in each index.

I wonder if there's an opportunity to speed this up.  If we first read in
the rowids from one index (ideally the one with the fewest matches), we
could tell the storage engine that, when reading the next index, it should
skip over rowids lower than the next candidate intersection.  In the best
case scenario, I think this could enable InnoDB to use its page directory
to skip past some of the keys, improving the performance from O(n) to O(log
n).

That said, this is all new to me.  Maybe there's an obvious reason this
wouldn't make much of an improvement, or maybe I've overlooked that it's
already been done.  However, if it looks promising to you folk, and it's
something you'd consider merging, I'd be willing to attempt writing a PR
for it.

Thank you,
David Sickmiller

Follow ups