← Back to team overview

maria-discuss team mailing list archive

Re: Implementing APPROXIMATE_COUNT(*)

 

Hi, Tomek!

On Jan 05, Tomek R wrote:
> Hi,
> I have realised that whenever I am doing any query and want to count
> number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>;
> On large tables (>10e6 rows), when the count is >100e3, it takes over
> a second... even when all the indexes are set up, etc...  and with
> large counts, I have not been able to go below ~<1 s. 
> So I would like to implement APPROXIMATE_COUNT(*) function, which
> checks, if, for example every 10th, 100th or 1000th row is present in
> my final output... and based on this statistics, return the
> approximate number of rows that given query produces. I am using
> MyISAM engine and would like to do the implementation on it, where the
> total number of rows is automatically stored. 
> Obviously, it would be useful for anybody dealing with these huge
> aggregate websites, where users do not really need to know if their
> keyword search returned 1234567890 results or just 1.2e9 results... 
> Please, can you advise me how to start this project? I.e. whether it's
> easy to do in SQL or whether I would have to start looking somewhere
> in the source code of MariaDB? I have plenty of time, programming/SQL
> experience, but no experience with maria/mysql source code.

See the new server variable in MariaDB-10.0 -
optimizer_use_condition_selectivity - and in particular, when happens
when it's set to 5. The manual documents it as

    5     Additionally use selectivity of certain non-range predicates
          calculated on record sample. 

It means that to estimate the selectivity of the WHERE condition for a
specific table, exactly, SELECT ... FROM atable WHERE <complicated condition>,
optimizer reads few first rows of the table (as set in
optimizer_selectivity_sampling_limit) and checks how many of them pass
the WHERE condition.

This is very close to what you want, but currently it only works for
LIKE predicates.

If you'd extend this to work - optionally - for all WHERE conditions,
you'll be able to see your "approximate count" value in the EXPLAIN
output.

Regards,
Sergei



Follow ups

References