maria-discuss team mailing list archive
Mailing list archive
Re: Implementing APPROXIMATE_COUNT(*)
On Jan 05, Tomek R wrote:
> I have realised that whenever I am doing any query and want to count
> number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>;
> On large tables (>10e6 rows), when the count is >100e3, it takes over
> a second... even when all the indexes are set up, etc... and with
> large counts, I have not been able to go below ~<1 s.
> So I would like to implement APPROXIMATE_COUNT(*) function, which
> checks, if, for example every 10th, 100th or 1000th row is present in
> my final output... and based on this statistics, return the
> approximate number of rows that given query produces. I am using
> MyISAM engine and would like to do the implementation on it, where the
> total number of rows is automatically stored.
> Obviously, it would be useful for anybody dealing with these huge
> aggregate websites, where users do not really need to know if their
> keyword search returned 1234567890 results or just 1.2e9 results...
> Please, can you advise me how to start this project? I.e. whether it's
> easy to do in SQL or whether I would have to start looking somewhere
> in the source code of MariaDB? I have plenty of time, programming/SQL
> experience, but no experience with maria/mysql source code.
See the new server variable in MariaDB-10.0 -
optimizer_use_condition_selectivity - and in particular, when happens
when it's set to 5. The manual documents it as
5 Additionally use selectivity of certain non-range predicates
calculated on record sample.
It means that to estimate the selectivity of the WHERE condition for a
specific table, exactly, SELECT ... FROM atable WHERE <complicated condition>,
optimizer reads few first rows of the table (as set in
optimizer_selectivity_sampling_limit) and checks how many of them pass
the WHERE condition.
This is very close to what you want, but currently it only works for
If you'd extend this to work - optionally - for all WHERE conditions,
you'll be able to see your "approximate count" value in the EXPLAIN