← Back to team overview

maria-discuss team mailing list archive

Re: Implementing APPROXIMATE_COUNT(*)



Another option is using an external fastbit index, which will give you more
than count(*) and work excellently over complex ranges.

On Wed, Feb 19, 2014 at 5:25 AM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Tomek!
> On Jan 05, Tomek R wrote:
> > Hi,
> > I have realised that whenever I am doing any query and want to count
> > number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>;
> > On large tables (>10e6 rows), when the count is >100e3, it takes over
> > a second... even when all the indexes are set up, etc...  and with
> > large counts, I have not been able to go below ~<1 s.
> > So I would like to implement APPROXIMATE_COUNT(*) function, which
> > checks, if, for example every 10th, 100th or 1000th row is present in
> > my final output... and based on this statistics, return the
> > approximate number of rows that given query produces. I am using
> > MyISAM engine and would like to do the implementation on it, where the
> > total number of rows is automatically stored.
> > Obviously, it would be useful for anybody dealing with these huge
> > aggregate websites, where users do not really need to know if their
> > keyword search returned 1234567890 results or just 1.2e9 results...
> > Please, can you advise me how to start this project? I.e. whether it's
> > easy to do in SQL or whether I would have to start looking somewhere
> > in the source code of MariaDB? I have plenty of time, programming/SQL
> > experience, but no experience with maria/mysql source code.
> See the new server variable in MariaDB-10.0 -
> optimizer_use_condition_selectivity - and in particular, when happens
> when it's set to 5. The manual documents it as
>     5     Additionally use selectivity of certain non-range predicates
>           calculated on record sample.
> It means that to estimate the selectivity of the WHERE condition for a
> specific table, exactly, SELECT ... FROM atable WHERE <complicated
> condition>,
> optimizer reads few first rows of the table (as set in
> optimizer_selectivity_sampling_limit) and checks how many of them pass
> the WHERE condition.
> This is very close to what you want, but currently it only works for
> LIKE predicates.
> If you'd extend this to work - optionally - for all WHERE conditions,
> you'll be able to see your "approximate count" value in the EXPLAIN
> output.
> Regards,
> Sergei
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp