maria-discuss team mailing list archive

Thread
Date

Re: Implementing APPROXIMATE_COUNT(*)

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Justin Swanhart <greenlion@xxxxxxxxx>
Date: Wed, 19 Feb 2014 10:19:45 -0800
Cc: Tomek R <rtomek@xxxxxxxxxxx>, "maria-discuss@xxxxxxxxxxxxxxxxxxx" <maria-discuss@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20140219132526.GA28034@meddwl.fritz.box>

Hi,

Another option is using an external fastbit index, which will give you more
than count(*) and work excellently over complex ranges.
https://sdm.lbl.gov/fastbit/


On Wed, Feb 19, 2014 at 5:25 AM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Tomek!
>
> On Jan 05, Tomek R wrote:
> > Hi,
> > I have realised that whenever I am doing any query and want to count
> > number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>;
> > On large tables (>10e6 rows), when the count is >100e3, it takes over
> > a second... even when all the indexes are set up, etc...  and with
> > large counts, I have not been able to go below ~<1 s.
> > So I would like to implement APPROXIMATE_COUNT(*) function, which
> > checks, if, for example every 10th, 100th or 1000th row is present in
> > my final output... and based on this statistics, return the
> > approximate number of rows that given query produces. I am using
> > MyISAM engine and would like to do the implementation on it, where the
> > total number of rows is automatically stored.
> > Obviously, it would be useful for anybody dealing with these huge
> > aggregate websites, where users do not really need to know if their
> > keyword search returned 1234567890 results or just 1.2e9 results...
> > Please, can you advise me how to start this project? I.e. whether it's
> > easy to do in SQL or whether I would have to start looking somewhere
> > in the source code of MariaDB? I have plenty of time, programming/SQL
> > experience, but no experience with maria/mysql source code.
>
> See the new server variable in MariaDB-10.0 -
> optimizer_use_condition_selectivity - and in particular, when happens
> when it's set to 5. The manual documents it as
>
>     5     Additionally use selectivity of certain non-range predicates
>           calculated on record sample.
>
> It means that to estimate the selectivity of the WHERE condition for a
> specific table, exactly, SELECT ... FROM atable WHERE <complicated
> condition>,
> optimizer reads few first rows of the table (as set in
> optimizer_selectivity_sampling_limit) and checks how many of them pass
> the WHERE condition.
>
> This is very close to what you want, but currently it only works for
> LIKE predicates.
>
> If you'd extend this to work - optionally - for all WHERE conditions,
> you'll be able to see your "approximate count" value in the EXPLAIN
> output.
>
> Regards,
> Sergei
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>

References

Implementing APPROXIMATE_COUNT(*)
From: Tomek R, 2014-01-05
Re: Implementing APPROXIMATE_COUNT(*)
From: Sergei Golubchik, 2014-02-19