maria-discuss team mailing list archive

Thread
Date

Implementing APPROXIMATE_COUNT(*)

To: "maria-discuss@xxxxxxxxxxxxxxxxxxx" <maria-discuss@xxxxxxxxxxxxxxxxxxx>
From: Tomek R <rtomek@xxxxxxxxxxx>
Date: Sun, 5 Jan 2014 11:13:43 -0500
Importance: Normal

Hi,
I have realised that whenever I am doing any query and want to count number of rows:SELECT COUNT(*) FROM atable WHERE <complicated condition>; On large tables (>10e6 rows), when the count is >100e3, it takes over a second... even when all the indexes are set up, etc... and with large counts, I have not been able to go below ~<1 s. 
So I would like to implement APPROXIMATE_COUNT(*) function, which checks, if, for example every 10th, 100th or 1000th row is present in my final output... and based on this statistics, return the approximate number of rows that given query produces. I am using MyISAM engine and would like to do the implementation on it, where the total number of rows is automatically stored. 
Obviously, it would be useful for anybody dealing with these huge aggregate websites, where users do not really need to know if their keyword search returned 1234567890 results or just 1.2e9 results... 
Please, can you advise me how to start this project? I.e. whether it's easy to do in SQL or whether I would have to start looking somewhere in the source code of MariaDB? I have plenty of time, programming/SQL experience, but no experience with maria/mysql source code.
Tomek

Follow ups

Re: Implementing APPROXIMATE_COUNT(*)
From: Sergei Golubchik, 2014-02-19