maria-discuss team mailing list archive

Re: MyISAM: single table GROUP BY plan changes based on LIMIT. BUG?

 

Hi again,

I think I have confirmed that it is the same issue. I have updated the bug https://jira.mariadb.org/browse/MDEV-26552

Now the bug has two reproducer scripts attached

a) for the index creation
b) for the faulty group by in filesort mode

My hunch is that it is the same bug, because in filesort mode MariaDB first creates a sort index, and that is probably the part that fails.

Both reproducers need a table with more than 390136719 rows (390039063 works). Maybe the limit is 390070272 (372 * 1024 * 1024).
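
A quick check of the arithmetic behind that guess, as a minimal sketch. The variable names are only illustrative; the numbers are the ones quoted above. It shows only that the guessed threshold lies between the row count that works and the one that fails, not that it is the real limit.

```python
# Row counts quoted in the message above.
works = 390039063   # table size that does not trigger the bug
fails = 390136719   # table size that does trigger the bug

# Guessed threshold: 372 * 1024 * 1024 (only a guess, not a confirmed limit).
MIB = 1024 * 1024
guess = 372 * MIB

print(guess)                   # 390070272
print(works < guess < fails)   # True
```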

I tested in 10.6 and both scripts fail to reproduce the bugs. So 10.6.4 is good and 10.5.12 is bad.

Vassilis


On 10/12/21 21:42, Vassilis Virvilis wrote:
Hi Sergei,

Thanks for the detailed and insightful answer.

I get it now. So I could invoke the index by FORCE INDEX. Interesting...

MDEV-8306 looks interesting!

The real problem I had is that I get correct results when the index is used and wrong results with filesort.

My table is more than 500M rows so I have no easy reproducer.

I had reported something similar here https://jira.mariadb.org/browse/MDEV-26552 but I don't know if this is related to the above behavior.

Anyway I will try to create a solid reproducer when I find some time.

Thanks again.

     Vassilis


On 10/12/21 7:32 PM, Sergei Golubchik wrote:
Hi, Vassilis!

On Oct 12, Vassilis Virvilis wrote:

I managed to create a trivial reproducer with a 10 rows table.

If I don't specify LIMIT the plan goes to filesort.
If I specify LIMIT <= 9 the plan goes to utilize the index.
If I specify LIMIT >= 10 (table rows) the plan goes to filesort.

Is this behavior expected? Do you think I should report it?

Yes, expected. If you specify a LIMIT with more rows than what you have
in the table it's as if you didn't specify any limit at all. So the plan
is the same as with no LIMIT.

If you do specify a LIMIT (that actually limits the output) then the
optimizer favors the index over filesort. It's a simple heuristic: it
assumes the limit is small and going through the index is much faster in
this case. If you specify a large limit this heuristic might be wrong.
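
The rule described above can be sketched as a toy model. This is not MariaDB's optimizer code: the function name and the `table_rows` parameter are hypothetical, and `table_rows` simply stands in for whatever row count the optimizer works with. It only restates the behavior reported for the 10-row table.

```python
from typing import Optional


def choose_group_by_plan(table_rows: int, limit: Optional[int] = None) -> str:
    """Toy model of the LIMIT heuristic (hypothetical, for illustration only)."""
    # No LIMIT, or a LIMIT that cannot cut the output: same plan as without LIMIT.
    if limit is None or limit >= table_rows:
        return "filesort"
    # A LIMIT that really limits the output: the index is favored.
    return "index"


# The three cases from the 10-row reproducer.
for limit in (None, 9, 10):
    print(limit, choose_group_by_plan(10, limit))
```

Note that the limit's size relative to the table plays no further role in this model, which is why a large but still limiting LIMIT can end up with a poor plan.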

A proper cost based approach was developed in
https://jira.mariadb.org/browse/MDEV-8306
But it's not in any release yet.

Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


