← Back to team overview

maria-developers team mailing list archive

my_hash_sort_bin

 

Hello MariaDB,

While working with Dmitry Lenev to understand why MySQL 5.6 was slower than
5.1 on one benchmark (
https://www.facebook.com/notes/mysql-at-facebook/my-mysql-is-faster-than-your-mysql/10151250402570933)
we realized that the hash function used in my_hash_sort_bin is lousy for
this input: test.sbest1, test.sbtest2, ..., test.sbtest10. The problem is
made worse when a small number of hash buckets is used because the hash
function output doesn't do the right thing for the least significant bits
so that all 10 inputs map to the same hash bucket. More details are at
http://bugs.mysql.com/bug.php?id=66473

The InnoDB hash function is much better. Details for that and a test
program are in the bug report. Does anyone remember why this hash function
was chosen?

strings/ctype-bin.c doesn't have any comments explaining why this hash
function was selected. This is another peeve for me. Critical code like
this should be explained if we expect anyone new to begin working on this
code.

I haven't done a formal test of the hash function, and
http://burtleburtle.net/bob/hash/hashfaq.html would be a good reference for
that. But I think it is very likely that my assertion that this hash
function is lousy holds in general, rather than is limited to this input.
And given the large number of machines that run MySQL, MariaDB and Percona
Server (and the FB patch) around the world I suspect that we should figure
this out.

-- 
Mark Callaghan
mdcallag@xxxxxxxxx

Follow ups