maria-developers team mailing list archive

Re: my_hash_sort_bin

 

Hi, Mark!

On Feb 19, MARK CALLAGHAN wrote:
> 
> we realized that the hash function used in my_hash_sort_bin is lousy for
> this input: test.sbtest1, test.sbtest2, ..., test.sbtest10. The problem is
> made worse when a small number of hash buckets is used because the hash
> function output doesn't do the right thing for the least significant bits
> so that all 10 inputs map to the same hash bucket. More details are at
> http://bugs.mysql.com/bug.php?id=66473
> 
> The InnoDB hash function is much better. Details for that and a test
> program are in the bug report. Does anyone remember why this hash function
> was chosen?
> 
> strings/ctype-bin.c doesn't have any comments explaining why this hash
> function was selected. This is another peeve for me. Critical code like
> this should be explained if we expect anyone new to begin working on this
> code.

This is the MySQL hash function from the dawn of time. Before the charset
code existed, it lived in mysys/hash.c, and I found it unchanged as far
back as mysql-3.20.13 (that's 1997).

Regards,
Sergei

