
launchpad-dev team mailing list archive

Re: Collation orders in Launchpad

 

On 2011-06-03 13:17, Martin Pool wrote:

> Maybe as a simple place to start there should just be a "compare human
> strings" function that can be passed to sort(cmp=) and at least the
> .lower() will not be repeated.

This sounds very attractive to me. We'd have to make "this sort is for display purposes" more or less explicit in the code, but I think that's better than sticking our heads in the sand.
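For illustration, a minimal sketch of such a helper in Python 3. There, sort() no longer accepts cmp=, so the comparison function is wrapped with functools.cmp_to_key. The name compare_human_strings is hypothetical, and this version only compares lower-cased forms, which is the behaviour under discussion:

```python
from functools import cmp_to_key


def compare_human_strings(a: str, b: str) -> int:
    """Hypothetical display-sort comparator: compare case-insensitively.

    Returns a negative number, zero, or a positive number, as cmp-style
    comparison functions are expected to.
    """
    a_lower, b_lower = a.lower(), b.lower()
    return (a_lower > b_lower) - (a_lower < b_lower)


names = ["bzr", "Launchpad", "apport", "Bazaar"]
names.sort(key=cmp_to_key(compare_human_strings))
print(names)  # ['apport', 'Bazaar', 'bzr', 'Launchpad']
```

A plain key function (key=str.lower) gives the same order more cheaply, because each string is lowered once instead of on every comparison.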

Note, by the way, that .lower() puts "A" next to "a" but leaves their relative order unspecified, so it's still not great. (And upper/lower case conversion can behave oddly with some non-ASCII characters, though I probably don't know the half of it.) In general, unicode strings are best left intact and passed to code that specializes in dealing with them.
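A small Python 3 sketch of both points. Because strings that differ only in case compare equal after .lower(), the stable sort just keeps their input order. The tie-breaking display_key below is a hypothetical helper, and it swaps in str.casefold() for .lower(); it makes the order deterministic but does not make the sort locale-aware:

```python
# Case-only differences compare equal under .lower(), so Python's stable
# sort keeps whatever order the input happened to have.
print(sorted(["a", "A"], key=str.lower))  # ['a', 'A']
print(sorted(["A", "a"], key=str.lower))  # ['A', 'a']


def display_key(s: str) -> tuple:
    """Hypothetical key: casefolded form first, original string as tie-breaker."""
    return (s.casefold(), s)


# The tie-breaker makes the result independent of input order.
print(sorted(["a", "A"], key=display_key))  # ['A', 'a']

# Case mapping is not a simple per-character swap outside ASCII.
print("ß".upper())     # prints SS
print("ß".lower())     # prints ß
print("ß".casefold())  # prints ss
```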


> Ideally the sort would be consistent with whatever psql does. (Or is
> it maybe case sensitive?)

PostgreSQL has configurable locales, so there's probably a range of options there. Perfect consistency is probably more trouble than it's worth, but if Unicode Snowman is sorted in the wrong place I suspect only Paul Hummer is going to notice.


> Maybe we should actually use locale.strcoll, rather than comparing the
> lowered forms? <http://docs.python.org/library/locale.html> I seem to
> recall this is rather better on non-English names. For en_AU.UTF-8 it is
> case insensitive, though it is case sensitive in C.

This sounds like the right sort of approach to me.
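A minimal sketch of that approach, assuming the stdlib locale module and a hypothetical helper named sort_for_display. It uses locale.strxfrm as a sort key, which orders strings the same way locale.strcoll compares them, and restores the previous LC_COLLATE setting afterwards. Whether en_AU.UTF-8 is available depends on the system:

```python
import locale


def sort_for_display(strings, collation_locale="en_AU.UTF-8"):
    """Hypothetical helper: sort strings by a locale's collation rules.

    locale.strxfrm maps each string to a key whose ordering matches
    locale.strcoll, so it can be passed to sorted(key=...).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        # Raises locale.Error if the locale is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, collation_locale)
        return sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


names = ["zope", "Bazaar", "apport", "Launchpad"]
# In the C locale, collation is plain code-point order: capitals first.
print(sort_for_display(names, "C"))  # ['Bazaar', 'Launchpad', 'apport', 'zope']
try:
    print(sort_for_display(names))  # roughly case-insensitive, if installed
except locale.Error:
    print("en_AU.UTF-8 is not available here")
```

One caveat: setlocale changes process-wide state and, per the Python documentation, is not thread-safe on most systems. In a multithreaded app server it would probably be safer to set LC_COLLATE once at startup than to switch it per request.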


Jeroen

