launchpad-dev team mailing list archive
Message #07257
Re: Collation orders in Launchpad
On 2011-06-03 13:17, Martin Pool wrote:
> Maybe as a simple place to start there should just be a "compare human
> strings" function that can be passed to sort(cmp=) and at least the
> .lower() will not be repeated.
This sounds very attractive to me. We'd have to make "this sort is for
display purposes" more or less explicit in the code, but I think that's
better than sticking our heads in the sand.
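A rough sketch of what such a helper might look like, purely as an illustration. The names compare_human_strings and sorted_for_display are made up here and are not existing Launchpad code. Python 3 no longer accepts sort(cmp=), so the sketch wraps the comparison with functools.cmp_to_key, and it breaks ties between strings that differ only in case so that the result is deterministic.

```python
from functools import cmp_to_key


def compare_human_strings(a, b):
    """Hypothetical comparison for strings that are sorted for display.

    Returns a negative number, zero, or a positive number, in the style
    of the old cmp() protocol.
    """
    # Primary comparison: ignore case, as the repeated .lower() calls do.
    lowered_a, lowered_b = a.lower(), b.lower()
    if lowered_a != lowered_b:
        return -1 if lowered_a < lowered_b else 1
    # Tie-break on the original strings so that "A" and "a" always
    # come out in the same relative order.
    return (a > b) - (a < b)


def sorted_for_display(strings):
    """Hypothetical wrapper that makes the "display sort" intent explicit."""
    return sorted(strings, key=cmp_to_key(compare_human_strings))


print(sorted_for_display(["banana", "Apple", "apple", "Banana"]))
# ['Apple', 'apple', 'Banana', 'banana']
```

Keeping the comparison in one place would also make it easy to swap in a locale-aware implementation later without touching the call sites.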
Note by the way that .lower() puts "A" next to "a" but in no specified
order, so it's still not great. (And upper/lower case conversion can
behave weirdly with some non-ASCII characters, but I probably don't know
the half of it). In general, Unicode strings are best left intact and
passed to code that specializes in dealing with them.
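Both points can be seen in a few lines of Python 3. This is only a demonstration, and sort_by_lower is a hypothetical name: sorting on the lowered form leaves "A" and "a" in whatever order they arrived, and case conversion does not always map one character to one character.

```python
def sort_by_lower(strings):
    """Sort on the lowercased form only, as the repeated .lower() calls do."""
    return sorted(strings, key=str.lower)


# "A" and "a" have the same key, so the (stable) sort keeps input order.
print(sort_by_lower(["a", "A"]))  # ['a', 'A']
print(sort_by_lower(["A", "a"]))  # ['A', 'a']

# Case conversion can change the length of a string.
sharp_s = "\u00df"        # LATIN SMALL LETTER SHARP S
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
print(sharp_s.upper())               # SS
print(len(dotted_capital_i.lower()))  # 2
```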
> Ideally the sort would be consistent with whatever psql does. (Or is
> it maybe case sensitive?)
PostgreSQL has configurable locales, so there's probably a range of
options there. Perfect consistency is probably more trouble than it's
worth, but if Unicode Snowman is sorted in the wrong place I suspect
only Paul Hummer is going to notice.
> Maybe we should actually use locale.strcoll, rather than comparing the
> lowered forms? <http://docs.python.org/library/locale.html> ISTR
> this is rather better on non-English names. For en_AU.UTF-8 it is
> case insensitive, though it is case sensitive in C.
This sounds like the right sort of approach to me.
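For illustration, a minimal sketch of a strcoll-based sort in Python 3. The helper name sorted_with_locale is hypothetical. It assumes the requested locale is installed on the machine (setlocale raises locale.Error otherwise), and since setlocale changes process-wide state and is not thread-safe on most systems, this is a sketch rather than something to drop into a web application as is.

```python
import locale
from functools import cmp_to_key


def sorted_with_locale(strings, locale_name):
    """Hypothetical helper: sort using the collation rules of locale_name.

    The previous LC_COLLATE setting is restored afterwards.
    """
    # Calling setlocale with only a category returns the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(strings, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale is always available and is case sensitive:
print(sorted_with_locale(["b", "A", "a", "B"], "C"))
# ['A', 'B', 'a', 'b']

# With a locale such as "en_AU.UTF-8" (if installed), the ordering
# follows that locale's collation rules instead.
```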
Jeroen