
launchpad-dev team mailing list archive

Re: Collation orders in Launchpad

 

On 2011-06-03 13:17, Martin Pool wrote:

> Maybe as a simple place to start there should just be a "compare human
> strings" function that can be passed to sort(cmp=) and at least the
> .lower() will not be repeated.

This sounds very attractive to me.  We'd have to make "this sort is for
display purposes" more or less explicit in the code, but I think that's
better than sticking our heads in the sand.

Note, by the way, that .lower() puts "A" next to "a" but in no specified
order, so it's still not great.  (And upper/lower case conversion can
behave weirdly with some non-ASCII characters, but I probably don't know
the half of it.)  In general, Unicode strings are best left intact and
passed to code that specializes in dealing with them.

> Ideally the sort would be consistent with whatever psql does.  (Or is
> it maybe case sensitive?)

PostgreSQL has configurable locales, so there's probably a range of
options there.  Perfect consistency is probably more trouble than it's
worth, but if Unicode Snowman is sorted in the wrong place I suspect
only Paul Hummer is going to notice.

> Maybe we should actually use locale.strcoll, rather than comparing the
> lowered forms? <http://docs.python.org/library/locale.html>  I seem to
> recall this is rather better on non-English names.  For en_AU.UTF-8 it
> is case insensitive, though it is case sensitive in the C locale.

This sounds like the right sort of approach to me.
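
A rough sketch of the locale.strcoll idea, again assuming Python 3.  The
function name sort_for_display and the silent fallback to the C locale
when the requested locale is not installed are assumptions made for
illustration, not a proposal for how Launchpad should behave:

```python
import functools
import locale


def sort_for_display(strings, locale_name="en_AU.UTF-8"):
    """Sort strings using locale.strcoll (hypothetical helper)."""
    # Calling setlocale() without a new value only queries the setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            # Assumption: if the locale is missing, fall back to C
            # (plain code point order) rather than fail.
            locale.setlocale(locale.LC_COLLATE, "C")
        # cmp_to_key adapts the old-style comparison function to key=.
        return sorted(strings, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_for_display(["b", "A", "a", "B"], "C"))
# → ['A', 'B', 'a', 'b']
```

One caveat: setlocale() changes the locale for the whole process and is
not thread-safe on most systems, so switching it per call like this is a
poor fit for a multi-threaded web application.  Setting the collation
locale once at startup would avoid that.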


Jeroen

