schooltool-developers team mailing list archive
Message #00315
On collation and stuff
Hi,
My two cents on a missed IRC discussion...
<replaceafill> i wonder if that collator.key(ref.title) should be
collator.key(translate(ref.title, context=self.request))?
<aelkner> well, the collator key() method should translate the string
perhaps?
<...>
<aelkner> so, should i make the translate() call just to be safe?
<replaceafill> i think you should
Yes, you should translate before collating. If you don't do that, it
collates the original, *untranslated* string.
The collator generates its key from a unicode string; it knows nothing
about zope.i18nmessageid.Message. Our translatable strings can be
transparently used as untranslated unicode strings and, sadly,
sometimes are. In zope.i18nmessageid you'll find "class
Message(unicode):" (this is of course the Python implementation, and in
reality we use the C implementation, but the point remains valid).
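To make the point concrete, here is a minimal sketch in plain Python 3.
Everything in it is a stand-in: the Message class, the translate()
function, the catalog and collation_key() are hypothetical and only
mimic the shape of zope.i18nmessageid.Message, zope.i18n translate()
and collator.key(). It shows that a key function handed the message
itself only ever sees the msgid text.

```python
class Message(str):
    """Hypothetical stand-in for zope.i18nmessageid.Message (a string subclass)."""


# Made-up catalog for illustration only: msgid -> translated title.
CATALOG = {"Student": "Mokinys", "Group": "Grupė", "Section": "Skyrius"}


def translate(msgid):
    """Hypothetical stand-in for translate(); falls back to the msgid."""
    return CATALOG.get(msgid, str(msgid))


def collation_key(text):
    """Stand-in for collator.key(); it only sees the characters it is given."""
    return text.casefold()


titles = [Message("Student"), Message("Group"), Message("Section")]

# Keyed on the message itself: ordered by the untranslated msgid.
by_msgid = sorted(titles, key=collation_key)

# Keyed on the translation: ordered by what the user actually reads.
by_translation = sorted(titles, key=lambda t: collation_key(translate(t)))

print([str(t) for t in by_msgid])
print([str(t) for t in by_translation])
```

The two lists come out in different orders, which is exactly the bug
you get when the translate() call is skipped.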
---8<------8<------8<------8<------8<------8<------8<------8<------8<---
<replaceafill> i mean, get the key for the translated version
<aelkner> do you think i need to do the translation before collating?
<replaceafill> you haven't test it yet with localized strings, correct?
<aelkner> right
<aelkner> i wouldn't know how either
Yeah, it's a bit easier for those of us in other locales :) We can
always set our browser preferred locale to our language and see if "š
comes after s".
By the way, this link might be handy:
http://demo.icu-project.org/icu-bin/locexp?d_=en&x=col&_=root
You can check how the text is supposed to be collated in the selected
locale there.
You can use Google Translate to get some translated strings in
кириллица (the Cyrillic alphabet, Russian), for example, and compare
the results ;)
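If you want to see why the "š comes after s" check matters, here is a
small sketch. It is not ICU: TOY_ALPHABET and toy_key() are made up for
illustration and are far simpler than any real locale tailoring. It
only contrasts plain code point order with an alphabet-aware key.

```python
S_CARON = "\u0161"  # the letter "š"
Z_CARON = "\u017e"  # the letter "ž"

words = ["tiltas", S_CARON + "aka", "sala"]

# Plain sorted() compares code points, so U+0161 lands after "t" (U+0074).
by_code_point = sorted(words)

# Toy alphabet, an assumption for illustration only (not a real ICU tailoring).
TOY_ALPHABET = "abcdefghijklmnoprs" + S_CARON + "tuvz" + Z_CARON
RANK = {ch: i for i, ch in enumerate(TOY_ALPHABET)}


def toy_key(word):
    """Map each character to its rank in the toy alphabet; unknowns go last."""
    return tuple(RANK.get(ch, len(TOY_ALPHABET)) for ch in word.lower())


by_toy_key = sorted(words, key=toy_key)

print(by_code_point)  # the "š" word ends up last
print(by_toy_key)     # the "š" word sits between the "s" and "t" words
```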
---8<------8<------8<------8<------8<------8<------8<------8<------8<---
<replaceafill> * wishes he understands "static PyObject
*__pyx_f_10_zope_ucol_8Collator_key(PyObject *__pyx_v_self, PyObject
*__pyx_args, PyObject *__pyx_kwds)" :|
It's written in Pyrex
(http://en.wikipedia.org/wiki/Pyrex_%28programming_language%29). Since
omelette does not bake zope.ucol for some reason, you can look at
~/.buildout/eggs/zope.ucol-1.0.2-****/zope/ucol/ directly (or wherever
your eggs are stored). The _zope_ucol.pyx source is quite readable, and
the generated _zope_ucol.c is - of course - a mess.
---8<------8<------8<------8<------8<------8<------8<------8<------8<---
And this code caught my eye (I know it's in deep dev, consider this a
pre-review ;) ):
key_order = ['student', 'group', 'section', 'schoolyear', 'term']
collator = ICollator(self.request.locale)
rows = []
for ref, name in getAdapters(...):
    row = {
        'category': ref.category,
        'title': ref.title,
        'description': ref.description,
        'url': ref.url,
        }
    for index, key in enumerate(key_order):
        if key == ref.category_key:
            break
    else:
        index = len(key_order)
    translated_title = translate(ref.title, context=self.request)
    rows.append([index, collator.key(translated_title), row])
return [row for index, title, row in sorted(rows)]
I would rewrite it to:
collator = ICollator(self.request.locale)
rows = [ref for ref, name in getAdapters(...)]
category_order = ['student', 'group', 'section', 'schoolyear', 'term']
def sortKey(row):
    if row.category_key in category_order:
        index = category_order.index(row.category_key)
    else:
        # XXX: piling the rest of the categories at the bottom, unsorted
        index = len(category_order)
    # translate first, then build the collation key from the translation
    return (index, collator.key(translate(row.title, context=self.request)))
return sorted(rows, key=sortKey)
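For anyone who wants to play with the idea outside of a view, here is a
self-contained sketch of the same sort key in plain Python 3. The Ref
tuple, fake_translate() and fake_collator_key() are hypothetical
stand-ins for the adapters, translate() and collator.key(); the dict
lookup is just one possible alternative to the "in" check plus
index().

```python
from collections import namedtuple

# Hypothetical stand-in for the adapter objects returned by getAdapters().
Ref = namedtuple("Ref", ["category_key", "title"])

CATEGORY_ORDER = ["student", "group", "section", "schoolyear", "term"]
# Precomputed rank per category, so the lookup is a single dict access.
CATEGORY_RANK = {key: i for i, key in enumerate(CATEGORY_ORDER)}


def fake_translate(title):
    """Stand-in for translate(title, context=request); identity here."""
    return title


def fake_collator_key(text):
    """Stand-in for collator.key(); a real collator is locale-aware."""
    return text.casefold()


def sort_key(ref):
    # Unknown categories share one rank and are piled at the bottom.
    index = CATEGORY_RANK.get(ref.category_key, len(CATEGORY_ORDER))
    return (index, fake_collator_key(fake_translate(ref.title)))


refs = [
    Ref("term", "Spring"),
    Ref("student", "Zed"),
    Ref("other", "Misc"),
    Ref("student", "alice"),
    Ref("group", "Choir"),
]

ordered = sorted(refs, key=sort_key)
print([ref.title for ref in ordered])
```

Known categories come first in the configured order, titles are sorted
within each category, and anything unrecognised ends up last.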
Cheers,
Justas
P.S.: While zope.ucol is a collation wrapper around a C library that is
updated constantly, zope.i18n is not. It is a pure Python implementation
built on CLDR data and currently uses CLDR 1.1, if I'm not mistaken
(http://pypi.python.org/pypi/zope.i18n/).
*Akhem*! - it's a bit outdated
(http://cldr.unicode.org/index/downloads). As in - generated back in
2004. The latest CLDR, 1.9, was released a month ago.
P.P.S: as if collation wasn't "easy" enough, there are several sort
orders:
http://demo.icu-project.org/icu-bin/locexp?_=es_SV&d_=en&x=chS&ox=
(notice "traditional", "phonebook", "dictionary", ...)