schooltool-developers team mailing list archive

On collation and stuff

 

Hi,

My two cents on a missed IRC discussion...

<replaceafill> i wonder if that collator.key(ref.title) should be collator.key(translate(ref.title, context=self.request))?
<aelkner> well, the collator key() method shoudl translate the string perhaps?
<...>
<aelkner> so, should i make the translate() call just to be safe?
<replaceafill> i think you should

Yes, you should translate before collating. If you don't do that, it collates the original, *untranslated* string.

The collator generates its key from a unicode string; it knows nothing about zope.i18nmessageid.Message. Our translatable strings can be used transparently as untranslated unicode strings and, sadly, sometimes are. In zope.i18nmessageid you'll find "class Message(unicode):" (that is the Python implementation, of course, and in reality we use the C implementation, but the point remains valid).
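
To make the point concrete, here is a minimal sketch with a toy stand-in for a message ID class. ToyMessage, CATALOG and toy_translate are made up for illustration and are not the zope.i18nmessageid or zope.i18n API; the only thing borrowed from the real class is that a message is a string subclass. The catalog entries are illustrative too.

```python
class ToyMessage(str):
    """Hypothetical stand-in for a message ID: just a string subclass."""


# Hypothetical translation catalog, for illustration only.
CATALOG = {"Student": "Mokinys", "Term": "Trimestras", "Year": "Metai"}


def toy_translate(msg):
    # Fall back to the message ID itself when there is no translation.
    return CATALOG.get(msg, str(msg))


titles = [ToyMessage("Year"), ToyMessage("Student"), ToyMessage("Term")]

# A key function happily accepts the message as a plain string,
# so this sorts by the untranslated message IDs.
untranslated_order = sorted(titles, key=str.lower)

# Translating first sorts by what the user actually sees.
translated_order = sorted(titles, key=lambda m: toy_translate(m).lower())

print(untranslated_order)
print(translated_order)
```

Nothing fails in the untranslated case, which is exactly why the mistake is easy to miss: the list is sorted, just not in the order of the visible text.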

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

<replaceafill> i mean, get the key for the translated version
<aelkner> do you think i need to do the translation before collating?
<replaceafill> you haven't test it yet with localized strings, correct?
<aelkner> right
<aelkner> i wouldn't know how either

Yeah, it's a bit easier for those of us in other locales :) We can always set our browser preferred locale to our language and see if "š comes after s".

By the way, this link might be handy for checking how text is supposed to be collated in a selected locale: http://demo.icu-project.org/icu-bin/locexp?d_=en&x=col&_=root

You can use Google Translate to get some translated strings in, for example, кириллица (the Cyrillic alphabet, Russian) and compare the results ;)
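
If you want to see what "š comes after s" means without switching your browser locale, here is a rough sketch that compares plain code point sorting with a hand-written alphabet table. It assumes Lithuanian as the example locale, and LT_ALPHABET and toy_key are invented for this illustration. It is a simplification (every letter is treated as fully distinct) and no substitute for real ICU collation through zope.ucol.

```python
# Hand-written Lithuanian alphabet order. This table is an assumption made
# for illustration; real collation rules should come from ICU, not from here.
LT_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def toy_key(text):
    # Characters outside the table sort after all known letters,
    # ordered among themselves by code point.
    return [RANK.get(ch, len(LT_ALPHABET) + ord(ch)) for ch in text.lower()]


words = ["zuikis", "šuo", "sala", "tiltas"]

by_codepoint = sorted(words)              # "š" lands after "z"
by_alphabet = sorted(words, key=toy_key)  # "š" lands right after "s"

print(by_codepoint)
print(by_alphabet)
```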

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

<replaceafill> * wishes he understands "static PyObject *__pyx_f_10_zope_ucol_8Collator_key(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds)" :|

It's written in Pyrex (http://en.wikipedia.org/wiki/Pyrex_%28programming_language%29). Since omelette does not bake zope.ucol for some reason, you can look at ~/.buildout/eggs/zope.ucol-1.0.2-****/zope/ucol/ directly (or wherever your eggs are stored). The _zope_ucol.pyx source is quite readable, and the generated _zope_ucol.c is, of course, a mess.

---8<------8<------8<------8<------8<------8<------8<------8<------8<---

And this code caught my eye (I know it's in deep dev, consider this a pre-review ;) ):

    key_order = ['student', 'group', 'section', 'schoolyear', 'term']
    collator = ICollator(self.request.locale)
    rows = []
    for ref, name in getAdapters(...):
        row = {
            'category': ref.category,
            'title': ref.title,
            'description': ref.description,
            'url': ref.url,
            }
        for index, key in enumerate(key_order):
            if key == ref.category_key:
                break
        else:
            index = len(key_order)
        translated_title = translate(ref.title, context=self.request)
        rows.append([index, collator.key(translated_title), row])
    return [row for index, title, row in sorted(rows)]

I would rewrite it to:

    collator = ICollator(self.request.locale)
    rows = [ref for ref, name in getAdapters(...)]
    category_order = ['student', 'group', 'section', 'schoolyear', 'term']

    def sortKey(row):
        if row.category_key in category_order:
            index = category_order.index(row.category_key)
        else:
            # XXX: piling the rest of the categories at the bottom, unsorted
            index = len(category_order)
        # Translate first, then collate the translated title.
        translated_title = translate(row.title, context=self.request)
        return (index, collator.key(translated_title))

    return sorted(rows, key=sortKey)
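
For anyone who wants to poke at the idea outside a view, here is a self-contained sketch of the same sort key. Ref, make_sort_key and the sample data are invented for illustration; str.casefold stands in for collator.key and an identity function stands in for translate, so this shows only the shape of the key, not real locale-aware collation.

```python
from collections import namedtuple

# Hypothetical stand-in for the adapter objects returned by getAdapters(...).
Ref = namedtuple("Ref", "category_key title")

CATEGORY_ORDER = ['student', 'group', 'section', 'schoolyear', 'term']


def make_sort_key(collation_key, translate):
    """Build a key function: known category position first, then title."""
    def sort_key(ref):
        if ref.category_key in CATEGORY_ORDER:
            index = CATEGORY_ORDER.index(ref.category_key)
        else:
            # Unknown categories all share the last slot.
            index = len(CATEGORY_ORDER)
        # Translate first, then build the collation key.
        return (index, collation_key(translate(ref.title)))
    return sort_key


refs = [
    Ref("term", "Fall"),
    Ref("other", "Misc"),
    Ref("student", "Zoe"),
    Ref("student", "adam"),
    Ref("group", "Teachers"),
]

# Stand-ins (assumptions): casefold instead of an ICU collator,
# identity instead of a real translation lookup.
sort_key = make_sort_key(collation_key=str.casefold, translate=lambda s: s)
ordered = [ref.title for ref in sorted(refs, key=sort_key)]
print(ordered)
```

Because the key is a tuple, rows are grouped by category position and only then ordered by title within each group.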

Cheers,
Justas

P.S.: zope.ucol is a collation wrapper around a C library that is updated constantly; zope.i18n is not. It is a pure Python implementation built on CLDR data and currently uses CLDR 1.1, if I'm not mistaken (http://pypi.python.org/pypi/zope.i18n/).

*Ahem*! - it's a bit outdated (http://cldr.unicode.org/index/downloads). As in, generated back in 2004. The latest version, CLDR 1.9, was released a month ago.

P.P.S.: As if collation wasn't "easy" enough, there are several sort orders: http://demo.icu-project.org/icu-bin/locexp?_=es_SV&d_=en&x=chS&ox= (notice "traditional", "phonebook", "dictionary", ...)