schooltool-developers team mailing list archive

On collation and stuff

To: SchoolTool Developers <schooltool-developers@xxxxxxxxxxxxxxxxxxx>
From: Justas <justas@xxxxxx>
Date: Thu, 03 Feb 2011 12:18:38 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7

Hi,

My two cents on a missed IRC discussion...

<replaceafill> i wonder if that collator.key(ref.title) should be collator.key(translate(ref.title, context=self.request))?
<aelkner> well, the collator key() method shoudl translate the string perhaps?

<...>
<aelkner> so, should i make the translate() call just to be safe?
<replaceafill> i think you should

Yes, you should translate before collating. If you don't do that, it collates the original, *untranslated* string.

The collator generates its key from a unicode string; it knows nothing about zope.i18nmessageid.Message. Our translatable strings can be transparently used as untranslated unicode strings and, sadly, sometimes are. In zope.i18nmessageid you'll find "class Message(unicode):" (this is of course the Python implementation, and in reality we use the C implementation, but the point remains valid).
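
To make the pitfall concrete, here is a minimal sketch that needs no Zope packages. The Message class below is only a stand-in that mimics the "subclass of the string type" shape described above (str in Python 3, unicode in Python 2), and CATALOG and translate() are hypothetical helpers, not the zope.i18n API. A plain code-point sort stands in for the collator.

```python
# Stand-in for zope.i18nmessageid.Message (an assumption for illustration):
# a str subclass that behaves exactly like its untranslated text.
class Message(str):
    def __new__(cls, msgid, domain=None):
        self = super().__new__(cls, msgid)
        self.domain = domain
        return self


# Hypothetical English -> Lithuanian catalog, made up for this example.
CATALOG = {"Course": "Kursas", "Student": "Mokinys", "Year": "Metai"}


def translate(msgid):
    """Hypothetical translate(): look the message id up in the catalog."""
    return CATALOG.get(msgid, str(msgid))


titles = [Message("Student"), Message("Year"), Message("Course")]

# Sorting the messages directly silently uses the untranslated text.
by_msgid = [str(t) for t in sorted(titles)]

# Translating first orders the rows by what the user actually sees.
by_translation = [str(t) for t in sorted(titles, key=translate)]

print(by_msgid)        # ordered by the English message ids
print(by_translation)  # ordered by Kursas, Metai, Mokinys
```

Nothing complains in the first case, which is exactly why the untranslated ordering is easy to miss.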


---8<------8<------8<------8<------8<------8<------8<------8<------8<---

<replaceafill> i mean, get the key for the translated version
<aelkner> do you think i need to do the translation before collating?
<replaceafill> you haven't test it yet with localized strings, correct?
<aelkner> right
<aelkner> i wouldn't know how either

Yeah, it's a bit easier for those of us in other locales :) We can always set our browser's preferred locale to our language and see if "š comes after s".

By the way, this link might be handy:
http://demo.icu-project.org/icu-bin/locexp?d_=en&x=col&_=root
You can check there how text is supposed to be collated in the selected locale.

You can use Google Translate to get some translated strings in кириллица, for example (Cyrillic alphabet, Russian), and compare the results ;)
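
If you want to see what that check looks for without switching locales, here is a rough sketch. TOY_ALPHABET and toy_key() are made up for illustration (basic Latin plus č, š, ž placed right after c, s, z); they are not CLDR or ICU data, and a real test should go through the actual collator.

```python
# Simplified alphabet, an assumption for illustration only: basic Latin
# letters with č, š, ž inserted right after c, s, z.
TOY_ALPHABET = "abcčdefghijklmnopqrsštuvwxyzž"
RANK = {ch: i for i, ch in enumerate(TOY_ALPHABET)}


def toy_key(text):
    """Hypothetical sort key: rank each character by the toy alphabet.

    Characters outside the alphabet sort after it, by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text.lower()]


words = ["tiltas", "šuo", "stalas", "zuikis"]

# Plain sorting compares code points, so "š" (U+0161) lands after "z".
by_code_point = sorted(words)

# With the toy key, "š" comes right after "s", as a Lithuanian reader expects.
by_toy_alphabet = sorted(words, key=toy_key)

print(by_code_point)
print(by_toy_alphabet)
```

If the page under test shows the first ordering, the collator is either not being used or is being fed the wrong locale.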


---8<------8<------8<------8<------8<------8<------8<------8<------8<---

<replaceafill> * wishes he understands "static PyObject *__pyx_f_10_zope_ucol_8Collator_key(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds)" :|

It's written in Pyrex (http://en.wikipedia.org/wiki/Pyrex_%28programming_language%29). Since omelette does not bake zope.ucol for some reason, you can look at ~/.buildout/eggs/zope.ucol-1.0.2-****/zope/ucol/ directly (or wherever your eggs are stored). The _zope_ucol.pyx source is quite readable, and the generated _zope_ucol.c is, of course, a mess.


---8<------8<------8<------8<------8<------8<------8<------8<------8<---

And this code caught my eye (I know it's in deep dev, consider this a pre-review ;) ):


    key_order = ['student', 'group', 'section', 'schoolyear', 'term']
    collator = ICollator(self.request.locale)
    rows = []
    for ref, name in getAdapters(...):
        row = {
            'category': ref.category,
            'title': ref.title,
            'description': ref.description,
            'url': ref.url,
            }
        for index, key in enumerate(key_order):
            if key == ref.category_key:
                break
        else:
            index = len(key_order)
        translated_title = translate(ref.title, context=self.request)
        rows.append([index, collator.key(translated_title), row])
    return [row for index, title, row in sorted(rows)]

I would rewrite it to:

    collator = ICollator(self.request.locale)
    rows = [ref for ref, name in getAdapters(...)]
    category_order = ['student', 'group', 'section', 'schoolyear', 'term']

    def sortKey(row):
        if row.category_key in category_order:
            index = category_order.index(row.category_key)
        else:
            # XXX: piling the rest of the categories at the bottom, unsorted
            index = len(category_order)
        # translate first, then collate the translated title
        return (index, collator.key(translate(row.title, context=self.request)))

    return sorted(rows, key=sortKey)
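
For anyone who wants to poke at the sort-key idea outside a running SchoolTool, here is a self-contained sketch. Ref, translate() and collation_key() are hypothetical stand-ins (for the adapters, zope.i18n's translate and collator.key respectively), so treat it as an illustration of the (category index, collated title) tuple and nothing more.

```python
from collections import namedtuple

# Hypothetical stand-in for the adapters returned by getAdapters(...).
Ref = namedtuple("Ref", ["category_key", "title"])

CATEGORY_ORDER = ['student', 'group', 'section', 'schoolyear', 'term']


def translate(text):
    """Stand-in for translate(); returns the text unchanged."""
    return text


def collation_key(text):
    """Stand-in for collator.key(); case-insensitive code-point order."""
    return text.casefold()


def sort_key(ref):
    # Known categories keep their configured position; everything else
    # shares one index after them.
    try:
        index = CATEGORY_ORDER.index(ref.category_key)
    except ValueError:
        index = len(CATEGORY_ORDER)
    # Translate first, then collate the translated title.
    return (index, collation_key(translate(ref.title)))


refs = [
    Ref('term', 'Fall'),
    Ref('report', 'Zeta'),
    Ref('student', 'bob'),
    Ref('student', 'Alice'),
    Ref('group', 'Teachers'),
    Ref('report', 'alpha'),
]

ordered = sorted(refs, key=sort_key)
for ref in ordered:
    print(ref.category_key, ref.title)
```

One side effect of the tuple key: rows in unknown categories share the same index, so they end up ordered by title among themselves rather than truly unsorted.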

Cheers,
Justas

P.S.: While zope.ucol is a collation wrapper around a C library that is updated constantly, zope.i18n is not. It is a pure Python implementation built on CLDR data and currently uses CLDR 1.1, if I'm not mistaken (http://pypi.python.org/pypi/zope.i18n/).

*Akhem*! - it's a bit outdated (http://cldr.unicode.org/index/downloads). As in - generated back in 2004. The latest CLDR, 1.9, was released a month ago.

P.P.S.: as if collation wasn't "easy" enough, there are several sort orders:
http://demo.icu-project.org/icu-bin/locexp?_=es_SV&d_=en&x=chS&ox=
(notice "traditional", "phonebook", "dictionary", ...)