ubuntu-translations-coordinators team mailing list archive

Re: Translations cleanup

 

Hi Christian,

CCʼing the Ubuntu Translation Coordinators team, since Ubuntu is downstream of Debian and it should therefore also be in Ubuntuʼs interest that this stuff gets fixed.

On 06/08/2010 01:56 AM, Christian PERRIER wrote:
Quoting Arne Goetje (arne.goetje@xxxxxxxxxxxxx):
Hi Christian,

CC'ing debian-l10n-devel mailing list, that gathers all people
involved in the Debian i18n infrastructure and scripts.


I just stumbled upon this translation overview page:
http://www.debian.org/international/l10n/po/

The script which generates the page seems to need some improvement:
  * it seems not to use the iso_639_3.xml file from the iso-codes
package, since many language codes are marked as "Unknown language".


I'm not sure this is easy to achieve. Nicolas, any idea?
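A minimal sketch of how such a name lookup could work, assuming the entries in iso_639_3.xml are `iso_639_3_entry` elements carrying `id`, `part1_code` and `name` attributes. The element and attribute names, and the inline sample, are assumptions and should be checked against the installed iso-codes file:

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for the real file; the layout is an assumption.
SAMPLE = """<iso_639_3_entries>
  <iso_639_3_entry id="csb" name="Kashubian" />
  <iso_639_3_entry id="kab" name="Kabyle" />
  <iso_639_3_entry id="vie" part1_code="vi" name="Vietnamese" />
</iso_639_3_entries>"""


def load_names(xml_text):
    """Map both the three-letter id and the two-letter code (if any) to the name."""
    names = {}
    for entry in ET.fromstring(xml_text).iter("iso_639_3_entry"):
        name = entry.get("name")
        for key in (entry.get("id"), entry.get("part1_code")):
            if key:
                names[key] = name
    return names


names = load_names(SAMPLE)
print(names.get("csb", "Unknown language"))
print(names.get("vi", "Unknown language"))
```

Indexing by both code forms means a two-letter .po name and a three-letter one resolve through the same table.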

  * language codes with @ modifiers are not parsed correctly. The
script should split the string at the @ and display it like this:
ca@valencia Catalan (valencia).

That can probably be fixed, even though my personal opinion about this
ca@valencia "joke" is....let's put this in a politically correct way...mixed..:-)

Well, yes. ca@valencia is for political reasons. But other variations (@latin, @devanagari) do make sense.
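A minimal sketch of the splitting described above, assuming codes of the form `lang[_TERRITORY][@modifier]`. The name table, the function name and the bracketed territory format are hypothetical; only the "Catalan (valencia)" output shape comes from the suggestion itself:

```python
import re

# Hypothetical, tiny name table; a real script would load names from iso-codes.
LANGUAGE_NAMES = {"ca": "Catalan", "sr": "Serbian", "vi": "Vietnamese"}

LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def describe_locale(code):
    """Return 'code Name (modifier)', or 'code Unknown language' if unparsable."""
    match = LOCALE_RE.match(code)
    if match is None:
        return f"{code} Unknown language"
    # Fold case so that upper-case language codes find the same name.
    label = LANGUAGE_NAMES.get(match.group("lang").lower(), "Unknown language")
    if match.group("territory"):
        label += f" [{match.group('territory').upper()}]"
    if match.group("modifier"):
        label += f" ({match.group('modifier')})"
    return f"{code} {label}"


print(describe_locale("ca@valencia"))
print(describe_locale("sr@latin"))
```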

  * some entries look bogus, e.g. vi_AR. There are no translations
with that code, so it needs to be investigated where this code comes
from.

Certainly from some bogus package providing a vi_AR.po file.

Well, that bogus .po file seems to be completely empty then, since following such links leads to 0 translatable strings and no .po files listed. Probably we would need to add some debugging code in the script to tell us which package carries such crap.
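A rough sketch of what such debugging code could look like, assuming the script has access to a directory tree in which each top-level directory is one unpacked source package. That layout and the function name are assumptions, not how the existing script works:

```python
import os


def packages_carrying(root, code):
    """Map package directory name -> list of (path, size in bytes) for code.po files."""
    target = code + ".po"
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename != target:
                continue
            path = os.path.join(dirpath, filename)
            # Assumption: the first path component below root names the package.
            package = os.path.relpath(path, root).split(os.sep)[0]
            hits.setdefault(package, []).append((path, os.path.getsize(path)))
    return hits
```

Calling `packages_carrying("/path/to/unpacked", "vi_AR")` would list the offending packages, and a size of 0 would confirm that the file is empty.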

Also, I donʼt see any reason to have CSB, KAB and TLH (upper case), which lead to csb, kab and tlh (lower case) respectively.

Also, I'd like to ask if there is any coordinated effort planned or
underway to fix the .po file names in the packages themselves? Quite
a few files need to be renamed in order to be useable.

There have been some initiatives. In a quite distant past, I reported
a few such errors to the relevant packages.

What would be the best approach to address such bugs? Can we tag bug reports, so that we can easily filter them for this task? Should we prepare debdiffs or patches to fix these issues properly and attach them to the bug reports?

In Ubuntu we have established some guidelines [1] for developers to name those .po and .pot files properly, so that we can parse them easily when importing them into Launchpad. (We use Launchpad [2] to allow our translators to translate the packages, as you might know.)

[1] https://wiki.ubuntu.com/UbuntuDevelopment/Internationalisation/RecipeVerifyingTranslationUploads
[2] http://translations.launchpad.net/ubuntu/lucid


Examples:
  * dk ->  should be da, according to the translations inside
  * sr_SR ->  the country code for Serbia is RS. It should actually be
just 'sr'. Likewise with sr_YU.
  * sr@Latn and sr@latin is actually the same and should be merged
into sr@latin. sr@Latn doesn't exist as a locale.
  * no and no_NO are discouraged. Translations should be either nb or
nn. In most cases, these 'no' translations are actually nb.
  * zh is also discouraged, they should be either zh_CN or zh_TW.
  * codes with country codes, where the language is mainly spoken in
only one country, should be merged with the country-less language
codes to avoid confusion. E.g. ca_ES@valencia should get merged into
ca@valencia
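The examples above can be sketched as a small lookup table. The table only covers the codes listed here and is not complete. The split between mechanical renames and codes that need a human look at the file contents is an assumption about how one might organise the work:

```python
# Renames taken from the examples above.
RENAMES = {
    "dk": "da",
    "sr_SR": "sr",
    "sr_YU": "sr",
    "sr@Latn": "sr@latin",
    "ca_ES@valencia": "ca@valencia",
}

# Codes where the right target depends on what the file actually contains.
NEEDS_REVIEW = {
    "no": ("nb", "nn"),
    "no_NO": ("nb", "nn"),
    "zh": ("zh_CN", "zh_TW"),
}


def suggest(code):
    """Return ('rename', new_code), ('review', candidates) or ('keep', code)."""
    if code in RENAMES:
        return ("rename", RENAMES[code])
    if code in NEEDS_REVIEW:
        return ("review", NEEDS_REVIEW[code])
    return ("keep", code)


for code in ("dk", "sr@Latn", "no", "fr"):
    print(code, suggest(code))
```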


I even go further: fr_FR.po when there is no fr.po file and no other
fr_* file is plain stupid. Indeed, my own personal opinion is that
there is no serious argument for using country modifiers for most of
the "multiple country" languages.

Thatʼs what I meant, yes. :)

I had this debate many times in Debian lists...and, of course, there
is always someone popping up in a more or less pedantic way and "kindly"
explaining to me that "French as spoken in Belgium" is different from
"French as spoken in France", but:

- after over 10 years in l10n, I know about all this and probably all
specificities of most languages in the world. That's pedantic too but
I think I deserve the right to be pedantic on that matter

- software l10n is about *written* languages, not spoken ones and
apart from very specific, very well-known cases ("ordenador" vs
"computador" in es_ES and es_everywhere-else), there are no practical
differences in most cases

- only having fr_CA (for instance) translation files for French
deprives users of other French locales from the French translation
unless this file is copied as fr_FR, fr_CH, fr_BE, fr_LU, etc. Huge
waste of resources. Of course, French is only an example, here.

- exceptions to this (that is, real good reasons to use xx_YY.po files)
are very limited:
   - pt vs pt_BR
   - zh_CN vs zh_TW (with all practical implications for users of zh_HK,
   zh_SG...)
   - possibly pa_IN/pa_PK and bn_BD/bn_IN

So, in short, all occurrences of xx_YY.po files (apart from the
abovementioned exceptions) should be hunted down....and I would
wholeheartedly welcome an initiative about this. Of course, most of
these errors belong to upstream software, but we can expect Debian
developers to relay them upstream (and of course, then, have fun times
arguing with upstream developers when they tell us we are wrong..:-))

Yes, Iʼm with you on that. And we should mobilize forces together from Debian and Ubuntu to achieve this, since itʼs in our common interest. For gettext applications, .po files with country codes should only contain diffs to the main country-code-less .po files. That is, fr.po is the main translation file and fr_CA only contains diffs for those strings which really need to be written differently, either because of spelling, grammar or specific terms. How we deal with this on a case-by-case basis would need to be discussed. E.g. we could expect the fr.po file to mainly include strings from fr_FR; however, if a piece of software originates in Canada, translators might have used fr_CA terms which might not be appropriate in France. In that case it might make sense to put the fr_CA strings into fr.po and use a fr_FR diff for those strings which would be worded differently in France. Or we modify those strings, so that fr.po is always equal to fr_FR, and put the diffs into fr_CA. This would need to be discussed and we should establish a common policy for this.
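To illustrate the intended lookup order only (this is not a description of how gettext itself is implemented), here is a minimal sketch in which fr_CA is a thin overlay on top of fr. The sample strings and helper name are made up for the example:

```python
from collections import ChainMap

# Made-up sample catalogs: fr is complete, fr_CA holds only the differences.
fr = {"Save": "Enregistrer", "E-mail": "E-mail"}
fr_ca_diff = {"E-mail": "Courriel"}

# ChainMap searches the overlay first and falls back to the main catalog.
fr_ca = ChainMap(fr_ca_diff, fr)


def translate(catalog, msgid):
    """Return the translation, or the untranslated msgid if there is none."""
    return catalog.get(msgid, msgid)


print(translate(fr_ca, "E-mail"))  # taken from the fr_CA overlay
print(translate(fr_ca, "Save"))    # falls back to fr
print(translate(fr_ca, "Quit"))    # no translation anywhere
```

With this layout a user of any other French locale still gets the full fr catalog, and only the strings that really differ have to be maintained per country.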

One more thing Iʼd like to mention here:
As mentioned above already, Ubuntu translators and translation teams are translating and fixing translation bugs in Launchpad. Those translations then get exported into language-packs for Ubuntu. It is our wish that these changes also get contributed back to upstream, but we see that this does not work very well, except for a few cases. For the packages which originate in Debian (at least), Iʼd like to kick off a discussion about how we could cooperate better to get translation improvements back into Debian. This includes for example debian-installer and iso-codes, for which the translations made in Launchpad are not used currently, since our policy is that we donʼt want to diverge further from upstream for these packages, just because of translations. So ideally, we would establish a channel to get those translations back into Debian. Either Debian translation teams could "harvest" the translations from Launchpad (they can be downloaded individually or in batches), or the Ubuntu translation teams somehow push them back to Debian one way or another.

What do you think?

Cheers
Arne


