ubuntu-translations-coordinators team mailing list archive


Re: Translations cleanup

To: Christian PERRIER <christian@xxxxxxxxxxxxxx>
From: Arne Goetje <arne.goetje@xxxxxxxxxxxxx>
Date: Wed, 09 Jun 2010 16:02:27 +0800
Cc: Ubuntu Translations Coordinators <ubuntu-translations-coordinators@xxxxxxxxxxxxxxxxxxx>, debian-l10n-devel@xxxxxxxxxxxxxxxxxxxxxxx
In-reply-to: <20100607175642.GF3256@mykerinos.kheops.frmug.org>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4 ThunderBrowse/3.2.8.1

Hi Christian,

CCʼing the Ubuntu Translations Coordinators team, since Ubuntu is downstream of Debian and therefore it should also be in Ubuntuʼs interest that this stuff gets fixed.


On 06/08/2010 01:56 AM, Christian PERRIER wrote:

> Quoting Arne Goetje (arne.goetje@xxxxxxxxxxxxx):

> > Hi Christian,


> CC'ing the debian-l10n-devel mailing list, which gathers all people
> involved in the Debian i18n infrastructure and scripts.


> > I just stumbled upon this translation overview page:
> > http://www.debian.org/international/l10n/po/
> >
> > The script which generates the page seems to need some improvement:
> >   * it seems not to use the iso_639_3.xml file from the iso-codes
> > package, since many language codes are marked as "Unknown language".



> I'm not sure this is easy to achieve. Nicolas, any idea?
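A minimal sketch of the name lookup the page script could do, assuming the iso-codes XML layout in which each `iso_639_3_entry` element carries an `id` (three-letter code), an optional `part1_code` (two-letter code) and a `name`. The excerpt is hand-written for illustration, and `load_language_names`/`language_name` are hypothetical helpers, not part of the existing script.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt for illustration only; the real file (assumed to be
# /usr/share/xml/iso-codes/iso_639_3.xml) has thousands of entries and more attributes.
SAMPLE_XML = """<iso_639_3_entries>
  <iso_639_3_entry id="cat" part1_code="ca" name="Catalan" />
  <iso_639_3_entry id="csb" name="Kashubian" />
  <iso_639_3_entry id="kab" name="Kabyle" />
  <iso_639_3_entry id="tlh" name="Klingon" />
  <iso_639_3_entry id="vie" part1_code="vi" name="Vietnamese" />
</iso_639_3_entries>"""


def load_language_names(xml_text):
    """Map both three-letter ids and two-letter part1 codes to language names."""
    names = {}
    root = ET.fromstring(xml_text)
    for entry in root.iter("iso_639_3_entry"):
        name = entry.get("name")
        names[entry.get("id")] = name
        part1 = entry.get("part1_code")
        if part1:  # only some languages have a two-letter code
            names[part1] = name
    return names


def language_name(code, names):
    """Look up a bare language code, falling back to the page's current label."""
    return names.get(code.lower(), "Unknown language")


names = load_language_names(SAMPLE_XML)
print(language_name("ca", names))   # Catalan
print(language_name("csb", names))  # Kashubian
```

Against the installed file, `ET.parse(path).getroot()` could replace `ET.fromstring()`; the path and layout should be checked against the iso-codes version actually in use.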

> >   * language codes with @ modifiers are not parsed correctly. The
> > script should split the string at the @ and display it like this:
> > ca@valencia Catalan (valencia).


> That can probably be fixed, even though my personal opinion about this
> ca@valencia "joke" is... let's say it in a politically correct way... mixed. :-)

Well, yes. ca@valencia is for political reasons. But other variations (@latin, @devanagari) do make sense.
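The @-modifier split requested above could look roughly like this sketch. It assumes codes of the form `ll` or `lll`, an optional `_CC` country part and an optional `@modifier`; the small `NAMES` table is a hypothetical stand-in for names read from iso-codes.

```python
import re

# language (2-3 lower-case letters), optional _COUNTRY, optional @modifier
CODE_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})(?:_(?P<country>[A-Z]{2}))?(?:@(?P<modifier>[A-Za-z]+))?$"
)

# Hypothetical subset standing in for names read from iso-codes.
NAMES = {"ca": "Catalan", "pt": "Portuguese", "sr": "Serbian"}


def split_code(code):
    """Split 'ca_ES@valencia' into ('ca', 'ES', 'valencia'); missing parts are None."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"unparseable language code: {code!r}")
    return match.group("lang"), match.group("country"), match.group("modifier")


def display_label(code, names=NAMES):
    """Render a code for the overview page, e.g. 'ca@valencia Catalan (valencia)'."""
    lang, country, modifier = split_code(code)
    label = names.get(lang, "Unknown language")
    qualifiers = [part for part in (country, modifier) if part]
    if qualifiers:
        label += " (" + ", ".join(qualifiers) + ")"
    return f"{code} {label}"


for code in ("ca@valencia", "sr@latin", "pt_BR"):
    print(display_label(code))
```

Upper-case codes such as `CSB` deliberately fail the pattern, so they surface as errors instead of being listed as separate languages.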

> >   * some entries look bogus, e.g. vi_AR. There are no translations
> > with that code, so it needs to be investigated where this code comes
> > from.


> Certainly from some bogus package providing a vi_AR.po file.

Well, that bogus .po file seems to be completely empty then, since following such links leads to 0 translatable strings and no .po files listed. Probably we would need to add some debugging code in the script to tell us which package carries such crap.

Also, I donʼt see any reason to have CSB, KAB and TLH (upper case), which lead to csb, kab and tlh (lower case) respectively.
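The debugging aid suggested above could be as simple as grouping suspicious codes by the packages that ship them. This sketch assumes the script already has (package, language code) pairs and a set of valid language_COUNTRY combinations (for instance derived from the glibc locale list); both inputs and the package names in the example are hypothetical.

```python
from collections import defaultdict


def find_suspicious(po_files, known_locales):
    """Return {code: sorted list of packages} for codes that look bogus.

    po_files: iterable of (package, code) pairs, e.g. ("foo", "vi_AR").
    known_locales: set of valid "ll_CC" strings (assumed to be supplied).
    """
    found = defaultdict(set)
    for package, code in po_files:
        base = code.split("@", 1)[0]            # ignore any @modifier
        lang, _, country = base.partition("_")
        if lang != lang.lower():
            found[code].add(package)            # e.g. CSB instead of csb
        elif country and base not in known_locales:
            found[code].add(package)            # e.g. vi_AR
    return {code: sorted(pkgs) for code, pkgs in found.items()}


# Hypothetical input
po_files = [
    ("pkg-a", "vi_AR"),
    ("pkg-b", "CSB"),
    ("pkg-c", "csb"),
    ("pkg-d", "pt_BR"),
    ("pkg-e", "vi_AR"),
]
print(find_suspicious(po_files, known_locales={"pt_BR", "vi_VN"}))
```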

> > Also, I'd like to ask if there is any coordinated effort planned or
> > underway to fix the .po file names in the packages themselves? Quite
> > a few files need to be renamed in order to be usable.


> There have been some initiatives. In the fairly distant past, I reported
> a few such errors to the relevant packages.

What would be the best approach to address such bugs? Can we tag bug reports, so that we can easily filter them for this task? Should we prepare debdiffs or patches to fix these issues properly and attach them to the bug reports?

In Ubuntu we have established some guidelines [1] for developers to name those .po and .pot files properly, so that we can parse them easily when importing them into Launchpad. (We use Launchpad [2] to allow our translators to translate the packages, as you might know.)

[1] https://wiki.ubuntu.com/UbuntuDevelopment/Internationalisation/RecipeVerifyingTranslationUploads

[2] http://translations.launchpad.net/ubuntu/lucid


> > Examples:
> >   * dk -> should be da, according to the translations inside
> >   * sr_SR -> the country code for Serbia is RS. It should actually be
> > just 'sr'. Likewise with sr_YU.
> >   * sr@Latn and sr@latin are actually the same and should be merged
> > into sr@latin. sr@Latn doesn't exist as a locale.
> >   * no and no_NO are discouraged. Translations should be either nb or
> > nn. In most cases, these 'no' translations are actually nb.
> >   * zh is also discouraged; it should be either zh_CN or zh_TW.
> >   * codes with country codes, where the language is mainly spoken in
> > only one country, should be merged with the country-less
> > language codes to avoid confusion. E.g. ca_ES@valencia should get
> > merged into ca@valencia.
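The renames in the list above fit in a small table that a checking script (hypothetical here) could apply to file names. `no`/`no_NO` map to `nb` because that is the common case noted above, so they are flagged for a human check, and `zh` is left for manual review because there is no safe automatic target. As with dk, each rename still assumes someone has looked at the file's content.

```python
# Renames from the list above; applied to the code part of "<code>.po".
RENAMES = {
    "dk": "da",
    "sr_SR": "sr",
    "sr_YU": "sr",
    "sr@Latn": "sr@latin",
    "no": "nb",           # usually nb, but could be nn: verify the content
    "no_NO": "nb",
    "ca_ES@valencia": "ca@valencia",
}
VERIFY = {"no", "no_NO"}
MANUAL = {"zh": "choose zh_CN or zh_TW"}


def propose_rename(filename):
    """Return (suggested_filename, note) for one .po file name."""
    if not filename.endswith(".po"):
        raise ValueError(f"not a .po file: {filename!r}")
    code = filename[: -len(".po")]
    if code in MANUAL:
        return filename, "manual review: " + MANUAL[code]
    if code in RENAMES:
        note = "rename, verify nb vs nn" if code in VERIFY else "rename"
        return RENAMES[code] + ".po", note
    return filename, "ok"


for name in ("dk.po", "sr@Latn.po", "no_NO.po", "zh.po", "fr.po"):
    print(name, "->", propose_rename(name))
```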



> I'd go even further: fr_FR.po when there is no fr.po file and no other
> fr_* file is plain stupid. Indeed, my own personal opinion is that
> there is no serious argument for using country modifiers for most of
> the "multiple country" languages.


Thatʼs what I meant, yes. :)

> I had this debate many times on Debian lists... and, of course, there is
> always someone popping up in a more or less pedantic way and "kindly"
> explaining to me that "French as spoken in Belgium" is different from
> "French as spoken in France", but:
>
> - after over 10 years in l10n, I know about all this and probably all
> the specificities of most languages in the world. That's pedantic too, but
> I think I deserve the right to be pedantic on that matter
>
> - software l10n is about *written* languages, not spoken ones, and
> apart from very specific, very well-known cases ("ordenador" vs
> "computador" in es_ES and es_everywhere-else), there are no practical
> differences in most cases
>
> - only having fr_CA (for instance) translation files for French
> deprives users of other French locales of the French translation
> unless this file is copied as fr_FR, fr_CH, fr_BE, fr_LU, etc. Huge
> waste of resources. Of course, French is only an example here.
>
> - exceptions to this (that is, really good reasons to use xx_YY.po files)
> are very limited:
>    - pt vs pt_BR
>    - zh_CN vs zh_TW (with all the practical implications for users of zh_HK,
>    zh_SG...)
>    - possibly pa_IN/pa_PK and bn_BD/bn_IN
>
> So, in short, all occurrences of xx_YY.po files (apart from the
> above-mentioned exceptions) should be hunted down... and I would
> wholeheartedly welcome an initiative about this. Of course, most of
> these errors belong to upstream software, but we can expect Debian
> developers to relay them upstream (and of course, then, have fun times
> arguing with upstream developers when they tell us we are wrong... :-))
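Hunting down stray xx_YY.po files could start from a per-package check like this sketch. The allow-list is taken from the exceptions listed above (pt_BR, zh_CN, zh_TW, pa_IN/pa_PK, bn_BD/bn_IN); the function names and the wording of the findings are illustrative, and deciding what to do with a flagged file stays a human decision.

```python
# Country-specific codes accepted as legitimate, per the exceptions above.
ALLOWED = {"pt_BR", "zh_CN", "zh_TW", "pa_IN", "pa_PK", "bn_BD", "bn_IN"}


def language_of(code):
    """'fr_CA' -> 'fr', 'ca_ES@valencia' -> 'ca'."""
    return code.partition("@")[0].partition("_")[0]


def review_country_files(codes):
    """Flag xx_YY codes among one package's .po files (codes given without '.po')."""
    codes = set(codes)
    findings = {}
    for code in sorted(codes):
        base, at, modifier = code.partition("@")
        lang, sep, _country = base.partition("_")
        if not sep or base in ALLOWED:
            continue  # no country part, or an accepted exception
        siblings = sorted(c for c in codes if c != code and language_of(c) == lang)
        if not siblings:
            # e.g. a lone fr_FR.po: nothing is lost by renaming it to fr.po
            findings[code] = f"rename to {lang}{at}{modifier}.po"
        else:
            findings[code] = "review against " + ", ".join(siblings)
    return findings


print(review_country_files({"fr_FR", "de", "pt", "pt_BR"}))
print(review_country_files({"fr", "fr_CA"}))
```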

Yes, Iʼm with you on that. And we should mobilize forces from Debian and Ubuntu together to achieve this, since itʼs in our common interest.

For gettext applications, .po files with country codes should only contain diffs to the main country-code-less .po files. I.e. fr.po is the main translation file and fr_CA only contains diffs for those strings which really need to be written differently, either because of spelling, grammar or specific terms.

How we deal with this on a case-by-case basis would need to be discussed. E.g. we could expect the fr.po file to mainly include strings from fr_FR; however, if a piece of software originates in Canada, translators might have used fr_CA terms which might not be appropriate in France. In that case it might make sense to put the fr_CA strings into fr.po and use a fr_FR diff for those strings which would be used differently in France. Or we modify those strings, so that fr.po is always equal to fr_FR, and put the diffs into fr_CA.

This would need to be discussed and we should establish a common policy for this.
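The "main file plus diffs" layout relies on per-message fallback from the country-specific catalog to the country-less one. Python's `gettext.translation()` builds such a chain with `add_fallback()` from the catalogs it finds for, say, es_ES and then es; the sketch below reproduces that chain with in-memory catalogs instead of compiled .mo files. `DictTranslations` is a hypothetical helper, and the strings reuse the ordenador/computador example quoted above, with es.po chosen as the base purely for illustration.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory catalog standing in for a compiled .mo file (illustration only)."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message):
        if message in self._catalog:
            return self._catalog[message]
        # NullTranslations.gettext asks the fallback, or returns the msgid itself
        return super().gettext(message)


# The base translation (es.po) holds every string...
es = DictTranslations({"Computer": "computador", "File": "archivo"})
# ...while es_ES.po carries only the strings that really differ.
es_ES = DictTranslations({"Computer": "ordenador"})
es_ES.add_fallback(es)

print(es_ES.gettext("Computer"))  # ordenador (from es_ES)
print(es_ES.gettext("File"))      # archivo (falls back to es)
print(es.gettext("Computer"))     # computador
```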


One more thing Iʼd like to mention here:

As mentioned above already, Ubuntu translators and translation teams are translating and bug-fixing translations in Launchpad. Those translations then get exported into language packs for Ubuntu. It is our wish that these changes also get contributed back upstream, but we see that this does not work very well, except for a few cases.

For the packages which originate in Debian (at least), Iʼd like to kick off a discussion about how we could cooperate better to get translation improvements back into Debian. This includes, for example, debian-installer and iso-codes, for which the translations made in Launchpad are currently not used, since our policy is that we donʼt want to diverge further from upstream for these packages just because of translations.

So ideally, we would establish a channel to get those translations back into Debian. Either Debian translation teams could "harvest" the translations from Launchpad (they can be downloaded individually or in batches), or the Ubuntu translation teams push them back to Debian one way or another.


What do you think?

Cheers
Arne

Follow ups

Re: [Debian-l10n-devel] Translations cleanup
From: Nicolas François, 2010-06-09
Re: [Debian-l10n-devel] Translations cleanup
From: Martin Bagge / brother, 2010-06-09