ubuntu-translations-coordinators team mailing list archive
Message #00409
Re: Translations cleanup
Hi Christian,
CCʼing the Ubuntu Translation Coordinators team, since Ubuntu is downstream
of Debian and it is therefore also in Ubuntuʼs interest that this
stuff gets fixed.
On 06/08/2010 01:56 AM, Christian PERRIER wrote:
Quoting Arne Goetje (arne.goetje@xxxxxxxxxxxxx):
Hi Christian,
CC'ing debian-l10n-devel mailing list, that gathers all people
involved in the Debian i18n infrastructure and scripts.
I just stumbled upon this translation overview page:
http://www.debian.org/international/l10n/po/
The script which generates the page seems to need some improvement:
* it seems not to use the iso_639_3.xml file from the iso-codes
package, since many language codes are marked as "Unknown language".
I'm not sure this is easy to achieve. Nicolas, any idea?
* language codes with @ modifiers are not parsed correctly. The
script should split the string at the @ and display it like this:
ca@valencia Catalan (valencia).
That can probably be fixed, even though my personal opinion about this
ca@valencia "joke" is... let's say, to be politically correct... mitigated. :-)
Well, yes. ca@valencia is for political reasons. But other variations
(@latin, @devanagari) do make sense.
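
To make the suggestion concrete, here is a minimal sketch of splitting a code at the @ and building the display name. It is not the actual script behind the Debian page: the function name `describe_locale` and the tiny `LANGUAGE_NAMES` table are hypothetical (a real script would presumably load the names from the iso-codes data), and the `[CC]` rendering of country codes is only an assumption for illustration.

```python
# Hypothetical lookup table for illustration only; a real script would
# load language names from the iso-codes package instead.
LANGUAGE_NAMES = {"ca": "Catalan", "sr": "Serbian", "fr": "French"}


def describe_locale(code):
    """Split 'll[_CC][@modifier]' and return a human-readable name."""
    # Everything after the first '@' is the modifier (may be empty).
    base, _, modifier = code.partition("@")
    # Everything after the first '_' in the base is the country code.
    language, _, country = base.partition("_")
    name = LANGUAGE_NAMES.get(language, "Unknown language")
    if country:
        name += f" [{country}]"
    if modifier:
        name += f" ({modifier})"
    return name


print(describe_locale("ca@valencia"))
print(describe_locale("sr@latin"))
```

With this split, ca@valencia is looked up as "ca" and the modifier is only appended for display, so it no longer ends up as "Unknown language".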
* some entries look bogus, e.g. vi_AR. There are no translations
with that code, so it needs to be investigated where this code comes
from.
Certainly from some bogus package providing a vi_AR.po file.
Well, that bogus .po file seems to be completely empty then, since
following such links leads to 0 translatable strings and no .po files
listed. Probably we would need to add some debugging code in the script
to tell us which package carries such crap.
Also, I donʼt see any reason to have CSB, KAB and TLH (upper case),
which lead to csb, kab and tlh (lower case) respectively.
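
The kind of debugging code I have in mind could look roughly like the sketch below. It assumes, purely for illustration, that we have a list of .po paths whose first component is the package name; I don't know how the real script stores its data, and `suspicious_codes` is a made-up name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def suspicious_codes(po_paths, known_codes):
    """Map each unrecognised language code to the packages shipping it.

    Assumption: each path looks like '<package>/.../<code>.po'.
    """
    report = defaultdict(set)
    for path in po_paths:
        p = PurePosixPath(path)
        if p.suffix != ".po":
            continue  # ignore anything that is not a .po file
        code = p.stem  # file name without the '.po' extension
        if code not in known_codes:
            report[code].add(p.parts[0])  # first component = package
    return {code: sorted(pkgs) for code, pkgs in report.items()}


paths = ["foo/po/vi.po", "bar/po/vi_AR.po", "baz/po/CSB.po"]
for code, packages in suspicious_codes(paths, {"vi", "csb"}).items():
    print(code, "shipped by", ", ".join(packages))
```

Because the comparison is case-sensitive, upper-case variants such as CSB are reported too, together with the package that ships them.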
Also, I'd like to ask if there is any coordinated effort planned or
underway to fix the .po file names in the packages themselves? Quite
a few files need to be renamed in order to be useable.
There have been some initiatives. In a quite distant past, I reported
a few such errors to the relevant packages.
What would be the best approach to address such bugs? Can we tag bug
reports, so that we can easily filter them for this task?
Should we prepare debdiffs or patches to fix these issues properly and
attach them to the bug reports?
In Ubuntu we have established some guidelines [1] for developers to name
those .po and .pot files properly, so that we can parse them easily when
importing them into Launchpad. (We use Launchpad [2] to allow our
translators to translate the packages, as you might know.)
[1]
https://wiki.ubuntu.com/UbuntuDevelopment/Internationalisation/RecipeVerifyingTranslationUploads
[2] http://translations.launchpad.net/ubuntu/lucid
Examples:
* dk -> should be da, according to the translations inside
* sr_SR -> the country code for Serbia is RS. It should actually be
just 'sr'. Likewise with sr_YU.
* sr@Latn and sr@latin are actually the same and should be merged
into sr@latin. sr@Latn doesn't exist as a locale.
* no and no_NO are discouraged. Translations should be either nb or
nn. In most cases, these 'no' translations are actually nb.
* zh is also discouraged; such files should be either zh_CN or zh_TW.
* codes with country codes, where the language is mainly spoken in
only one country, should be merged with the country-less
language codes to avoid confusion. E.g. ca_ES@valencia should get
merged into ca@valencia
I even go further: fr_FR.po when there is no fr.po file and no other
fr_* file is plain stupid. Indeed, my own personal opinion is that
there is no serious argument for using country modifiers for most of
the "multiple country" languages.
Thatʼs what I meant, yes. :)
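
The renames from the examples above could be captured in a small table, sketched here. The names `RENAMES`, `NEEDS_REVIEW` and `suggest_code` are hypothetical, and the table only contains the cases listed in this mail. Note that dk -> da was decided by looking at the translations inside the file, so a mapping like this is a suggestion for a human to confirm, not something to apply blindly.

```python
# Renames taken from the examples in this mail; each one should still
# be confirmed against the file contents before renaming anything.
RENAMES = {
    "dk": "da",
    "sr_SR": "sr",
    "sr_YU": "sr",
    "sr@Latn": "sr@latin",
    "ca_ES@valencia": "ca@valencia",
}

# Codes where someone has to decide by hand: nb or nn for 'no',
# zh_CN or zh_TW for 'zh'.
NEEDS_REVIEW = {"no", "no_NO", "zh"}


def suggest_code(code):
    """Return the suggested code, or None if manual review is needed."""
    if code in NEEDS_REVIEW:
        return None
    return RENAMES.get(code, code)  # unknown codes are left unchanged


for code in ("dk", "sr_YU", "no", "fr"):
    print(code, "->", suggest_code(code))
```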
I had this debate many times in Debian lists...and, of course, there is
always someone popping up in a more or less pedantic way and "kindly"
explaining to me that "French as spoken in Belgium" is different from
"French as spoken in France", but:
- after over 10 years in l10n, I know about all this and probably all
specificities of most languages in the world. That's pedantic too but
I think I deserve the right to be pedantic on that matter
- software l10n is about *written* languages, not spoken ones and
apart from very specific very well known cases ("ordenador" vs
"computador" in es_ES and es_everywhere-else), there are no practical
differences in most cases
- only having fr_CA (for instance) translation files for French
deprives users of other French locales from the French translation
unless this file is copied as fr_FR, fr_CH, fr_BE, fr_LU, etc. Huge
waste of resources. Of course, French is only an example, here.
- exceptions to this (that is, really good reasons to use xx_YY.po files)
are very limited:
- pt vs pt_BR
- zh_CN vs zh_TW (with all the practical implications for users of zh_HK,
zh_SG...)
- possibly pa_IN/pa_PK and bn_BD/bn_IN
So, in short, all occurrences of xx_YY.po files (apart from the
abovementioned exceptions) should be hunted down....and I would
wholeheartedly welcome an initiative about this. Of course, most of
these errors belong to upstream software, but we can expect Debian
developers to relay them upstream (and of course, then, have fun times
arguing with upstream developers when they tell us we are wrong..:-))
Yes, Iʼm with you on that. And we should mobilize forces together from
Debian and Ubuntu to achieve this, since itʼs our common interest.
For gettext applications, .po files with country codes should only
contain diffs to the main country-code-less .po files. I.e. fr.po is the
main translation file and fr_CA only contains diffs for those strings
which really need to be written differently, either because of spelling,
grammar or specific terms.
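
As a rough illustration of the "diffs only" idea, the sketch below treats a catalog as a plain dict of msgid -> msgstr. That representation is an assumption made for brevity (real .po files would need a proper parser), the function names `make_overlay` and `resolve` are made up, and the sample strings are only examples.

```python
def make_overlay(base, regional):
    """Keep only the regional entries that differ from the base catalog."""
    return {
        msgid: msgstr
        for msgid, msgstr in regional.items()
        if base.get(msgid) != msgstr
    }


def resolve(base, overlay):
    """Effective catalog: base entries, overridden by the overlay."""
    return {**base, **overlay}


# Illustrative sample data, not taken from any real package.
fr = {"File": "Fichier", "Weekend": "week-end"}
fr_CA_full = {"File": "Fichier", "Weekend": "fin de semaine"}

fr_CA = make_overlay(fr, fr_CA_full)  # what fr_CA.po would contain
print(fr_CA)
print(resolve(fr, fr_CA) == fr_CA_full)
```

The fr_CA overlay then carries only the strings that really differ, and every other French locale is served by fr alone.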
How we deal with this on a case by case basis would need to be
discussed. E.g. we could expect the fr.po file to mainly include strings
from fr_FR, however if a software originates in Canada, translators
might have used fr_CA terms which might not be appropriate in France. In
that case it might make sense to put the fr_CA strings into fr.po and
use an fr_FR diff for those strings which would be used differently in
France. Or, we modify those strings, so that fr.po is always equal to
fr_FR and put the diffs into fr_CA.
This would need to be discussed and we should establish a common policy
for this.
One more thing Iʼd like to mention here:
As mentioned above already, Ubuntu translators and translation teams are
translating and bug fixing translations in Launchpad. Those translations
then get exported into language-packs for Ubuntu. It is our wish that
these changes get also contributed back to upstream, but we see that
this does not work very well, except for a few cases.
For the packages which originate in Debian (at least), Iʼd like to kick
off a discussion about how we could cooperate better to get translation
improvements back into Debian. This includes for example
debian-installer and iso-codes, for which the translations made in
Launchpad are not used currently, since our policy is that we donʼt want
to diverge further from upstream for these packages, just because of
translations.
So ideally, we would establish a channel to get those translations back
into Debian. Either Debian translation teams could "harvest" the
translations from Launchpad (they can be downloaded individually or in
batches), or the Ubuntu translation teams somehow push them back to
Debian one way or another.
What do you think?
Cheers
Arne