


Re: [Launchpad-users] Translations branch keeps having status set to 'Merged'



On Thu, 30 Jul 2009, Jeroen Vermeulen wrote:

Michael B. Trausch wrote:

Hrm.  Interesting.  I own the team (AllTray Developers) and the team
owns the project (and the branch), and I've got (at least insofar as I
am aware) full rights to the project and the trees.

I need to take a look and see if there is anything amiss in my project
setup, if what you stated above was correct.  Perhaps this means that
there is a bug in Launchpad, also, but it is perhaps more likely that
I plain messed something up or didn't do something right.  I know that
in the drop-down list that is offered, I can't select it, and when I
enter it manually, it tells me that there is an error.

I'm not entirely certain that owning a team also makes you a member... Can you also join the team as if you weren't also the owner? It sounds like a bug in the ownership check, yes.

I am a member of the team. I don't remember if that was automatic, or if I had to explicitly add myself, but either way, I am:

  https://edge.launchpad.net/~alltray-developers/+members

My guess is that this is done for the reason suggested in the Info
documentation for gettext:  POSIX does not specify the character set
of the C locale, which makes it implementation dependent.  That
more-or-less implicitly makes it only safe to assume that the C locale
has the 7-bit original ASCII character set.  One example of using the
C locale actually comes to mind:  taking log output and dumping it to
a relatively old dot-matrix printer.  Probably not a terribly common
use-case, and for many of the things I spit out to mine, I run through
iconv and ask it to convert and transliterate as much as possible into
my printer's character set (ISO-8859-1).  If I send UTF-8 output to
the printer, it freaks out (if there are actually any UTF-8 characters
that are non-ASCII, anyway).

In this example, the right thing to do is probably not to log non-ASCII data, unless the log file has a well-defined encoding. So that's perhaps more of a data i18n issue than a software translation issue.

Insofar as the messages come from the software itself, it sounds like one of those cases where you shouldn't want to translate the original messages anyway, or have non-ASCII characters in them.

Well, for example, in AllTray's case, additional log output is enabled by running AllTray like so:

 $ ALLTRAY_DEBUG=ALL alltray -D <other parameters>

The output from that is already translated and will presumably be UTF-8 for all non-English locales, and in my case (using en_US.UTF-8) I wanted the output to also take advantage of Unicode. Why not? It looks better, and I am a perfectionist. ;-)

If someone is making a run of that for the purpose of filing a bug report, I'd very much like it if they ran it like this:

 $ LANG=en_US.UTF-8 ALLTRAY_DEBUG=ALL alltray -D <other params>

or like this:

 $ LANG=C ALLTRAY_DEBUG=ALL alltray -D <other params>

But that's not necessary; given the output from the debug messages, I can “untranslate” if necessary, by using the message databases to get the original output. In languages that have words that are similar, I can do better than that and kind of read the messages, even, so it's still useful from a debugging point of view. The intent really is to make troubleshooting and identifying problems something that doesn't have to hit a language barrier. I don't want my unilingual background to be an imposition on someone that would like to track down an issue, at least to whatever extent that is possible. The source code is, and always will be, in English, essentially out of necessity.
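
As a rough illustration of the "untranslate" idea above, the following Python sketch inverts a message catalog so that a translated log line can be mapped back to its original English string. The catalog contents and the function name are hypothetical (they are not AllTray's real strings), and the sketch only handles exact matches; real messages with format placeholders would need pattern matching instead.

```python
# Hypothetical catalog: original English msgid -> translated msgstr.
# These entries are invented for illustration only.
GERMAN_CATALOG = {
    "Window not found": "Fenster nicht gefunden",
    "Attaching to window": "Verbinde mit Fenster",
}


def untranslate(line, catalog):
    """Return the original msgid for a translated line, or the line unchanged."""
    # Invert the catalog so translated text becomes the lookup key.
    reverse = {msgstr: msgid for msgid, msgstr in catalog.items()}
    return reverse.get(line, line)


print(untranslate("Fenster nicht gefunden", GERMAN_CATALOG))
```

Lines that are not in the catalog pass through untouched, so already-English output is left alone.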

Now, as an aside, I had thought about this once before.  We very much
so live in a Unicode world, and I've been quite happy to say good-bye
to the days of asking the question “which text encoding is this in?”
because now there is one character set (with multiple representations,
sure, but autodetectable ones, for the most part) and most often it's
used with a character encoding that is backwards-compatible with
ASCII, at least for displaying ASCII text as much as possible.  But,
and I am sure that I'm not alone, though I may be in the minority, I
still have devices and the like which don't talk Unicode at all.  Most
of those use ISO-8859-1, though I have one device (a _very_ old
printer) that only takes ASCII input; everything else is a control
code or otherwise gibberish when output there.  So, “translating” into
en_US.UTF-8 would be desirable.  If this were five years ago, I'd say
that being able to translate to even more specific character sets
would be important, but I don't think that it is so much.

I agree that we can't just pretend that every environment will support Unicode. But how important is it for these specific programs to support non-ASCII characters in the English messages when run in environments that support them? Is it really worth the extra effort, or would pure-ASCII messages be acceptable for these programs?

At least to me, quite important. AllTray has the potential for working in essentially any environment that GNOME can run in. I'm not sure if the C locale on all of those platforms is Unicode or not. In Linux, it doesn't seem to matter much, because it will output anyway, though you do have to have the terminal already in UTF-8 mode for the output to make sense, I'd wager.

According to the GNU C library documentation, the C and POSIX locales are equivalent, and its implementation bases the C locale on ANSI C. As I don't have the ANSI C specification, I don't know what that means exactly, but I can wager a guess that at the very least it doesn't mean UTF-8.

FreeBSD seems to assume that the C locale is in the US-ASCII character set. NetBSD explicitly states that “The C or POSIX locale assumes the 7-bit ASCII character set …”, as does NetBSD, it seems. So, not only is it safe to assume that C/POSIX locales are 7-bit ASCII, it seems that it is correct to state that as a fact. (Ironically, I think that if we were having this discussion closer to the time that UTF-8 was conceived of, there would be no argument here.)

So, it seems that:

 (a) When the C locale is being used, only 7-bit ASCII should be output,
 (b) The program's source code (for a C or Vala program which translates to C) should be written in the C locale,
 (c) Translations should be considered not only a tool for use in translating the program from one language to another, but from one character set to another.
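
Point (a) can be sketched in Python as follows: emit the Unicode version of a message only when the target character set can represent it, and otherwise fall back to a pure-ASCII version. The function name and the sample messages are hypothetical. In a real program the codeset would presumably come from the environment (for example via `locale.nl_langinfo(locale.CODESET)` after `locale.setlocale(locale.LC_ALL, "")` on Unix); here it is passed in explicitly so the behaviour is easy to see.

```python
def render_for_codeset(message, ascii_fallback, codeset):
    """Return message if codeset can represent it, else the ASCII fallback."""
    try:
        message.encode(codeset)
    except (UnicodeEncodeError, LookupError):
        # Either a character is missing from the codeset, or Python does
        # not know the codeset name at all; play it safe with ASCII.
        return ascii_fallback
    return message


# Hypothetical messages: one using typographic quotes, one 7-bit ASCII.
fancy = "\u201cAllTray\u201d is running\u2026"
plain = '"AllTray" is running...'

print(render_for_codeset(fancy, plain, "ascii"))
print(render_for_codeset(fancy, plain, "utf-8"))
```

With gettext, the same effect would come from the message catalogs themselves: the ASCII text lives in the source, and the Unicode version is a "translation" into en_US.UTF-8.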

The relative importance of translating from one character set to another is diminishing with time, and will probably only completely go away when legacy encodings are no more. It would be nice to see the “C” locale be explicitly defined by POSIX as being “the Unicode character set, represented using the UTF-8 encoding”. I don't know that we'll ever see that happen, though.

Now, in terms of determining importance, that is of course going to vary with each project, each developer. Being that Launchpad is a project hosting platform, I'd make the argument that all reasonable choices should be permissible. Maybe for someone like myself, who wants to strictly adhere to the standards (or lack thereof, in some cases), there could be an option to enable “translating” into en_US(.UTF-8). For other projects that have legacy translations (which are, I think, more likely going to be software that runs in a terminal as opposed to software that runs in a GUI environment—it seems that at least GTK+ works with UTF-8 internally, I don't know about Qt or other graphical systems), maybe an option to enable translating into any character set supported by iconv would be possible.

Or, for that matter, translations on Launchpad could always be in UTF-8, but the project administrator could enable options to have Launchpad trans(late|literate) UTF-8 translations for languages into other character sets that those translations make sense in. iconv could then read the UTF-8, and output the target character set, and in the process transliterate any characters that don't exist in the target character set (this works very well for me, transliterating UTF-8 into ISO-8859-1, for example, for some of my legacy devices). Like the remainder of the thread, though, it's just an idea. :-)
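
The transliteration step described above might look roughly like this in Python. This is only a sketch of the idea, not what iconv does internally: the replacement table is a small hypothetical one, whereas an iconv implementation that supports transliteration (for instance `iconv -f UTF-8 -t ISO-8859-1//TRANSLIT` with GNU iconv) has its own, much larger rules.

```python
# Hypothetical replacement table for characters a legacy charset may lack.
TRANSLIT = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2026": "...",                # horizontal ellipsis
}


def to_charset(text, charset):
    """Encode text, transliterating characters the charset cannot represent."""
    out = []
    for ch in text:
        try:
            ch.encode(charset)
            out.append(ch)            # representable: keep as-is
        except UnicodeEncodeError:
            out.append(TRANSLIT.get(ch, "?"))  # approximate, or give up with '?'
    return "".join(out).encode(charset, errors="replace")


print(to_charset("\u201ccaf\u00e9\u201d\u2026", "iso-8859-1"))
```

Characters that exist in the target set (such as é in ISO-8859-1) survive unchanged; only the ones that do not are approximated.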

We have something else in the pipeline that may fulfill that need
though.  At some point we're planning to allow translators to upload
files to translations they don't have review rights for.  The
translations in the files would go into the system as suggestions.

The only thing (at least in my own use-case) that would ever be
committed would be an updated .pot file, with the possibility of the
.po files having new (yet-to-be translated) strings put in them.
Whatever it is that the automake magic does when you regenerate the
.pot file, anyway.

That's a different matter: you can tell Launchpad to import just the templates from your branch. The export option will only export the translations, so you'd have templates going one way and translations the other. When Launchpad imports a new template, it automatically updates the translations: messages that are no longer in the template are gone, and new ones are added as untranslated entries.
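
The template-import behaviour described above can be illustrated with a small Python sketch. It is not Launchpad's implementation; the function name and sample strings are hypothetical, and an empty string stands in for an untranslated entry, much like an empty msgstr in a .po file.

```python
def apply_template(template_msgids, translations):
    """Return translations updated to match a newly imported template.

    Messages missing from the template are dropped; messages new to the
    template are added as untranslated (empty) entries.
    """
    return {msgid: translations.get(msgid, "") for msgid in template_msgids}


# Hypothetical data: "Quit" was removed from the template, "Preferences" added.
old_translations = {"Open": "Öffnen", "Quit": "Beenden"}
new_template = ["Open", "Preferences"]

print(apply_template(new_template, old_translations))
```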

Yes, but then wouldn't I lose the benefit of knowing what's where? The interface that Translations presents, telling me what translations are synced and what ones are not translated, changed in LP, or whatever, is a killer feature. I wouldn't want to disable that by stopping the import of translations files.

I'm "okay" with deleting the files and re-adding them when needed
(which is basically what I did today to avoid any conflicts, since it
just wouldn't merge).  But the mode of operation where basically
Launchpad manages the translation files works well for me, at least in
theory.  That way, I can always have the up-to-date translations
already available and integrated, and focus on adding and changing
strings in the code for the translators.  ;-)

True, the difficult merges are likely to be a teething problem that mostly goes away once your original translation files are synchronized with what Launchpad produces.

Yes, well, hopefully that means that I can be lazy now.  :-P

Seriously, though, I much prefer automation if for no other reason than it robs me of the chance to muck things up.

Are "stacked" branches stacked virtually?  Like, here's the big
project, and here's a subdirectory which is not the project proper,
but a sibling project or something (say, translations) and so branch
(A) is the project, and branch (B) is the translations only, with (A)
holding all of the other project?  That is, when (A) is updated, and
(B) is stacked on (A), does (B) see the new changes?  Or is (B) a
branch that will have its own history and have to have (A) merged
into it to be kept current?

The former.

Hrm. Interesting. So, they are stacked in the sense that history is shared and if the history of the stacked-on branch is updated, so is the history of the stacked branch itself? That's nifty.

There may be tricks to make LP do this for you.  I don't know if
there's anything stopping you from mirroring a Launchpad branch on
Launchpad. Of course you can't have the translations export to a
mirrored branch, but it may enable some other approach.

Hrm.  An interesting point... I will explore that road tomorrow and
see what it holds.

Do let us know!  Your feedback so far has been very helpful.

I'm still not getting anywhere with this.  :-/

That's alright, though. I'm going to wager that what I really want is probably an extremely outlandish use-case, and make do. :) I think I have a system that will work, though I will only know that with time...

	--- Mike


This is the launchpad-users mailing list archive — see also the general help for Launchpad.net mailing lists.
