


RE: Call for testing new Launchpad Translations code performance



Carlos Perelló Marín [mailto:carlos.perello@xxxxxxxxxxxxx]
> On Wed, 21-11-2007 at 15:22 +0100, Philippe Verdy wrote:
> > Could I request a missing item for translations: currently, there is
> > absolutely NO way of entering non-breaking spaces in translations, even
> > though some translations require them.
> 
> That's not completely true, we should expose that feature more but we do
> allow you to introduce non-breaking spaces:
> 
> https://bugs.launchpad.net/rosetta/+bug/81281

No, this also occurs in IE7, not just in Mozilla-based browsers. Something
apparently goes wrong in the way browsers encode or interpret the content,
possibly because it is not encoded in a way that preserves "alternate"
spaces, or because of the encoding used when the form is submitted.

I see only one way to solve it: don't send plain text to browsers; send
escaped text instead (using something like a "\uNNNN" encoding). Then:
* if JavaScript is not supported or is disabled, users will see the text
with these escapes, and will submit the data in a "safe" encoding that is
preserved.
* if JavaScript is enabled, your JavaScript can PRESENT the decoded text to
the user and convert the user's input back, so that the text is shown in
WYSIWYG mode. There should then be a checkbox to switch the text input form
between the decoded (WYSIWYG) and the encoded (\uNNNN) form.

Only the encoded form (with escapes) would be used to talk to the web server.
Being able to switch the content of an input form between the two forms will
help users see otherwise invisible characters that may cause problems, and
will help them enter such characters.
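A minimal Python sketch of the escaping scheme proposed above, assuming the "\uNNNN" convention used by JavaScript and Java (one escape per UTF-16 code unit, so characters above U+FFFF take two escapes). The names `encode_safe` and `decode_safe` are hypothetical, and doubling the backslash is an assumed choice so that a literal backslash survives the round trip; in the browser the same logic would live in JavaScript. Printable ASCII passes through unchanged and everything else, including NBSP, is escaped, so the encoded form is ASCII-only.

```python
import re

# Hypothetical helpers sketching the proposed "\uNNNN" transport encoding.
# Escapes follow the JavaScript/Java convention: one escape per UTF-16 code unit.
_UNICODE_ESCAPE = re.compile(r"u[0-9A-Fa-f]{4}")


def encode_safe(text: str) -> str:
    """Return an ASCII-only form of text: printable ASCII is kept as-is,
    a backslash is doubled, and every other character is escaped."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)
        elif cp > 0xFFFF:
            # Split a supplementary character into a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append("\\u%04X\\u%04X" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
        else:
            out.append("\\u%04X" % cp)
    return "".join(out)


def decode_safe(encoded: str) -> str:
    """Invert encode_safe; raise ValueError on malformed input."""
    if not encoded.isascii():
        raise ValueError("encoded form must contain only ASCII")
    parts, i = [], 0
    while i < len(encoded):
        if encoded[i] != "\\":
            parts.append(encoded[i])
            i += 1
        elif encoded.startswith("\\\\", i):
            parts.append("\\")
            i += 2
        elif _UNICODE_ESCAPE.fullmatch(encoded, i + 1, i + 6):
            # May produce a lone surrogate; pairs are rejoined below.
            parts.append(chr(int(encoded[i + 2:i + 6], 16)))
            i += 6
        else:
            raise ValueError(f"malformed escape at offset {i}")
    # Re-pair surrogates; an unpaired one raises UnicodeDecodeError (a ValueError).
    joined = "".join(parts)
    return joined.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

For example, `encode_safe("Prix\u00a0: 10\u202f€")` gives `Prix\u00A0: 10\u202F\u20AC`, where the two different non-breaking spaces are now visible. Because `decode_safe` rejects malformed escapes and unpaired surrogates, it also gives the page a natural place to validate the user's input.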

Note this:

Browsers don't let users enter, for example, an NBSP character, EVEN IF this
character is mapped on the keyboard. It is generally mapped to Ctrl+Space or
Ctrl+Shift+Space, and browsers modify the keymap and disable the Control key,
or transform the input as soon as it is entered, even when the input comes
from a copy/paste operation.

Instead, they assume that the language entered will match the language of
the web page, and force the input to adopt the encoding and character subset
of the language they determine from the page. They use various tricks to do
that, relying not only on the page encoding but also on metadata sent in the
HTTP headers or on other elements in the page; generally, however, they
ignore the xml:lang or lang attribute set on the web input elements, and this
becomes even more complex when stylesheets come into play.

The interactions between encodings are really complex to handle over the
HTTP interface, combined with the HTML syntax. One way to avoid this is to
simplify the encoding at this interface, and then let some JavaScript do the
work locally in the browser, rendering the text back to normal without the
intermediate encoding. Such local JavaScript would perform the input
decoding/encoding, validation and reformatting dynamically.

If JavaScript is not enabled, users will still be able to interact with a
normal browser, but only using "safe" characters and an escaping syntax.

Note also that it is not clear what Launchpad does with translations
containing text that looks like literal HTML or literal named character
references. I have seen them change after I changed just one character in
the resource, although that change should not have affected the encoding of
the rest of the text. So when I download the translation results back, I can
see that they have been transformed, without any warning given to the user
(no visible difference) when submitting the data.
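One way to make such silent changes visible is to compare the uploaded and downloaded texts code point by code point. The sketch below uses Python's `difflib` and `unicodedata`; the helper names are hypothetical, and the `html.unescape` call only illustrates one plausible mechanism (an extra HTML decoding pass turning a literal `&nbsp;` into U+00A0). It is an assumption, not a diagnosis of what Launchpad actually does.

```python
import difflib
import html
import unicodedata


def name_of(ch: str) -> str:
    """Spell out one code point, so invisible characters become visible."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def codepoint_changes(before: str, after: str) -> list[tuple[str, str, str]]:
    """Return (operation, old segment, new segment) for every differing span."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return [
        (tag, before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative only: a resource containing a literal "&nbsp;" that goes through
# one extra HTML decoding pass (an assumed mechanism, not Launchpad's code).
uploaded = "Prix&nbsp;: 10 \u20ac"
downloaded = html.unescape(uploaded)
for op, old, new in codepoint_changes(uploaded, downloaded):
    print(op, repr(old), "->", ", ".join(name_of(c) for c in new) or "(nothing)")
```

Run as-is, it prints `replace '&nbsp;' -> U+00A0 NO-BREAK SPACE`: the two texts look identical on screen, but the resource has changed.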

For this reason, I have stopped using Launchpad: it cannot handle
international text properly, and it really breaks existing resources that
were previously working and already properly encoded. (In the project I'm
interested in, the resources are to be converted into Java properties files
and really contain Unicode text. Unicode is the central encoding, even though
it is then automatically converted to ASCII only, using the Java-specific
format for Unicode escapes in an ASCII-only file.)

Very large files, with thousands of resources that had been completed and
reviewed by many people over several years, were suddenly degraded and became
almost unusable, and everything now needs to be rechecked manually (the
project counts more than 400,000 resources in various languages and scripts,
and the whole set of texts occupies several megabytes uncompressed). The
translation status was suddenly degraded so that many existing languages
were no longer usable and would have been removed from the distribution.
This included very common languages whose resources were translated at nearly
100%, with just a few to maintain from time to time, and whose translation
level in the needed core resources suddenly fell below 50%.
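For reference, the ASCII-only form mentioned above can be produced as in this Python sketch, modelled on the escaping behaviour of `java.util.Properties.store(OutputStream, String)`: characters outside U+0020..U+007E become `\uXXXX` per UTF-16 code unit, `\`, `=`, `:`, `#` and `!` get a preceding backslash, and spaces are escaped in keys and at the start of values. The function names are hypothetical, and this is a simplified illustration, not a replacement for the JDK.

```python
# Simplified sketch modelled on java.util.Properties.store(OutputStream, ...);
# the helper names are hypothetical.
_CONTROL_ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\f": "\\f"}


def _escape(text: str, is_key: bool) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch == " ":
            # Keys escape every space; values only a leading one.
            out.append("\\ " if is_key or i == 0 else " ")
        elif ch in "\\=:#!":
            out.append("\\" + ch)
        elif ch in _CONTROL_ESCAPES:
            out.append(_CONTROL_ESCAPES[ch])
        elif 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)
        else:
            # Java strings are UTF-16: one \uXXXX escape per code unit.
            units = ch.encode("utf-16-be", "surrogatepass")
            for k in range(0, len(units), 2):
                out.append("\\u%04X" % int.from_bytes(units[k:k + 2], "big"))
    return "".join(out)


def to_properties_line(key: str, value: str) -> str:
    """Format one key=value entry for an ASCII-only .properties file."""
    return f"{_escape(key, True)}={_escape(value, False)}"
```

For example, `to_properties_line("menu.title", "Fichier\u00a0: ouvrir")` yields `menu.title=Fichier\u00A0\: ouvrir`, which `Properties.load` should read back as the original Unicode value. The point is that the ASCII file is only a lossless carrier: any tool in the chain that rewrites the decoded text changes the resource.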

Note also the following about your site:

* Please don't let input boxes force their width so that they require
horizontal scrolling (even on a high-resolution display) just because the
text to translate is a single paragraph (without any newline). In that case
the text should be displayed with automatic line wrapping; your interface
already shows where newlines are actually encoded in the resource text.

* The stylesheet is nearly unusable for proper text input: the text is
really TOO SMALL for entering anything other than Basic Latin (i.e. English
and a few other languages). Most other languages use non-ASCII characters,
which are really hard to see and correct; for languages with complex scripts
or subtle glyph distinctions, such as Chinese, pointed Arabic and Indic
scripts, but also Korean written in the regular Hangul alphabet, it's almost
impossible to read the text properly.

* The fonts specified are forced, but do not allow proper input of
international text. Please remove the font assignment, at least in the input
box or in the resource display (let users choose their own font in the
browser settings), or at least make sure that the localized text of the
language being worked on is styled with fonts that DO WORK for that
language:

Test a list of fonts that work for each language/script pair, i.e. for each
supported locale; then mark the displayed text with a style specific to that
language, map a list of fonts for that language in a CSS "font-family:", and
make sure that the fonts are given a sufficient point size!

And please let users enlarge the font size, both for the display of the
resources and for the input element.
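The per-language font mapping suggested above could be generated from a table, as in this Python sketch. The locale codes and font names are illustrative assumptions that would have to be tested per locale as described, and `lang_css` is a hypothetical helper. The CSS `:lang()` selector matches elements by their `lang` (or, in XML documents, `xml:lang`) attribute, and sizes are given in `rem`, relative to the root font size, which by default follows the user's browser setting, instead of being fixed in pixels.

```python
# Illustrative mapping only: locale codes and font lists are assumptions that
# would need to be tested per locale, as suggested above.
FONTS_BY_LANG = {
    "ar": ["Noto Naskh Arabic", "Amiri"],
    "hi": ["Noto Sans Devanagari", "Mangal"],
    "ko": ["Noto Sans KR", "Malgun Gothic"],
    "zh": ["Noto Sans SC", "Microsoft YaHei"],
}


def lang_css(fonts_by_lang: dict[str, list[str]], size_rem: float = 1.25) -> str:
    """Emit one :lang() rule per language with a font stack and a relative size."""
    rules = []
    for lang, fonts in sorted(fonts_by_lang.items()):
        # Quote each family name and end with a generic fallback family.
        family = ", ".join(f'"{font}"' for font in fonts) + ", sans-serif"
        rules.append(f":lang({lang}) {{ font-family: {family}; font-size: {size_rem}rem; }}")
    return "\n".join(rules)


print(lang_css(FONTS_BY_LANG))
```

Using `rem` rather than `em` avoids compounding sizes, because `:lang()` also matches nested elements that inherit the same language.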

My message is on topic: this thread is about a new Launchpad Translations
site and the testing of some new features (primarily the current performance
problems, but this should also include usability problems).







This is the launchpad-users mailing list archive.
