launchpad-dev team mailing list archive

State of Translations

 

This is the first (yes, I missed the previous quarter's email) "state of
LP translations" email.  As such, it goes over the state of the entire
LP Translations app.  Many bug references are missing: some issues have
not even been filed, because they are obviously wrong and are probably a
dozen bugs rather than one, and others are just not linked because I
decided not to spend time on making this fully cross-referenced.

TOC:

Launchpad Translations
 * Core
   * Statistics
 * Web UI
   * POFile:+translate
   * Performance for POFile:+translate
   * Interface consistency
   * Set-up complexity
 * Search
   * POFile search performance
 * Message sharing
 * Translation imports
   * Import queue gardener
 * Translation exports
 * Language packs
 * Firefox support
 * Integration with code branches
   * Import
   * Export
   * Template generation
 * Bridging the gap
 * API
 * Tech-debt
 * Database state summary

Core
----

Launchpad Translations core has seen many refactoring efforts over time.
The core data model has been changed three times over the last 4 years,
and that has left serious baggage around.

Most notably, the core translations-updating method
("updateTranslation") has grown unmaintainable, in that any change there
breaks something else.  As part of the ongoing efforts to extend
translations sharing, this has been cleaned up, but so far only on our
feature branch, which we are getting closer to landing.


The rest of the model code is not very logically structured, and there
was some effort to sanitize it (mainly efforts relating to the
introduction of TranslatableMessage).  That, though, never got anywhere
due to limited time.

Plans are to pick this up as time permits.


Statistics
..........

Launchpad keeps cached statistics per PO file: the number of translated
messages, of new, unreviewed suggestions, and of translations changed
compared to upstream.  They are updated after every change operation on
a PO file (either after an import, or after a page submission via the
web UI).

This update doesn't take message sharing into account (it'd be too
expensive to do on the fly), so we rely on an offline process to keep
the statistics truly in sync.

This process runs only weekly because it takes more than 24h to complete
for our entire data set.  As an optimization, we've implemented a daily
variant which only goes over PO files modified over the last set number
of days, but that one is disabled because of the problematic interaction
with DBLoopTuner (bug 622670).

However, besides the lack of support for message sharing, we seem to be
experiencing more issues: statistics are sometimes not updated properly.
Since it can sometimes be slow to update statistics for all related PO
files, we want to decouple this process from the web UI operations.
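
As a rough illustration of the two pieces described in this section (a
full recount per PO file, and the daily variant that only visits
recently modified files), here is a minimal Python sketch.  The Message
and POFileStats classes and both function names are made up for this
example and are not Launchpad's actual model; in particular, counting
"unreviewed" as "messages with at least one new suggestion" is an
assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


@dataclass
class Message:
    """Hypothetical, simplified view of one message in a PO file."""
    translation: Optional[str]           # current translation, if any
    upstream_translation: Optional[str]  # translation imported from upstream
    new_suggestions: int                 # suggestions not yet reviewed


@dataclass
class POFileStats:
    """Hypothetical cache record holding the three counters."""
    translated: int = 0
    unreviewed: int = 0
    changed: int = 0


def compute_stats(messages: Iterable[Message]) -> POFileStats:
    """Recount the cached statistics for one PO file from scratch."""
    stats = POFileStats()
    for msg in messages:
        if msg.translation is not None:
            stats.translated += 1
            if (msg.upstream_translation is not None
                    and msg.translation != msg.upstream_translation):
                stats.changed += 1
        if msg.new_suggestions > 0:
            stats.unreviewed += 1
    return stats


def recently_modified(last_modified: Dict[str, datetime],
                      now: datetime, days: int) -> List[str]:
    """Pick PO files touched in the last `days` days (the daily variant)."""
    cutoff = now - timedelta(days=days)
    return sorted(
        name for name, when in last_modified.items() if when >= cutoff)
```

The weekly run would call compute_stats() for every PO file, while the
daily variant would first narrow the set with recently_modified().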


Web UI
------

Launchpad Translations web UI has been relatively stagnant for a while:
it does the job sufficiently well, though there are a few sore points.


POFile:+translate
.................

The POFile:+translate page is the page translators spend the most time
on, and yet it's over-crowded, slow, and very complex underneath.  Some
work was started to clean it up, but some things are simply brain dead
(model, browser and template code are all tightly coupled, and then
reused for slightly different views), and will need to be slowly cleaned
up over time.


Performance for POFile:+translate
.................................

The POFile:+translate page emits a large number of queries (sometimes
more than 1000), and that number scales rapidly with the batch size.
The biggest culprits are global translation suggestions, which go
through the entire database and produce the slowest queries.  Next up is
the search query, which occasionally times out on very large PO files.

The mere fact that it doesn't time out very often with this number of
queries is a testament to the time spent optimizing it, and to how much
more it can be improved.
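
To illustrate why the query count grows with the batch size, here is a
toy sketch using an in-memory SQLite table.  It assumes a page that
looks up suggestions once per message, which is a simplification; the
table and function names are hypothetical and bear no relation to the
real schema.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def make_db() -> sqlite3.Connection:
    """Build a tiny, hypothetical suggestions table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE suggestion (msgid TEXT, text TEXT)")
    conn.executemany(
        "INSERT INTO suggestion VALUES (?, ?)",
        [("Open", "Abrir"), ("Open", "Ouvrir"), ("Close", "Fermer")])
    return conn


def fetch_per_message(conn: sqlite3.Connection, msgids: Sequence[str]
                      ) -> Tuple[Dict[str, List[str]], int]:
    """One query per message: the query count equals the batch size."""
    result: Dict[str, List[str]] = {}
    queries = 0
    for msgid in msgids:
        rows = conn.execute(
            "SELECT text FROM suggestion WHERE msgid = ? ORDER BY text",
            (msgid,)).fetchall()
        queries += 1
        result[msgid] = [row[0] for row in rows]
    return result, queries


def fetch_batched(conn: sqlite3.Connection, msgids: Sequence[str]
                  ) -> Tuple[Dict[str, List[str]], int]:
    """One query for the whole batch, regardless of its size."""
    if not msgids:
        return {}, 0
    placeholders = ",".join("?" for _ in msgids)
    rows = conn.execute(
        "SELECT msgid, text FROM suggestion "
        "WHERE msgid IN (%s) ORDER BY text" % placeholders,
        list(msgids)).fetchall()
    result: Dict[str, List[str]] = {msgid: [] for msgid in msgids}
    for msgid, text in rows:
        result[msgid].append(text)
    return result, 1
```

Both functions return the same data; only the number of round trips to
the database differs.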

Global suggestions have gotten worse since our last release (10.10): we
are not yet sure if it's the Postgres 8.4 upgrade or the changes we
landed regarding TranslationMessage.pofile removal.  For the time being,
global suggestions are disabled on production servers.  We have landed a
change that should fix the timeout issue and allow us to re-enable
global suggestions with the 10.11 release.


Interface consistency
.....................

Several elements of the UI are not consistent across the web pages for
objects of a similar nature.  For instance, source package translations
use a different layout compared to product releases or distro releases
(bug 95744), and there is no page for release-agnostic source package
translations (bug 127884).


Set-up complexity
.................

Setting up projects for translations is an involved and confusing
matter. It is our goal to simplify this, along with making sure that
setting up projects with imported translations is as easy as possible.


Search
------

Launchpad Translations doesn't provide nearly enough search facilities
to help either translators or users.  The only search provided today is
within a single PO file, and that requires knowing exactly which source
package a certain message is in.  The integrated Launchpad Google search
can be used otherwise, but that doesn't provide nearly as good results
as an optimized search could.

There are several potential paths forward, but they all require
significant investment of time.


POFile search performance
.........................

Search is available as part of POFile:+translate page.  However, with
big PO files (in ddtp-ubuntu templates, exceeding 40k messages), we
commonly see these search queries time out.


Message sharing
---------------

Translations sharing (or, as it is called internally, "message sharing")
is a feature that brings two benefits: a smaller database footprint
through data reuse, and less work for translators through automatic
re-use of existing translations where appropriate.

Migration of data to the message sharing data model is not fully
completed (OpenOffice.org, which is a large part of our translations DB,
has not been migrated fully), though we are already seeing big benefits.

Also, due to the data model design, it is naturally prone to data
inconsistency.  That can cause some unexpected behaviour in Launchpad
Translations (e.g. translations can be shared between messages that
don't seem otherwise linked except by the identical English string).
This can only be caused by Ubuntu or Launchpad translation admins, but
is a general problem that also complicates things like translations
removal.

Currently ongoing work will extend message sharing further so
translations are shared between product releases and linked source
packages in Ubuntu.
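
To make the sharing idea concrete, here is a toy Python model.  It is
only a sketch: the "sharing group" label, the class and the function
names are all hypothetical, and which templates end up in the same
group is decided outside of this example.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key: a translation is stored once per sharing group,
# English string and language, and reused by every template in the group.
SharedKey = Tuple[str, str, str]  # (sharing group, msgid, language code)


class SharedTranslations:
    """Toy store showing data reuse between templates in one group."""

    def __init__(self) -> None:
        self._store: Dict[SharedKey, str] = {}

    def translate(self, group: str, msgid: str, language: str,
                  text: str) -> None:
        """Record a translation; all templates in `group` will see it."""
        self._store[(group, msgid, language)] = text

    def lookup(self, group: str, msgid: str,
               language: str) -> Optional[str]:
        """Return the shared translation, or None if there is none."""
        return self._store.get((group, msgid, language))

    def rows(self) -> int:
        """Number of stored translations (the database footprint)."""
        return len(self._store)


def template_view(shared: SharedTranslations, group: str,
                  msgids: Iterable[str],
                  language: str) -> Dict[str, Optional[str]]:
    """Return what one template in `group` would show for `language`."""
    return {m: shared.lookup(group, m, language) for m in msgids}
```

With two templates in the same group, translating "Open" once makes it
show up in both while only one row is stored.  A template in another
group with the same English string is unaffected.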


Translation imports
-------------------

Translation imports happen in an async manner inside two virtual import
queues: Ubuntu and upstream queues.

The import queue itself doesn't provide enough facilities to indicate
what kind of import we are dealing with (soyuz, manual maintainer, bzr,
template builders, manual user uploads), so it presents users with very
generic informational notices.

Emails do not contain any X-Launchpad headers, and they commonly confuse
people because of their generic (and sometimes incorrect: "you
uploaded") wording.

Import script also implements its own OOPS handling.  It'd be worth
checking how much of it can be dropped with the newly introduced generic
LaunchpadCronscript OOPS handling.

Translation imports are heavily optimized (they don't reimport
translations if we already have an identical message minus some
metadata), but there's still room for improvement.  In particular,
template imports should be optimized in a similar vein.
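
The "skip if identical minus metadata" optimization can be sketched
roughly as follows.  This is a hedged illustration, not the import
script's code: the dictionary representation, the function names and
especially the choice of which fields count as metadata are assumptions.

```python
from typing import Dict, Tuple

# Assumption: these fields are treated as metadata and ignored when
# comparing; the real importer may ignore a different set of fields.
METADATA_FIELDS = frozenset({"comment", "source_reference"})


def content_key(message: Dict[str, object]) -> Tuple:
    """Reduce a message to the parts that matter for comparison."""
    return tuple(sorted(
        (key, value) for key, value in message.items()
        if key not in METADATA_FIELDS))


def import_messages(existing: Dict[str, dict],
                    incoming: Dict[str, dict]) -> int:
    """Update `existing` in place; return how many messages were written."""
    written = 0
    for msgid, message in incoming.items():
        current = existing.get(msgid)
        if (current is not None
                and content_key(current) == content_key(message)):
            continue  # identical apart from metadata: skip the write
        existing[msgid] = message
        written += 1
    return written
```

Re-importing a file where only comments changed then results in zero
writes, which is where the savings come from.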

Translation uploads can be done through two distinct forms/web pages:
one aimed at project maintainers, and another aimed at translators.
The page aimed at translators is needlessly complex and the currently
ongoing work will simplify it.


Import queue gardener
.....................

Translation import approval is in a very bad state: lots of manual
approval is still required (especially for templates), and that drives
both our users and us crazy.

Imports from code branches use a different code path for template
approvals compared to everything else.  They should be sanitized to use
the same code.

The import queue gardener script cannot recognize some of the very
common translation directory layouts (like those used by PHP and Django
web projects), nor can it automatically approve the layout in which
Launchpad exports translations for maintainers.
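
As an illustration of what recognizing a layout could look like, here
is a small sketch that guesses the language from a file path.  It is
not the gardener's code: the pattern list, the layout names and the
function are hypothetical.  The only layouts assumed are the flat
gettext one (po/es.po) and the <lang>/LC_MESSAGES/<domain>.po one, as
used for example by Django (locale/es/LC_MESSAGES/django.po).

```python
import re
from typing import Optional, Tuple

# Language codes such as "es", "pt_BR" or "sr@latin" (simplified).
LANG = r"(?P<lang>[a-z]{2,3}(?:_[A-Z]{2})?(?:@[a-z]+)?)"

# Checked in order: the more specific layout comes first.
LAYOUTS = [
    # locale/pt_BR/LC_MESSAGES/django.po
    ("lc_messages",
     re.compile(r"(?:^|/)" + LANG + r"/LC_MESSAGES/[^/]+\.po$")),
    # po/es.po
    ("flat", re.compile(r"(?:^|/)" + LANG + r"\.po$")),
]


def guess_language(path: str) -> Optional[Tuple[str, str]]:
    """Return (layout name, language code), or None if unrecognized."""
    for name, pattern in LAYOUTS:
        match = pattern.search(path)
        if match:
            return name, match.group("lang")
    return None
```

Anything that returns None here would still need manual approval.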


Translation exports
-------------------

Translation exports are done via a queue as well, even though in most
cases (single file downloads) that should not be necessary
(performance-wise).

On requesting an export, users have no idea how long they need to wait
before they get an email with a download link.

There is no web UI for the export queue (neither for management, nor for
overview).

One can request all translations for a project release, translatable
template, or specific translations inside a single template, or a single
translation.

There is no way to request all translations for a distro/project
release/source package for a single language.


Language packs
--------------

Most Ubuntu translations are taken from Launchpad, usually through
"language packs": exports of all translations for a particular Ubuntu
release.  These are then packaged by Ubuntu packagers.

Launchpad supports two different ways of exporting language packs:
either as a "base" tarball (containing all translations for a certain
release), or as a "delta" tarball (only PO files modified since a
certain date, usually the date when the last "base" tarball was
produced).
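
The difference between the two kinds of tarball can be sketched as
below.  This is a minimal, in-memory illustration using Python's
tarfile module; the function names and the dictionary of PO files are
made up for the example and do not reflect the real export script.

```python
import io
import tarfile
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def build_langpack(po_files: Dict[str, Tuple[datetime, bytes]],
                   since: Optional[datetime] = None) -> bytes:
    """Build a gzipped tarball of PO files.

    With `since` unset this is a "base" export (everything); with
    `since` set it is a "delta" export (only files modified later).
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in sorted(po_files):
            modified, content = po_files[path]
            if since is not None and modified <= since:
                continue  # unchanged since the base export
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            info.mtime = int(modified.timestamp())
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_members(tarball: bytes) -> List[str]:
    """Return the file names stored in a tarball built above."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        return tar.getnames()
```

A delta export is then simply build_langpack(files, since=base_date),
where base_date is whenever the last base tarball was produced.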

At the moment, these processes are long-running (a full export can take
up to 8h, exporting ~100k PO files), but are mostly read-only so don't
put too much stress on our systems.

Lots of post-processing happens inside langpack-o-matic (under Ubuntu
Platform team jurisdiction), but ideally, we'd move that into Launchpad
as well.


Firefox support
---------------

Launchpad Translations supports XPI files (as used in Firefox,
Thunderbird, and similar) partially: namely, it can import them and
offer them for translation.

In order to make use of translations done in Launchpad, specially
formatted XPIPO files can be exported and po2xpi tools (from [1]) can be
used to re-create translated XPI files.  This would ideally be replaced
with full XPI generation inside Launchpad on export.

XPI files containing more than one language translation (perfectly
acceptable by "XPI specification", if there ever is one) are not
supported.

[1] https://code.launchpad.net/~mozillateam/rosetta/po2xpi


Integration with code branches
------------------------------

Import
......

We can import translations from a product release bzr branch in
Launchpad provided they are in a standard gettext layout.

There are requests to extend this to support import from source package
branches as well (bug 407403).

Template approver for imports from bzr branches should be improved to
handle more cases, and to make use of the bzr metadata (like rename
operations).

There's some confusion about the best way to use our available import
options ("templates only" vs "templates and translations", and "one-time
import").


Export
......

Translations can be exported to a branch of one's choice: export happens
daily (or nightly, scheduled to start at 04:00 UTC).

It is optimized so only the relevant translations are exported, though
that's calculated through a heuristic that needs improving (bug 490668:
it sometimes causes long loops between imports and exports if they are
set up on the same branch).

Branches have to be owned by a person setting translation export up,
though they can later re-assign them to a team or someone else (bug
407260).

It would be nice to offer translations export branches on URLs like
lp:project/series/+translations or similar (bug 392220).


Template generation
...................

As of recently, Launchpad can generate templates automatically for code
branches (provided translation import is set up) using intltool.
Template generation happens on the Launchpad build farm.

Code to generate templates for different source code branches lives in a
module named "pottery".

At the moment, only intltool-enabled modules are supported (most of
GNOME, and lots of extras).  Pottery needs some cleaning up before it's
generic enough to support other source code tree layouts and translation
formats.

Plans are to extend pottery with the ability to generate templates for
KDE modules, "bare" gettext modules (some of them are already supported
through intltool support) and so forth.

Pottery also needs splitting into a separate package, as it runs on the
build farm slaves.

As of yet, no translation template build histories are shown in the web
UI, though they are being recorded in the database.


Bridging the gap
----------------

"Bridging the gap" is a theme that encompasses several areas of
Launchpad Translations.  The goal is to make use of the lower level
features described above to bridge the gap between Ubuntu and upstream
development/translation communities.

The existing framework allows us to automatically import upstream
intltool-based projects' translations as Launchpad projects.  This
process is not streamlined and the UI to set this up will need to be
improved asap.

Ongoing work that will enable translations sharing between products and
Ubuntu will allow us to instantly get upstream translations into Ubuntu
as they are imported into Launchpad.

Since we are importing translations from branches in Launchpad, good
code import facilities are a must.  At the moment, the biggest blocker
is an ability to import branches for git repositories (perhaps even svn
repositories?): bug 380871.

Next step in the "bridging the gap" theme is streamlining the process of
upstream translation submissions.


API
---

Launchpad Translations web service API is seriously lacking: there are
only partial exports of language, import queue and template objects.

At the moment, only import queue API is provided to a reasonable extent.

Languages API is sufficiently complete (though not terribly useful on
its own).

Template objects are exported as well, but they are just the basis for
the "reporting API" as requested by Ubuntu community and OEM teams.  It
was mostly a community effort (read: Adi Roiban :), and some cleaning up
needs to happen before it can be continued.

Tech-debt
---------

As one of the oldest pieces of Launchpad, Rosetta has a lot of very ugly
code.  From the fact that there are many doctests testing what
unit-tests should, or even page tests of similar nature, all the way up
to entangled browser views which make any simple change an exercise in
patience, there is simply a lot of stuff to clean up.

Most notably, the following:

 * "Translations browsing" interfaces: we did devise a plan during the
Epic for Adi to mostly take on, but he has since lost time to work on
this
 * POFile/TranslationMessage views: these combine a million options into
one view, then combine it through very low level Zope hackery
 * Import approval: two distinct ways of doing the approval should be
abolished and one way to rule it all should be used
 * updateTranslation implementation: this used to be a method that
spanned ten or so screenfuls, and has since been split up into
smaller methods: yet it's still so immensely complex that no living
being can change it without breaking some other obscure edge case.
Work to fix this is in progress.
 * Clean-up old stuff: through all the refactorings, many tests and
table columns have become useless yet are still sitting around.  The
same is true for some of the code as well.

And I am not even starting on the inconsistent state of OOPS reporting
in translation scripts, or inconsistent usage of DBLoopTuner, usage of
DB triggers (where they cause more trouble than they help) or lack of
any X-Launchpad headers in all the emails that LP Translations sends
out.


Database state summary
----------------------

It may sound weird, but database state is interesting to track when it
comes to Launchpad Translations.

Once the biggest resource hog in Launchpad, Rosetta has become just
one of the Launchpad apps: our biggest table holds ~68M rows (with more
potential for reduction), and operations that used to take a full week
now take 20-35 mins (new distro release opening).

But, it's not perfect yet.

Translation credits cache table (POFileTranslator) is populated via
triggers and they can sometimes put a huge load on the database.

Message sharing migration is not fully complete yet (after months of
running): that means that our data set is larger than it needs to be.

We are hitting PO file statistics update inconsistencies and a full run
of statistics update simply takes too long (~24h).

There're more things we can optimize when it comes to translations
import and translations export (especially regarding language packs,
though they work pretty well already).





