openerp-doc team mailing list archive

doc.openerp.com & translations

 

Recently I've been looking at a serious historical problem with the
documentation: translations and translations handling.

The current system has many issues:

* Huge overhead (starting a translation means copying everything to a
 new folder, adding 80~120MB to the checkout)
* Inability to use standard translation tools (e.g. Rosetta), and
 impossibility to gather statistics about translation state (e.g.
 completion %)
* Difficulty keeping translations in sync with the base doc (the script
 generating and updating translations isn't able to reliably parse
 its output, and commonly generates incorrect/invalid RST)
* Translators need to know RST roughly as well as the original
 authors, since the current translation system hands them everything,
 RST directives and all
* Pollution of translated versions with master version content, making
  editing by translators more brittle than in the master document
  itself.

The result is that there has historically been very little uptake and
ongoing translation maintenance (as of the 7.0 branch, the French,
Italian, Polish, Romanian, Vietnamese and "UK" localizations have a
grand total of 0 translated terms, the Spanish one has about 10
translated paragraphs/sections left). The Russian translation is the
only one where any effort has been made (there are ~800 translated
paragraphs/sections/terms left as of 7.0[1]).

In the meantime, Sphinx has grown a gettext-based i18n system[0], which:

* has a much lower overhead (~2.1MB of POT in 7.0)
* uses a standard format
* would allow using Rosetta to share translations across branches
 (through suggestions & the like) and gather statistics about
 translation status
* would make it much harder for translators to break the document
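
For reference, here is a minimal sketch of what the Sphinx side of this
could look like in conf.py. The `locale/` directory name and the `fr`
language value are placeholders I picked for illustration, not something
that already exists in the doc branches; the option names themselves
(`locale_dirs`, `language`, `gettext_compact`) are standard Sphinx
settings.

```python
# conf.py (fragment): illustrative settings for a gettext-based workflow.
# The directory name and language below are assumptions, not existing setup.

# Directories (relative to conf.py) where Sphinx looks for message catalogs,
# laid out as <locale_dir>/<language>/LC_MESSAGES/<name>.mo
locale_dirs = ['locale/']

# Language of the translated build; normally overridden per build on the
# command line rather than hard-coded here.
language = 'fr'

# Generate one catalog per source document instead of one per top-level
# directory, which keeps individual POT files smaller.
gettext_compact = False
```

Catalogs would then be extracted with the `gettext` builder
(`sphinx-build -b gettext <sourcedir> <outdir>`), and a translated build
selected with `sphinx-build -D language=fr ...`.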

Although it is not without drawbacks[2] I think it remains vastly
superior to the current solution[3] and thus that we should use it not
only for 7.0 but for all previous documentation branches as well. An
internal survey shows people are either indifferent or supportive.
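
Part of the drawbacks in [2] could probably be handled after extraction.
As a rough sketch (not something Sphinx provides, and not an existing
script of ours), a small pass over a PO/POT file could flag 1-character
strings and report a completion percentage. The parser below is
deliberately simplified: it only understands plain msgid/msgstr entries
with continuation lines, and the function names and sample data are
made up for the example.

```python
import re

ENTRY_RE = re.compile(r'(msgid|msgstr)\s+"(.*)"$')


def parse_po(text):
    """Return (msgid, msgstr) pairs from PO/POT text.

    Simplified on purpose: plural forms, msgctxt and comments are ignored,
    and escape sequences are left undecoded.
    """
    entries, current, field = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            field = match.group(1)
            if field == 'msgid':
                # A new msgid closes the previous entry, if any.
                if 'msgid' in current:
                    entries.append((current['msgid'], current.get('msgstr', '')))
                current = {}
            current[field] = match.group(2)
        elif field and line.startswith('"') and line.endswith('"'):
            # Continuation line: append to the field currently being read.
            current[field] += line[1:-1]
        else:
            field = None
    if 'msgid' in current:
        entries.append((current['msgid'], current.get('msgstr', '')))
    return entries


def real_entries(entries):
    """Drop the header entry, which has an empty msgid."""
    return [(msgid, msgstr) for msgid, msgstr in entries if msgid != '']


def short_msgids(entries, max_len=1):
    """The 'low-pass' check: msgids of at most max_len characters."""
    return [m for m, _ in real_entries(entries) if len(m.strip()) <= max_len]


def completion(entries):
    """Percentage of non-header entries with a non-empty msgstr."""
    real = real_entries(entries)
    if not real:
        return 0.0
    done = sum(1 for _, msgstr in real if msgstr)
    return 100.0 * done / len(real)


# Hypothetical sample catalog, for illustration only.
SAMPLE = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Installation"
msgstr "Installation"

msgid "Configure the "
"server"
msgstr ""

msgid "*"
msgstr ""
'''

entries = parse_po(SAMPLE)
```

With the sample above, `short_msgids(entries)` flags the `*` entry and
`completion(entries)` reports one translated entry out of three. For
real use a proper PO library would be a safer choice than this
hand-rolled parser.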

There was also support for dropping outdated/deprecated documentation
cruft from older versions of the documentation: BI, "features" and
"technical_guide" are more or less wasted space and have already been
removed from the 7.0 doc; I aim to remove them from 6.1, 6.0 and 5.0
as well (technical_guide[4] also generates a huge chunk of POT: 3.2MB
when all of the "openerp tutorial" only yields a 1.4MB POT).

Thoughts? Questions? Issues? Things I haven't mentioned/considered?

[0] http://sphinx-doc.org/latest/intl.html
[1] earlier versions have even more data — 5.0 looks nearly complete
[2] * it doesn't allow localizing pictures — this has never been used in
     the current system as far as I've seen, save to fail to update 
     screenshots to more recent ones
   * provides less context for translators
   * doesn't have a "low-pass" filter so 1-character strings will be
     extracted
[3] plus nothing stops us from proposing patches to upstream
[4] technical_guide isn't even on the 5.0 index, it's only in the
   "complete table of contents".