openstack team mailing list archive

Proposal for manuals translation process

 

Hi, all

During the "I18N in OpenStack" discussion at the design summit, it was
mentioned that the documents need to be internationalized. I have also
noticed requests from users in China for a Chinese version of the manuals.
But unlike Gettext strings in the code, there is no process for DocBook
translation yet. Translators who want to help have to load a DocBook file
into a tool and translate a copy, which is saved as a new file. This
traditional translation model is not good for collaboration. Open source
translation usually depends on volunteers, so it is better to use a crowd
translation model, which lets many translators work on the same job. As
with the Launchpad web UI for Gettext string translation, anyone can jump
in at any time and contribute to any part of the translatable content.

To facilitate manuals translation, I investigated several translation
websites and several open source projects, and composed this proposal. It
is now open for suggestions and comments.

Goal
------------
A process for manuals translation

Background
--------------
The OpenStack manuals are in DocBook format. The source is on GitHub:
http://github.com/openstack/openstack-manuals
Launchpad and Transifex are free web-based tools used for crowd
translation. Both provide a simple web interface in which non-technical
people can help with translation. Neither supports the DocBook format, but
both support the popular GNU Gettext file formats (PO Template or PO).

Translation Process
-------------------
To translate the OpenStack manuals, which are in DocBook format, into
multiple languages, we can slice the documents into short statements, use a
web-based translation management tool to manage the translation process,
and finally converge the translated content into new copies of the
DocBooks.

Here are the five steps of the translation process:
Step #1 Slicing - extract translatable content from the DocBooks and
generate Gettext compatible POT files (PO Template or PO);
Step #2 Uploading - upload the POT (or PO) files to a web-based translation
management tool;
Step #3 Downloading - download PO (or MO) files from the web tool after
translation and review;
Step #4 Converging - converge the translated content into new copies of
the DocBooks, creating DocBooks in multiple languages;
Step #5 Generating - generate HTML/PDF in multiple languages from the
translated DocBooks.

The picture in the attachment describes these steps.
(See attached file: DocBook translation process.png)
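
As a rough illustration of Step #1 (slicing), here is a minimal Python
sketch that pulls translatable text out of a DocBook document and renders
it as POT entries. It is not the program proposed in this mail: the set of
translatable elements (title, para), the helper names, and the sample
document are all assumptions made for the example. It also flattens inline
markup and omits the POT header entry, both of which a real slicing tool
would need to handle.

```python
import xml.etree.ElementTree as ET

# Elements treated as translatable here; this set is an assumption for the sketch.
TRANSLATABLE = {"title", "para"}


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://docbook.org/ns/docbook}'."""
    return tag.rsplit("}", 1)[-1]


def po_quote(text):
    """Escape a string for use inside a PO double-quoted string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def slice_docbook(xml_text):
    """Return the unique translatable strings of a DocBook document, in order."""
    root = ET.fromstring(xml_text)
    seen, messages = set(), []
    for element in root.iter():
        if local_name(element.tag) not in TRANSLATABLE:
            continue
        # Join all nested text (inline markup is discarded) and collapse whitespace.
        text = " ".join("".join(element.itertext()).split())
        if text and text not in seen:
            seen.add(text)
            messages.append(text)
    return messages


def to_pot(messages):
    """Render the messages as POT entries with empty translations."""
    return "\n".join('msgid "%s"\nmsgstr ""\n' % po_quote(m) for m in messages)


# Hypothetical sample document, not taken from openstack-manuals.
SAMPLE = """<chapter xmlns="http://docbook.org/ns/docbook">
  <title>Getting Started</title>
  <para>Install the <emphasis>compute</emphasis> service.</para>
</chapter>"""

if __name__ == "__main__":
    print(to_pot(slice_docbook(SAMPLE)))
```

Each msgid becomes one short unit that a translator can pick up
independently in the web tool, which is what makes the crowd model work.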

Comparison of Launchpad and Transifex
-------------------
Launchpad (https://launchpad.net/) and Transifex (
https://www.transifex.net/) are similar web-based tools used for crowd
translation. The goal of the comparison is to find the most appropriate
tool for this scenario. The comparison is made between Launchpad and the
Transifex free version for open source projects. (Refer to
https://www.transifex.net/plans/ for details of the “Transifex free version
for open sources”.)

After considering the requirements for manuals translation, the following
aspects are taken into consideration:
*Supported format
*DocBook slicing support
*Converging support
*Source uploading method
*Output downloading method
*Translation Memory support
*Translation history support
*Change management
*Translation Dictionary
Refer to Table 1 for the details of the comparison.

Another important measure to compare is the workload. Automating as many of
the five steps of the process as possible will decrease the workload of the
translation coordinators.
Refer to Table 2 for the details of the workload comparison when using
Launchpad or Transifex for DocBook translation.

Here are the conclusions of the comparison:
(1) The workload with Transifex is similar to that with Launchpad.
(2) The advantages of Launchpad are:
* It uses the same user IDs and user groups as the developers, users, and
translators of Gettext strings.
* It uses the same contribution scoring method, "Karma", as bug fixing,
question answering, and Gettext string translation.
(3) The advantage of Transifex is better translation memory support.
The disadvantage of Transifex is its separate user registration and user
interface. Both the translators and the coordinators need to register on a
new website and get familiar with a new user interface before translating.

Based on this analysis, I think using Launchpad for the manuals translation
is a good choice.

Other considerations
-------------------
*Translation Dictionary
Translation Dictionary here means terminology translation. It is very
helpful for ensuring translation quality. Unfortunately, neither Launchpad
nor Transifex supports a Translation Dictionary. I suggest using wiki pages
to document the terminology translations for translators' reference.
Here is a sample wiki page for Eclipse globalization:
http://wiki.eclipse.org/French_Glossary.
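
If the glossary on the wiki were also kept in a machine-readable form, a
coordinator could check downloaded translations against it. The following
is only a sketch of that idea: the function name, the (msgid, msgstr) pair
input, and the sample glossary are assumptions, and the matching is naive
substring matching (it would, for example, find "server" inside
"observer").

```python
def glossary_violations(entries, glossary):
    """Return (msgid, term, expected) for each missing glossary translation.

    entries  -- iterable of (msgid, msgstr) pairs
    glossary -- dict mapping a source term to its agreed translation
    """
    problems = []
    for msgid, msgstr in entries:
        if not msgstr:
            continue  # untranslated entries are not checked
        for term, expected in glossary.items():
            term_used = term.lower() in msgid.lower()
            translation_used = expected.lower() in msgstr.lower()
            if term_used and not translation_used:
                problems.append((msgid, term, expected))
    return problems


if __name__ == "__main__":
    # Hypothetical glossary and translations, for illustration only.
    sample_glossary = {"server": "serveur", "storage": "stockage"}
    sample_entries = [
        ("Start the server", "Lancer le serveur"),
        ("Object storage", "Magasin d'objets"),
    ]
    for problem in glossary_violations(sample_entries, sample_glossary):
        print(problem)
```

A report like this would only flag candidates for review; a human
translator still decides whether the glossary term applies.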

*Change Management
Launchpad and Transifex each support synchronizing old PO files with new PO
files in their own way. They compare the new PO file with the existing one
and handle the changes automatically. But new PO files are not generated
automatically after the DocBooks change. Translation coordinators need to
generate new PO files by running a Python program manually.
I suggest developing a program in the future to monitor updates to the
manuals GitHub repository. When a DocBook is updated, a new PO file would
be generated and synchronized with the old one on the Launchpad server.
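
One possible building block for such a monitor is sketched below. It
assumes the repository has already been checked out locally and detects
which DocBook files are new or changed by comparing content hashes with a
manifest saved on the previous run. The function names, the JSON manifest,
and the "*.xml" file pattern are assumptions for this example; regenerating
and uploading the PO files is left out.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_docbooks(source_dir, manifest_path):
    """List DocBook files that are new or modified since the last run.

    The manifest is a JSON file mapping relative paths to digests. It is
    rewritten on every call so the next run compares against this state.
    """
    source_dir = Path(source_dir)
    manifest_path = Path(manifest_path)

    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text(encoding="utf-8"))

    # Hash every XML file under the checkout, keyed by its relative path.
    current = {
        path.relative_to(source_dir).as_posix(): file_digest(path)
        for path in sorted(source_dir.rglob("*.xml"))
    }
    manifest_path.write_text(json.dumps(current, indent=2), encoding="utf-8")

    return [name for name, digest in current.items()
            if previous.get(name) != digest]
```

A scheduled job could call changed_docbooks() after each pull and run the
slicing program only on the returned files. In practice the manifest should
probably be saved only after regeneration succeeds, so that a failed run is
retried.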

*Machine translation
Is it necessary to include machine translation? Machine translation can be
run before human review, so that translators do not need to translate from
scratch. Translators can review the machine translation results and correct
them.
But after investigation, I found that the quality of the free machine
translation services that expose an API is not very good. I doubt whether
poor quality machine translation is helpful.
Anyway, if most community members want to include machine translation, it
is possible to extend the slicing program to generate a PO file that
contains the machine translation results.
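
A sketch of how the slicing program could pre-fill a PO file is shown
below. Machine suggestions are marked with the Gettext "fuzzy" flag, which
signals that an entry still needs human review. The translate callable is a
placeholder for whatever machine translation service might be chosen; it,
the function names, and the sample data are assumptions for this example.
How Launchpad or Transifex treats fuzzy entries on import would need to be
checked before relying on this.

```python
def po_quote(text):
    """Escape a string for use inside a PO double-quoted string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def prefill_entry(msgid, translate):
    """Return one PO entry; a machine suggestion, if any, is marked fuzzy."""
    suggestion = translate(msgid)
    lines = []
    if suggestion:
        lines.append("#, fuzzy")  # flag the entry as needing human review
    lines.append('msgid "%s"' % po_quote(msgid))
    lines.append('msgstr "%s"' % po_quote(suggestion or ""))
    return "\n".join(lines) + "\n"


def prefill_po(msgids, translate):
    """Return PO entries for all msgids, separated by blank lines."""
    return "\n".join(prefill_entry(msgid, translate) for msgid in msgids)


def fake_translate(text):
    """Stand-in for a real machine translation call; None means no result."""
    table = {"Hello": "Bonjour"}  # hypothetical data
    return table.get(text)


if __name__ == "__main__":
    print(prefill_po(["Hello", "Unknown"], fake_translate))
```

Entries for which the service returns nothing keep an empty msgstr, so they
show up as untranslated rather than as unreviewed suggestions.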

Reference
-------------------
Table 1 - Comparison of Launchpad and Transifex
                                                                                    
                    |     Launchpad      |                Transifex                 
 -------------------+--------------------+----------------------------------------- 
 Supported format   |pot file (.pot),    |android string resources (.xml),          
                    |po file (.po)       |po file (.po),                            
                    |                    |html (.html),                             
                    |                    |WIKI file (.wiki), etc.                   
                    |                    |(Note, DocBook is not a supported file    
                    |                    |format; OpenStack Wiki format is not a    
                    |                    |supported wiki format.)                   
 -------------------+--------------------+----------------------------------------- 
 DocBook Slicing    |No                  |No                                        
 support            |                    |                                          
 -------------------+--------------------+----------------------------------------- 
 Converging support |No                  |No                                        
 -------------------+--------------------+----------------------------------------- 
 Source uploading   |Two methods:        |Two methods:                              
 method             |a> Automatic        |a> Use a command tool “Transifex Client”  
                    |template imports    |to synchronize the server with local      
                    |from Bazaar branch  |repository (local folder) by typing       
                    |b> Manually upload  |several commands.                         
                    |template (or an     |b> Manually upload a source translation   
                    |archive) through    |file from web interface;                  
                    |Launchpad's web     |                                          
                    |interface.          |                                          
 -------------------+--------------------+----------------------------------------- 
 Output downloading |Two methods:        |Two methods:                              
 method             |a> Automatic save   |a> Use “Transifex Client” to download the 
                    |output files to     |latest translations from the server by    
                    |Bazaar branch;      |typing one command.                       
                    |b> Manually download|b> Manually download through web          
                    |output files through|interface.                                
                    |web interface.      |                                          
 -------------------+--------------------+----------------------------------------- 
 Translation Memory |The exact same      |The similar translation items will be     
 support            |translation items in|listed as a reference. Translation memory 
                    |other projects can  |can be shared within two and more         
                    |be listed as a      |projects.                                 
                    |reference.          |                                          
 -------------------+--------------------+----------------------------------------- 
 Translation history|Yes                 |Yes                                       
 support            |                    |                                          
 -------------------+--------------------+----------------------------------------- 
 Change management  |Launchpad will      |When you push some local updates to       
                    |automatically update|server, Transifex will overwrite the      
                    |its data every time |existing source strings and translations  
                    |you push a new      |with the updated version.                 
                    |revision to the     |(Note: This may lead to loss of           
                    |Bazaar branch.      |translations. So users need to make sure  
                    |                    |the local repository contains the latest  
                    |                    |translation results in the server.)       
 -------------------+--------------------+----------------------------------------- 
 Translation        |No                  |No                                        
 Dictionary         |                    |                                          
                                                                                    



Table 2 - Workload comparison when using Launchpad or Transifex for DocBook
translation
                                                                                    
                           |Using Launchpad            |Using Transifex             
 --------------------------+---------------------------+--------------------------- 
 Step 1: Slicing           |Python program [1] can be  |Same as Launchpad           
                           |used to slice all the      |                            
                           |DocBook together in one    |                            
                           |command                    |                            
 --------------------------+---------------------------+--------------------------- 
 Step 2: Uploading         |If the source code is      |Use “Transifex Client” to   
                           |synchronized with Bazaar,  |upload resources to         
                           |the uploading can be       |Transifex server from local 
                           |automatically handled by   |repository (local folder)   
                           |Launchpad.                 |by typing several commands. 
 --------------------------+---------------------------+--------------------------- 
 Step 3: Downloading       |Launchpad can commit daily |Use “Transifex Client” to   
                           |snapshots of the           |download the latest         
                           |translations to a Bazaar   |translations from the       
                           |branch in a specific       |server by typing one        
                           |folder.                    |command.                    
 --------------------------+---------------------------+--------------------------- 
 Step 4: Converging        |Python program [2] can be  |Same as Launchpad           
                           |used to converge all the PO|                            
                           |files back to DocBooks     |                            
 --------------------------+---------------------------+--------------------------- 
 Step 5: Generating        |Maven command can be used  |Same as Launchpad           
                           |to generate HTML/PDF from  |                            
                           |DocBooks                   |                            
                                                                                    


[1] The Python program can be written on top of “xml2po” to slice all the
DocBooks of the manuals project into translatable strings in batch.
“xml2po” is an existing Python program in the GNOME gnome-doc-utils package
that extracts translatable content from free-form XML documents and outputs
Gettext compatible POT files.
[2] The Python program can be written on top of “xml2po” to converge the
translated strings back into copies of the DocBooks in batch.
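
The batch programs in [1] and [2] could start from something like the
sketch below, which only plans the xml2po calls for every DocBook file
under a source directory. It relies on the xml2po options -o (output file)
and -p (PO file to merge). The directory layout (POT files under po_dir, PO
files under po_dir/<lang>, translated DocBooks under out_dir/<lang>) and
all function names are assumptions for this example, not an existing
convention of the manuals project.

```python
import subprocess
from pathlib import Path


def slice_command(docbook, pot):
    """Build the xml2po call that extracts one DocBook file into a POT file."""
    return ["xml2po", "-o", str(pot), str(docbook)]


def converge_command(docbook, po, translated):
    """Build the xml2po call that merges a PO file into a translated copy."""
    return ["xml2po", "-p", str(po), "-o", str(translated), str(docbook)]


def plan_batch(source_dir, po_dir, out_dir, lang):
    """Return (slice, converge) command pairs for every DocBook file found."""
    commands = []
    for docbook in sorted(Path(source_dir).rglob("*.xml")):
        rel = docbook.relative_to(source_dir)
        pot = Path(po_dir) / rel.with_suffix(".pot")
        po = Path(po_dir) / lang / rel.with_suffix(".po")
        translated = Path(out_dir) / lang / rel
        commands.append((slice_command(docbook, pot),
                         converge_command(docbook, po, translated)))
    return commands


def run(command):
    """Execute one planned command; requires xml2po to be installed."""
    subprocess.run(command, check=True)
```

Keeping the planning separate from the execution makes it possible to
review or log the commands before anything is run, and to run the slicing
half and the converging half at different times (before upload and after
download).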


Regards
Daisy Guo

Attachment: DocBook translation process.png
Description: PNG image