openteachermaintainers team mailing list archive

Automatic translations when inserting words

Imagine: you want to learn some words of the language of the country
where you're going on holiday. But how do you start? Just enter some
words you want to know ('hello!', 'excuse me', 'how are you?', etc.) and
click the 'translate!' button. Then start learning...

Whiteboard:
This will require a dictionary and a lot of code.

At the moment I'm planning to implement it this way:
- Create an 'interface' (like in Java), with functions like lookUp(word,
inLanguage, outLanguage), getCloseMatches(word, language) and
synonyms(word, language).
- 'Implement' the interface with various backends: dictd, Google
Translate, etc. (So we don't have an immediate problem if, for example,
Google shuts down its API.)
- Add the feature to the program (mostly GUI work by then). Let the user
choose the dictionary source in the preferences window.
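
The plan above could be sketched roughly as follows. This is only an
illustration, written for a current Python 3: the three method names come
from the list above, but DictionaryBackend, InMemoryBackend,
lookUpWithFallback and the sample data are made-up names, and real
backends (dictd, Google Translate) would replace the in-memory stand-in.

```python
import difflib
from abc import ABC, abstractmethod


class DictionaryBackend(ABC):
    """Common interface every dictionary source would implement."""

    @abstractmethod
    def lookUp(self, word, inLanguage, outLanguage):
        """Return a list of translations of word (empty if unknown)."""

    @abstractmethod
    def getCloseMatches(self, word, language):
        """Return known words in language that resemble word."""

    @abstractmethod
    def synonyms(self, word, language):
        """Return a list of synonyms of word in language."""


class InMemoryBackend(DictionaryBackend):
    """Hypothetical stand-in backend that reads from plain dicts."""

    def __init__(self, translations, synonymTable=None):
        # translations: {(inLanguage, outLanguage): {word: [translation, ...]}}
        self._translations = translations
        # synonymTable: {language: {word: [synonym, ...]}}
        self._synonyms = synonymTable or {}

    def lookUp(self, word, inLanguage, outLanguage):
        table = self._translations.get((inLanguage, outLanguage), {})
        return list(table.get(word, []))

    def getCloseMatches(self, word, language):
        # Collect every source word known for this language.
        known = set()
        for (source, _target), table in self._translations.items():
            if source == language:
                known.update(table)
        return difflib.get_close_matches(word, sorted(known))

    def synonyms(self, word, language):
        return list(self._synonyms.get(language, {}).get(word, []))


def lookUpWithFallback(backends, word, inLanguage, outLanguage):
    """Ask each backend in turn and return the first non-empty result."""
    for backend in backends:
        try:
            result = backend.lookUp(word, inLanguage, outLanguage)
        except Exception:
            # A backend that is offline or shut down should not break the rest.
            continue
        if result:
            return result
    return []


# Illustrative sample data only.
backend = InMemoryBackend(
    {("en", "nl"): {"story": ["verhaal", "verdieping"], "store": ["winkel"]}},
    {"en": {"story": ["tale"]}},
)
print(lookUpWithFallback([backend], "story", "en", "nl"))
```

Because lookUp returns a list, a backend that knows several meanings of a
word can report all of them, and the fallback helper is one possible way
to keep working when a single source disappears.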

-- Milan:
I made a class called python-googletranslate that makes it easy to
translate a string from one language to another using Google Translate
(it can also detect a language, which might be helpful). I agree that
we can use multiple sources like Google Translate so that we're not
dependent on Google alone, although I don't expect them to shut down
their API (even the older version of the API is still usable).
A dictionary that can return multiple words would be better than Google
Translate, because a word can mean multiple things. Does anyone know an
online dictionary with an API?

-- Marten:
Give 'libtranslate' (http://www.nongnu.org/libtranslate/) a look. There
aren't Python bindings available as far as I know, but it should be
helpful when implementing other dictionaries too. It also proves that
it is possible to write one API and feed it with input from an XML file.

By the way, Google does support multiple translations; however, it may
require another API:
http://translate.google.com/translate_t?hl=&ie=UTF-8&text=story&sl=en&tl=nl#en|nl|teach

I've had a look at the source of python-googletranslate (found it via
your website); it's a good place to start. One thing, however: Ubuntu
includes the 'simplejson' module instead of 'json'. Their syntax is
the same, but that's why I usually do:

try:
    import simplejson as json
except ImportError:
    import json

Using the json module means dropping Python 2.5 support, but by the
time we implement this, we won't need that support anymore anyway.
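
Whichever module ends up bound to the name json, decoding a response is
the same call. A small sketch: parseTranslations is a made-up helper, and
the sample string is hand-written to follow the response shape described
in the v2 developer's guide (data, translations, translatedText), so it
is an assumption rather than captured output.

```python
try:
    import simplejson as json
except ImportError:
    import json

# Hand-written sample, not captured from a real request.
SAMPLE_RESPONSE = '{"data": {"translations": [{"translatedText": "verhaal"}]}}'


def parseTranslations(responseText):
    """Return the translated strings found in a v2-style JSON response."""
    decoded = json.loads(responseText)
    items = decoded.get("data", {}).get("translations", [])
    return [item["translatedText"] for item in items]


print(parseTranslations(SAMPLE_RESPONSE))
```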

-- Milan:
Unfortunately the multiple-translation function is not in the Google API
developer's guide:
http://code.google.com/apis/language/translate/v2/using_rest.html
It is possible to extract it from the HTML source, but Google changes
its website every now and then, which would make our code obsolete and
non-functional.
libtranslate does basically the same as python-googletranslate, except
that it's not in Python and it supports more websites. The websites
listed (Babelfish and SYSTRAN) don't have APIs, though, so I'm going to
have a look at how they did that.

-- Marten:
It's a pity Google doesn't have multiple translations in its API; I
think it's the most reliable service.

libtranslate just extracts data from the HTML pages meant for users, but
a change to a website doesn't break the program, because the services
are defined in an XML file. (See http://linux.die.net/man/5/libtranslate
for an explanation of the file format.)
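
To illustrate the idea of keeping service definitions outside the
program, here is a minimal sketch. The XML layout below is invented for
this example and is NOT libtranslate's actual file format; the service,
its URL (translate.example.org) and the helper names loadServices,
buildUrl and extractTranslation are all hypothetical. No network request
is made: the HTML is passed in as a string.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Invented layout: one URL template and one extraction pattern per service.
SERVICES_XML = """
<services>
  <service name="example">
    <url>http://translate.example.org/?text={text}&amp;from={source}&amp;to={target}</url>
    <pattern>&lt;div id="result"&gt;(.*?)&lt;/div&gt;</pattern>
  </service>
</services>
"""


def loadServices(xmlText):
    """Parse the service definitions into a dict keyed by service name."""
    root = ET.fromstring(xmlText)
    services = {}
    for node in root.findall("service"):
        services[node.get("name")] = {
            "url": node.findtext("url"),
            "pattern": re.compile(node.findtext("pattern"), re.DOTALL),
        }
    return services


def buildUrl(service, text, source, target):
    """Fill in the URL template, escaping the text to translate."""
    return service["url"].format(text=quote(text), source=source, target=target)


def extractTranslation(service, html):
    """Return the text captured by the service's pattern, or None."""
    match = service["pattern"].search(html)
    return match.group(1).strip() if match else None


services = loadServices(SERVICES_XML)
example = services["example"]
print(buildUrl(example, "how are you?", "en", "nl"))
print(extractTranslation(example, '<div id="result"> hoe gaat het? </div>'))
```

When a website changes its page layout, only the pattern in the XML file
would need updating, not the program itself.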

If we had reliable hosting (certain to stay online for several years),
we could host such an XML file online. The problem is that if we can't
find anybody to maintain it, the whole feature in OpenTeacher becomes
useless.

Oh, did you know I've already had a short discussion about this on the
ubuntu-nl forum? (in Dutch)
http://forum.ubuntu-nl.org/programmeren/%28openteacher%29-advies-nodig-over-spellingscontrole-automatische-vertaling/

The info isn't completely up to date anymore, but I think it's worth a
look.

Finally, it may be nice to know how Teach2000 implements this. It uses
a .dll file (a shared library) into which a couple of dictionaries are
compiled. The file is about 7 MB. (I know this because I once looked at
it in a hex editor, which showed readable words.) I don't think we can
use that file, but maybe generating one is possible. (From the FreeDict
dictionaries?)