openteachermaintainers team mailing list archive

Re: Synonyms as answers

 

Hello, 
> I've created a new project called OpenSynonyms (lol) for just the
> synonyms-part of OpenTeacher.
> http://www.launchpad.net/opensynonyms
> I've made the OpenTeacher Maintainers group the maintainer, and
> OpenTeacher Drivers the driver. This way you should all have the same
> rights as always, except that this is now a (very) small side-project
> of OpenTeacher. I've added the Python class I've been writing today to
> the bazaar branch. This is already very functional, and using SQLite,
> although it still needs a little work, especially on error handling.
> Try it!
> http://bazaar.launchpad.net/~openteachermaintainers/opensynonyms/trunk/files
I've had a look at the source and will test it more extensively later
(probably this weekend). Nice start!

> The structure of the SQLite database is as follows: there are tables
> which have the names of language codes (en,nl,..), and these have the
> fields id, words and lastupdate. id is an integer key used for
> updating, to make sure that if a word has to be updated because the
> lastupdate time has changed, the right word is being changed; words
> contains a string of words which are synonyms of eachother
> comma-seperated; and lastupdate is the unix timestamp at which the
> word was last updated on the server.
> On the server the structure is basically the same, but that's
> irrelevant.
Maybe a structure like this would also work:

languages:
    id
    name
    last_update
words:
    id
    language_id
    word
    last_update
synonyms: /* this is what in Dutch is called a 'koppeltabel'; I don't
know a good translation offhand. It links one table to another. */
    first_word_id
    second_word_id
    last_update

This way, searches are based entirely on integer IDs, which is fast.
Also, no data is duplicated apart from integers (normalisation).
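
As a rough sketch of how this model could look in SQLite from Python: the
column types, the constraints, and the helper names (create_db,
add_language, add_word, link, synonyms_of) are my own assumptions for
illustration, not existing OpenSynonyms code. Each synonym pair is stored
once, with the smaller ID first.

```python
import sqlite3

SCHEMA = """
CREATE TABLE languages (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    last_update INTEGER NOT NULL
);
CREATE TABLE words (
    id INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES languages(id),
    word TEXT NOT NULL,
    last_update INTEGER NOT NULL,
    UNIQUE (language_id, word)
);
CREATE TABLE synonyms (
    first_word_id INTEGER NOT NULL REFERENCES words(id),
    second_word_id INTEGER NOT NULL REFERENCES words(id),
    last_update INTEGER NOT NULL,
    PRIMARY KEY (first_word_id, second_word_id)
);
"""


def create_db(path=":memory:"):
    """Open a database and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_language(conn, name, now):
    cur = conn.execute(
        "INSERT INTO languages (name, last_update) VALUES (?, ?)", (name, now))
    return cur.lastrowid


def add_word(conn, language_id, word, now):
    cur = conn.execute(
        "INSERT INTO words (language_id, word, last_update) VALUES (?, ?, ?)",
        (language_id, word, now))
    return cur.lastrowid


def link(conn, word_id_a, word_id_b, now):
    """Record that two words are synonyms (pair stored once, smaller ID first)."""
    first, second = sorted((word_id_a, word_id_b))
    conn.execute("INSERT OR IGNORE INTO synonyms VALUES (?, ?, ?)",
                 (first, second, now))


def synonyms_of(conn, language_id, word):
    """Look the word up once by text; the joins then only compare integer IDs."""
    rows = conn.execute(
        """
        SELECT w2.word
        FROM words w1
        JOIN synonyms s ON w1.id IN (s.first_word_id, s.second_word_id)
        JOIN words w2 ON w2.id IN (s.first_word_id, s.second_word_id)
                     AND w2.id != w1.id
        WHERE w1.language_id = ? AND w1.word = ?
        ORDER BY w2.word
        """, (language_id, word)).fetchall()
    return [row[0] for row in rows]


conn = create_db()
en = add_language(conn, "en", 1)
big = add_word(conn, en, "big", 1)
large = add_word(conn, en, "large", 1)
link(conn, big, large, 1)
print(synonyms_of(conn, en, "big"))
```

Note that this only returns direct links; whether synonyms of synonyms
should count as well is a separate decision.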

For synchronisation the last_update timestamps are used, which should
also be efficient enough.
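
To illustrate the synchronisation idea, here is a minimal sketch that
collects every row changed since the last sync. The function name
changed_since, the cut-down demo tables, and the index on last_update are
assumptions on my part; how the rows would actually travel between server
and client is left open.

```python
import sqlite3

# Tables from the model above; each one carries a last_update column.
SYNC_TABLES = ("languages", "words", "synonyms")


def changed_since(conn, last_sync):
    """Return {table: rows} for all rows updated after last_sync (unix time)."""
    changes = {}
    for table in SYNC_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        query = "SELECT * FROM {} WHERE last_update > ?".format(table)
        changes[table] = conn.execute(query, (last_sync,)).fetchall()
    return changes


# Small demo with cut-down tables and made-up timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT, last_update INTEGER);
CREATE TABLE words (id INTEGER PRIMARY KEY, language_id INTEGER, word TEXT,
                    last_update INTEGER);
CREATE TABLE synonyms (first_word_id INTEGER, second_word_id INTEGER,
                       last_update INTEGER);
-- An index keeps the 'changed since' lookup cheap as the table grows.
CREATE INDEX words_last_update ON words (last_update);
INSERT INTO languages VALUES (1, 'en', 100);
INSERT INTO words VALUES (1, 1, 'big', 100);
INSERT INTO words VALUES (2, 1, 'large', 250);
INSERT INTO synonyms VALUES (1, 2, 250);
""")
changes = changed_since(conn, 200)
print(changes)
```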

Finally, it's easy to add a 'translations' table, turning this kind of
database into a full-blown dictionary.
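
A possible shape for such a table, purely as a sketch: I'm assuming it
mirrors the synonyms table but links words from different languages. The
table layout, the function name translations_of, and the sample words are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT, last_update INTEGER);
CREATE TABLE words (id INTEGER PRIMARY KEY, language_id INTEGER, word TEXT,
                    last_update INTEGER);
-- Hypothetical: same shape as the synonyms table, but across languages.
CREATE TABLE translations (
    first_word_id INTEGER NOT NULL REFERENCES words(id),
    second_word_id INTEGER NOT NULL REFERENCES words(id),
    last_update INTEGER NOT NULL,
    PRIMARY KEY (first_word_id, second_word_id)
);
INSERT INTO languages VALUES (1, 'en', 1);
INSERT INTO languages VALUES (2, 'nl', 1);
INSERT INTO words VALUES (1, 1, 'big', 1);
INSERT INTO words VALUES (2, 2, 'groot', 1);
INSERT INTO words VALUES (3, 1, 'house', 1);
INSERT INTO words VALUES (4, 2, 'huis', 1);
INSERT INTO translations VALUES (1, 2, 1);
INSERT INTO translations VALUES (3, 4, 1);
""")


def translations_of(conn, word, from_lang, to_lang):
    """Return the words in to_lang that are linked to word in from_lang."""
    rows = conn.execute(
        """
        SELECT w2.word
        FROM words w1
        JOIN languages l1 ON l1.id = w1.language_id
        JOIN translations t ON w1.id IN (t.first_word_id, t.second_word_id)
        JOIN words w2 ON w2.id IN (t.first_word_id, t.second_word_id)
                     AND w2.id != w1.id
        JOIN languages l2 ON l2.id = w2.language_id
        WHERE w1.word = ? AND l1.name = ? AND l2.name = ?
        ORDER BY w2.word
        """, (word, from_lang, to_lang)).fetchall()
    return [row[0] for row in rows]


print(translations_of(conn, "big", "en", "nl"))
```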

I made this table model from scratch just now, so comments are
welcome :).

> Also this way, others who might ever look for something like this will
> be able to use this without having to find out the whole OpenTeacher
> source, which fits in very well with the free software spirit. And if
> there's ever anyone who would contribute to this side-project but not
> to OpenTeacher that can be arranged too.
Maybe we should also separate, for example, the WRTS API from OpenTeacher
in the future for the same reason. A disadvantage is that it currently
depends on OpenTeacher's Word and WordList classes.

>         - What's going to be the source for the synonyms? (Even if we
>         use our own web service, we'll need the words.)
> 
> For now I used
> http://www.englisch-hilfen.de/en/words_list/synonyms.htm this little
> list for English synonyms. I made a little script to extract them and
> add them to the database. It's no problem to do this for more
> websites, but it is a problem to find them.
And to make it harder, we need to find dictionaries with suitable
licenses... Doesn't OpenOffice.org/LibreOffice have a synonym list? Maybe
we can reuse that one. I looked it up:
http://wiki.services.openoffice.org/wiki/Documentation/OOoAuthors_User_Manual/Writer_Guide/Using_the_thesaurus
They have a thesaurus for certain languages, which is worth further
investigation, I think.
- Marten de Vries



