openteachermaintainers team mailing list archive

Re: Synonyms as answers

 

Hello Maintainers,

I've created a new project called OpenSynonyms (lol) for just the
synonyms part of OpenTeacher:
http://www.launchpad.net/opensynonyms
I've made the OpenTeacher Maintainers group the maintainer and OpenTeacher
Drivers the driver. This way you should all have the same rights as always,
except that this is now a (very) small side project of OpenTeacher. I've
added the Python class I've been writing today to the Bazaar branch. It is
already quite functional and uses SQLite, although it still needs a little
work, especially on error handling. Try it!
http://bazaar.launchpad.net/~openteachermaintainers/opensynonyms/trunk/files

The structure of the SQLite database is as follows: there is one table per
language, named after its language code (en, nl, ...), and each table has
the fields id, words and lastupdate. id is an integer key used for updating:
if an entry has to be updated because its lastupdate time has changed, the id
makes sure the right entry is changed. words contains a comma-separated
string of words that are synonyms of each other. lastupdate is the Unix
timestamp at which the entry was last updated on the server.
On the server the structure is basically the same, but that's irrelevant
here.
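
To make the layout concrete, here is a minimal sketch using Python's built-in
sqlite3 module. It is not the class from the branch: the function names, the
column types and the rule for valid language codes are all assumptions made
for illustration. Only the table-per-language idea and the fields id, words
and lastupdate come from the description above.

```python
import re
import sqlite3

# Assumption: tables are named by short lowercase language codes (en, nl, ...).
# Table names cannot be bound as SQL parameters, so they are validated first.
LANG_RE = re.compile(r"[a-z]{2,3}")


def table_for(lang):
    """Return a quoted table name for a language code, or raise ValueError."""
    if not LANG_RE.fullmatch(lang):
        raise ValueError("invalid language code: %r" % (lang,))
    return '"%s"' % lang


def ensure_table(conn, lang):
    """Create the table for one language if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s ("
        "id INTEGER PRIMARY KEY, "
        "words TEXT NOT NULL, "          # comma-separated synonyms
        "lastupdate INTEGER NOT NULL)"   # Unix timestamp from the server
        % table_for(lang)
    )


def apply_update(conn, lang, row_id, words, lastupdate):
    """Store a server entry; overwrite a local one only if the server's is newer.

    Returns True if the local database was changed.
    """
    table = table_for(lang)
    existing = conn.execute(
        "SELECT lastupdate FROM %s WHERE id = ?" % table, (row_id,)
    ).fetchone()
    if existing is None:
        conn.execute(
            "INSERT INTO %s (id, words, lastupdate) VALUES (?, ?, ?)" % table,
            (row_id, words, lastupdate),
        )
        return True
    if lastupdate > existing[0]:
        conn.execute(
            "UPDATE %s SET words = ?, lastupdate = ? WHERE id = ?" % table,
            (words, lastupdate, row_id),
        )
        return True
    return False


def newest_timestamp(conn, lang):
    """Newest lastupdate stored locally (0 if the table is empty)."""
    row = conn.execute(
        "SELECT MAX(lastupdate) FROM %s" % table_for(lang)
    ).fetchone()
    return row[0] or 0


def synonyms(conn, lang, word):
    """Return the sorted synonyms of word, scanning every group in the table."""
    found = set()
    for (words,) in conn.execute("SELECT words FROM %s" % table_for(lang)):
        group = [w.strip() for w in words.split(",")]
        if word in group:
            found.update(w for w in group if w != word)
    return sorted(found)
```

A client could pass newest_timestamp() to the server to ask only for entries
changed since then, and feed each returned entry to apply_update(). The full
scan in synonyms() is fine for a small list; a larger one would probably want
one row per word with an index instead.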

This way, anyone else looking for something like this will also be able to
use it without having to dig through the whole OpenTeacher source, which fits
in very well with the free software spirit. And if there's ever anyone who
wants to contribute to this side project but not to OpenTeacher, that can be
arranged too.

Read on for a little more discussion!

On Tue, Nov 16, 2010 at 8:15 PM, Marten de Vries <
marten-de-vries@xxxxxxxxxxx> wrote:

> Hi,
> >         Secondly, we don't know how much this button is actually used,
> >         and whether it's used mostly for synonyms, or more for e.g.
> >         typing errors and non-obligatory supplements of words.
> >
> >         Finally, I think there are better sources for synonyms; there
> >         must be web services specialised in this... I'll have a look
> >         (but not now, I'm a little busy.)
> >
> > I looked, and didn't find any, and applied the rule: "if it doesn't
> > exist, make it". But if you can find any that would solve a lot of the
> > problems indeed.
>
> OK, then let's say we build our own service, unless we can find a better
> one.
>
> >         >         About the offline database, this is exactly what
> >         >         CouchDB is designed for, and it would be the nicest
> >         >         way of implementing something like this. It also
> >         >         partly solves the hosting problem if implemented
> >         >         smartly, because other CouchDB servers could just
> >         >         host an alternative database when our server goes
> >         >         down. It's a huge dependency, however, so maybe an
> >         >         SQLite DB would do too... (SQLite is supported via
> >         >         the sqlite3 module in Python itself.)
> >         >
> >         > I don't really see why that is necessary, as you can just
> >         > save the database to a gzipped JSON-encoded file. This is
> >         > very small and doesn't require extra software. See the
> >         > attachment for an example of the file it creates from
> >         > http://www.milanboers.nl/py-synonyms/synonymlist.php?token=163dc7cb5c492d3cc903e724b6594ea52fc1eb08&lang=en&entiredb=yes
> >         > It can then be read from the file just like the online
> >         > database, using JSON decoding.
> >
> >         It works well for the current size of the database, but keep
> >         in mind that real lists are much larger. I would prefer, for
> >         example, to download only the added words instead of the full
> >         list, if only to keep network traffic lower.
> >
> > That is why there is a timestamp for every word. It records when the
> > word was last updated, so you can download only the words that are
> > newer than the local database. That way you'll probably never have to
> > download more than a few hundred bytes, so this won't take much time.
> > Also, I have 50 GB of traffic per month, and currently use about 500 MB.
> > If you divide the other 49.5 GB into parts of a few hundred bytes,
> > you get hundreds of millions of requests per month. OpenTeacher has to
> > become very popular to run out of traffic. So on the server side this
> > is not a problem.
> OK, I didn't recognise the number in the JSON as a timestamp; I thought
> it was an ID or something similar. In that case I agree with the basic
> idea.
>
> >         Secondly, decoding large JSON-encoded objects takes a lot of
> >         memory, and most parsers aren't very optimised. I think a real
> >         database would definitely suit the job better. The data files
> >         aren't much larger, and hard disk space doesn't matter that
> >         much any more nowadays.
> >
> > Is SQLite usable without software dependencies (or loadable as a
> > module so we can distribute it along) and so also platform-independent
> > (so usable as a python software library)? If so, then I agree.
> There's a module for it in every default Python installation. Maybe it's
> possible to disable it if you compile Python manually, but I've never
> seen it disabled, so it should be fine:
>
> http://docs.python.org/library/sqlite3.html
>
> Unless you (or Cas) disagree with the things above, let's move on:
>
> - What's going to be the source for the synonyms? (Even if we use our
> own web service, we'll need the words.)
>

For now I used this little list of English synonyms:
http://www.englisch-hilfen.de/en/words_list/synonyms.htm
I made a little script to extract them and add them to the database. It's no
problem to do this for more websites, but it is a problem to find them.


> - What's going to be the interface for our service? (REST I think, but
> what pages are going to be available?) Maybe a new project on
> launchpad.net is a good idea for the service, so we separate the code,
> blueprints and bugs for it. We don't need new 'Maintainer' and 'Driver'
> teams, but can reuse the OpenTeacher ones for a start I think. But we
> can also wait with this until we've got some more concrete plans. What
> do you think?
>
> I think discussing on how to add this to the OpenTeacher UI isn't very
> useful at the moment, because I think we first need to discuss the
> OpenTeacher 3 GUI in general before moving to the details.
>
> - Marten de Vries
>
> P.S. I'm thinking about building new binaries this weekend for a second
> beta. I think that's useful because there have been some serious
> bugs/changes for a beta (just look at the commit history of
> lp:openteacher).
