openteachermaintainers team mailing list archive

Re: Synonyms as answers

 

Hi, 
>         (marten-de-vries): I like the idea, but I see some problems:
>         1) we don't have really reliable web hosting for OpenTeacher.
>         Yes, we've
>         got your webhost, mine, and sourceforge.net, but can we trust
>         them for
>         years of use? ( Making an OpenTeacher version useless in less
>         than 5
>         years doesn't seem right to me. ) 
> 
> SourceForge is pretty reliable, but it doesn't support PHP (or any
> other server-side programming language), so it's not usable. We can,
> however, use the SourceForge domain and redirect the requests to my
> webhost. If my webhost ever fails, we adjust the redirect and it would
> still work. It does not have 100% uptime, but that's not very
> important, because the database is also stored locally.
Sounds better, but still not perfect. Some kind of P2P network would be
the nicest solution, but it's not worth the effort, I think.

Well, let's keep your solution for the moment.
>  
>         2) An argument from Cas ( He's an MSN contact of mine ): We
>         can't just
>         rely on users' clicks on the 'correct anyway' button, because
>         you're
>         never sure if they're right. Requiring a synonym to be sent
>         multiple
>         times before adding it to the list could fix this, but it's
>         easy for
>         somebody to mess up the database if he/she wants to. 
> 
> Yes I thought about that too. But it's fine if we just check the
> submitted synonyms before they get into the database. Also, the
> maximum number of submissions per day can be limited by IP, to prevent
> huge attacks.
I'm not sure how realistic manual checking is: it takes time, and you
at least need to recognize the language used, and we only know a few, I
think.

Secondly, we don't know how much this button is actually used, and if
it's mostly used for synonyms, or more for e.g. typing errors and
non-obligatory supplements of words.

Finally, I think there are better sources for synonyms; there must be
web services specialised in this... I'll have a look (but not now,
I'm a little busy.)
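
As a rough illustration of the two safeguards from the quoted text above (requiring a synonym to be sent several times before it counts, and limiting submissions per IP per day), a minimal Python sketch could look like this. The class name, the thresholds and the in-memory storage are all assumptions made for the example; nothing like this exists in OpenTeacher or on the server yet, and it does not solve the problem of someone deliberately submitting from many addresses.

```python
from collections import defaultdict
from datetime import date

# Both limits are made-up values for illustration, not taken from the thread.
MIN_DISTINCT_SUBMITTERS = 3   # distinct IPs needed before a pair counts
MAX_SUBMITS_PER_DAY = 20      # submissions accepted per IP per day


class SynonymQueue:
    """Hypothetical collector for 'correct anyway' submissions."""

    def __init__(self):
        self._votes = defaultdict(set)   # (word, synonym) -> set of IPs
        self._daily = defaultdict(int)   # (ip, day) -> submissions counted
        self.accepted = set()            # pairs that reached the threshold

    def submit(self, ip, word, synonym, today=None):
        """Record one submission; return False if the IP is over its limit."""
        today = today or date.today()
        if self._daily[(ip, today)] >= MAX_SUBMITS_PER_DAY:
            return False
        self._daily[(ip, today)] += 1

        # Normalise so that 'Car' and 'car ' count as the same word.
        pair = (word.strip().lower(), synonym.strip().lower())
        # A set means repeated submissions from one IP only count once.
        self._votes[pair].add(ip)
        if len(self._votes[pair]) >= MIN_DISTINCT_SUBMITTERS:
            self.accepted.add(pair)
        return True
```

Pairs that end up in `accepted` could still be passed on to a manual check, which would at least shrink the list somebody has to read through.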

>         About the offline database, this is exactly what CouchDB is
>         designed
>         for, and would be the nicest way of implementing something
>         like this. It
>         also partly solves the hosting problem if implemented smartly,
>         because
>         other CouchDB servers could just host an alternative database
>         when our
>         server goes down. It's a huge dependency however, so maybe
>         a sqlite
>         DB would do too... (sqlite is supported via the sqlite3 module
>         in Python
>         itself.) 
> 
> I don't really see why that is necessary, as you can just save the
> database to a gzipped JSON-encoded file. This is very small and
> doesn't require extra software. See the attachment for an example of
> the file it creates from
> http://www.milanboers.nl/py-synonyms/synonymlist.php?token=163dc7cb5c492d3cc903e724b6594ea52fc1eb08&lang=en&entiredb=yes .
> It can then be read from the file just like the online database, using
> JSON decoding.
It works well for the current size of the database, but keep in mind
real lists are much larger. I would prefer, for example, to download
only the added words instead of the full list, if only to keep
network traffic lower. 
Secondly, decoding large JSON-encoded objects takes a lot of memory, and
most parsers aren't very optimised. I think a real database would
definitely suit the job better. The data files aren't much larger, and
hard disk space doesn't matter that much any more nowadays.
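
For reference, the gzipped-JSON file approach under discussion can be sketched in a few lines of standard-library Python. The function names and the word-to-list-of-synonyms layout are assumptions for the example; the real format of the attached file may differ.

```python
import gzip
import json


def save_synonyms(path, synonyms):
    """Write the synonym mapping to a gzip-compressed JSON file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(synonyms, f)


def load_synonyms(path):
    """Read the whole file back; the entire list is decoded into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Note that `load_synonyms` always decodes the complete list at once, which is the memory concern raised above.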

Using a database has a second advantage: searches can easily be done on
the client database itself. This is faster, especially on slow internet
connections, and reduces the server load. The web server is then only
needed to distribute and update the lists of synonyms.
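
A minimal sketch of that idea with the sqlite3 module from the Python standard library: synonyms are stored locally, only rows newer than the last known id are added on update, and lookups run against the local file. The table layout, the function names and the assumption that the server hands out increasing numeric ids are all hypothetical; how the new rows are fetched from the server is left out.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the local synonym database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS synonyms ("
        " id INTEGER PRIMARY KEY,"    # server-side id, assumed to only increase
        " lang TEXT NOT NULL,"
        " word TEXT NOT NULL,"
        " synonym TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_word ON synonyms (lang, word)"
    )
    return conn


def last_id(conn):
    """Highest id stored locally; the client would ask for rows above it."""
    row = conn.execute("SELECT COALESCE(MAX(id), 0) FROM synonyms").fetchone()
    return row[0]


def apply_update(conn, rows):
    """Add new (id, lang, word, synonym) rows; known ids are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO synonyms VALUES (?, ?, ?, ?)", rows
        )


def lookup(conn, lang, word):
    """Search the local database, no network access needed."""
    cur = conn.execute(
        "SELECT synonym FROM synonyms WHERE lang = ? AND word = ? ORDER BY id",
        (lang, word),
    )
    return [r[0] for r in cur]
```

With something like this, the client would send `last_id(conn)` to the server and only receive the words added since then, instead of the full list.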

And when you compare the above to a description of CouchDB, you'll see
it matches everything asked for above: JSON, a REST API, replication, etc. 

But I think an SQLite database is more practical in this case because of
portability, so let's drop the CouchDB idea, at least for the moment.

- Marten



