
openteachermaintainers team mailing list archive

Re: Synonyms as answers

 

On Tue, Nov 16, 2010 at 4:51 PM, Marten de Vries <
marten-de-vries@xxxxxxxxxxx> wrote:

> Hi,
> >         (marten-de-vries): I like the idea, but I see some problems:
> >         1) We don't have really reliable web hosting for OpenTeacher.
> >         Yes, we've got your webhost, mine, and sourceforge.net, but
> >         can we trust them for years of use? (Making an OpenTeacher
> >         version useless in less than 5 years doesn't seem right to
> >         me.)
> >
> > SourceForge is pretty reliable, but it doesn't support PHP (or any
> > other server-side programming language), so it's not usable. We can,
> > however, use the SourceForge domain and redirect the requests to my
> > webhost. If my webhost ever fails, we adjust the redirect and it would
> > still work. It does not have 100% uptime, but that's not very
> > important, because the database is also stored locally.
> Sounds better, but still not perfect. Some kind of P2P network would be
> the nicest solution, but it's not worth the effort I think.
>
> Well, let's keep your solution for the moment.
> >
> >         2) An argument from Cas (he's an MSN contact of mine): We
> >         can't just
> >         rely on users' clicks on the 'correct anyway' button, because
> >         you're
> >         never sure if they're right. Requiring a synonym to be sent
> >         multiple
> >         times before adding it to the list could fix this, but it's
> >         easy for
> >         somebody to mess up the database if he/she wants to.
> >
> > Yes, I thought about that too. But it's fine if we just check the
> > submitted synonyms before they get into the database. Also, the
> > number of submissions per day can be limited per IP, to prevent
> > huge attacks.
> I'm not sure how realistic manual checking is: it takes time, and you
> need to recognize at least the language used, and I think we only know
> a few.
>

That's true. We could drop the 'correct anyway' submissions and instead
just get huge synonym lists for many languages.
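The per-IP daily submission limit mentioned above could be sketched in Python roughly as follows. This is only an illustration of the idea, not the project's actual (PHP) server code; the limit of 20 and the in-memory store are hypothetical, and a real server would persist the counters in a database.

```python
import time
from collections import defaultdict

MAX_SUBMITS_PER_DAY = 20   # hypothetical limit
WINDOW = 24 * 60 * 60      # one day, in seconds

# ip -> timestamps of that IP's submissions within the window
_submissions = defaultdict(list)

def allow_submission(ip, now=None):
    """Return True if this IP may submit another synonym today."""
    now = time.time() if now is None else now
    # keep only timestamps that still fall inside the sliding window
    recent = [t for t in _submissions[ip] if now - t < WINDOW]
    _submissions[ip] = recent
    if len(recent) >= MAX_SUBMITS_PER_DAY:
        return False
    recent.append(now)
    return True
```

A sliding window like this limits bulk attacks without blocking well-behaved users who submit a handful of synonyms per day.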

Secondly, we don't know how much this button is actually used, and
> whether it's used mostly for synonyms, or more for e.g. typing errors
> and optional additions to words.
>
> Finally, I think there are better sources for synonyms; there must be
> web services specialising in this... I'll give it a look (but not now,
> I'm a little busy.)
>

I looked, didn't find any, and applied the rule: "if it doesn't exist,
make it". But if you can find any, that would indeed solve a lot of the
problems.


>
> >         About the offline database: this is exactly what CouchDB is
> >         designed for, and it would be the nicest way of implementing
> >         something like this. It also partly solves the hosting
> >         problem if implemented smartly, because other CouchDB
> >         servers could just host an alternative database when our
> >         server goes down. It's a huge dependency however, so maybe
> >         an SQLite DB would do too... (SQLite is supported via the
> >         sqlite3 module in Python itself.)
> >
> > I don't really see why that is necessary, as you can just save the
> > database to a gzipped JSON-encoded file. This is very small and
> > doesn't require extra software. See the attachment for an example of
> > the file it creates from
> >
> http://www.milanboers.nl/py-synonyms/synonymlist.php?token=163dc7cb5c492d3cc903e724b6594ea52fc1eb08&lang=en&entiredb=yes
> It can then be read from the file just like the online database, using
> JSON decoding.
> It works well for the current size of the database, but keep in mind
> that real lists are much larger. I would prefer to, for example, only
> download the added words instead of the full list, if only to keep
> network traffic lower.
>

That is why there is a timestamp for every word: it records when the word
was last updated, so you can download only the words that are newer than
the local database. That way you'll probably never have to download more
than a few hundred bytes, so this won't take much time.
Also, I have 50 GB of traffic per month, and currently use about 500 MB.
If you divide the other 49.5 GB into parts of a few hundred bytes, you
get hundreds of millions of requests per month. OpenTeacher would have to
become very popular to run out of traffic. So server-side this is not a
problem.
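The gzipped-JSON cache plus timestamp-based delta updates described above could look roughly like this on the client. The field names (`synonyms`, `timestamp`) and the file path are assumptions for illustration; the actual wire format is whatever synonymlist.php emits.

```python
import gzip
import json
import os

DB_PATH = "synonyms.json.gz"  # hypothetical local cache file

def load_local_db(path=DB_PATH):
    """Load the gzipped JSON synonym database, or an empty one."""
    if not os.path.exists(path):
        return {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

def save_local_db(db, path=DB_PATH):
    """Write the database back as gzipped JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(db, f)

def merge_updates(db, updates):
    """Merge server rows that are newer than the local copy.

    Each entry is {"synonyms": [...], "timestamp": ...}; the client
    would request only rows newer than its latest local timestamp.
    """
    for word, entry in updates.items():
        local = db.get(word)
        if local is None or entry["timestamp"] > local["timestamp"]:
            db[word] = entry
    return db

def latest_timestamp(db):
    """The timestamp to send to the server: the newest local row."""
    return max((e["timestamp"] for e in db.values()), default=0)
```

With this scheme a sync request carries only `latest_timestamp(db)` and receives only the rows added or changed since then, which is why the per-request traffic stays at a few hundred bytes.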


> Secondly, decoding large JSON-encoded objects takes a lot of memory,
> and most parsers aren't very optimised. I think a real database would
> definitely suit the job better. The data files aren't much larger, and
> hard disk space hardly matters any more nowadays.
>

Is SQLite usable without external software dependencies (or loadable as a
module we can distribute along with OpenTeacher), and thus also
platform-independent (i.e. usable as a Python software library)? If so,
then I agree.
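For what it's worth, the sqlite3 module mentioned earlier is part of the Python standard library, so it adds no external dependency and works on every platform CPython runs on. A minimal sketch with a hypothetical table layout (the schema and column names are assumptions, not the project's actual design):

```python
import sqlite3

# sqlite3 ships with Python itself: no extra software to install
conn = sqlite3.connect(":memory:")  # a file path would persist the DB
conn.execute("""
    CREATE TABLE synonyms (
        word      TEXT NOT NULL,
        synonym   TEXT NOT NULL,
        lang      TEXT NOT NULL,
        timestamp INTEGER NOT NULL
    )
""")

# hypothetical sample rows, as they might arrive from the server
rows = [
    ("big", "large", "en", 1289900000),
    ("big", "huge", "en", 1289900001),
]
conn.executemany("INSERT INTO synonyms VALUES (?, ?, ?, ?)", rows)

def lookup(word, lang="en"):
    """Client-side synonym lookup: no network round-trip needed."""
    cur = conn.execute(
        "SELECT synonym FROM synonyms WHERE word = ? AND lang = ?",
        (word, lang))
    return [r[0] for r in cur.fetchall()]
```

This also covers the point below about searching the client database directly: once the rows are local, lookups are plain SQL queries and the webserver is only needed to spread updates.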


>
> Using a database has a second advantage, searches can easily be done on
> the client database itself. This is faster, especially on slow internet
> connections, and reduces the server load. The webserver is only needed
> to spread and update the lists of synonyms then.
>
> And when you compare the above to a description of CouchDB, you'll see
> a match for everything demanded above: JSON, REST API, replication,
> etc.
>
> But I think an SQLite database is more practical in this case for
> portability, so let's drop the CouchDB idea, at least for the moment.
>
> - Marten
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openteachermaintainers
> Post to     : openteachermaintainers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openteachermaintainers
> More help   : https://help.launchpad.net/ListHelp
>
