openteachermaintainers team mailing list archive

Re: Synonyms as answers

 

Hi, 
>         Secondly, we don't know how much this button is actually used,
>         and whether it's mostly used for synonyms, or more for e.g.
>         typing errors and non-obligatory supplements of words.
>         
>         Finally, I think there are better sources for synonyms; there
>         must be web services specialised in this... I'll take a look
>         (but not now, I'm a little busy.)
> 
> I looked, and didn't find any, and applied the rule: "if it doesn't
> exist, make it". But if you can find one, that would indeed solve a
> lot of the problems.

OK, then let's say we build our own service, unless we can find a
better one.

>         >         About the offline database, this is exactly what
>         >         CouchDB is designed for, and would be the nicest way
>         >         of implementing something like this. It also partly
>         >         solves the hosting problem if implemented smartly,
>         >         because other CouchDB servers could just host an
>         >         alternative database when our server goes down. It's
>         >         a huge dependency however, so maybe a sqlite DB would
>         >         do too... (sqlite is supported via the sqlite3 module
>         >         in python itself.)
>         >
>         > I don't really see why that is necessary, as you can just
>         > save the database to a gzipped JSON-encoded file. This is
>         > very small and doesn't require extra software. See the
>         > attachment for an example of the file it creates from
>         > http://www.milanboers.nl/py-synonyms/synonymlist.php?token=163dc7cb5c492d3cc903e724b6594ea52fc1eb08&lang=en&entiredb=yes .
>         > It can then be read from the file just like the online
>         > database, using JSON decoding.
>         
>         It works well for the current size of the database, but keep
>         in mind that real lists are much larger. I would prefer to,
>         for example, download only the added words instead of the
>         full list, if only to keep network traffic lower.
> 
> That is why there is a timestamp for every word. That is when the word
> was last updated, so you can download only the words that are newer
> than the local database. That way you'll probably never have to
> download more than a few hundred bytes, so this won't take much
> time.
> Also, I have 50 GB of traffic per month, and currently use about 500 MB.
> If you divide the other 49.5 GB into parts of a few hundred bytes,
> you get hundreds of millions of requests per month. OpenTeacher has to
> become very popular to run out of traffic. So on the server side this
> is not a problem.
OK, I didn't recognise the number in the JSON as a timestamp; I thought
it was an ID or something similar. In that case I agree with the basic
idea.
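
To make the timestamp idea concrete, here is a minimal sketch of an
incremental update. The layout of the data (a word mapped to its synonyms
plus an "updated" Unix timestamp) and all function names are assumptions
for illustration only; the real format of the service's JSON may differ.
The filtering step is what the server would do before sending a reply.

```python
import gzip
import json


def latest_timestamp(db):
    """Newest 'updated' value in a database dict, or 0 if it is empty."""
    return max((entry["updated"] for entry in db.values()), default=0)


def newer_entries(remote, last_sync):
    """Entries changed after last_sync (what the server would send back)."""
    return {word: entry for word, entry in remote.items()
            if entry["updated"] > last_sync}


def merge(local, updates):
    """Return a copy of the local database with the updates applied."""
    merged = dict(local)
    merged.update(updates)
    return merged


def dump_gz(db):
    """Encode a database dict as gzipped JSON bytes."""
    return gzip.compress(json.dumps(db).encode("utf-8"))


def load_gz(blob):
    """Decode gzipped JSON bytes back into a database dict."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical sample data, not taken from the real service.
local = {
    "big": {"synonyms": ["large"], "updated": 100},
    "small": {"synonyms": ["little"], "updated": 90},
}
remote = {
    "big": {"synonyms": ["large", "huge"], "updated": 250},
    "small": {"synonyms": ["little"], "updated": 90},
    "fast": {"synonyms": ["quick"], "updated": 300},
}

updates = newer_entries(remote, latest_timestamp(local))
local = merge(local, updates)
print(sorted(updates))       # only the changed or added words
print(len(dump_gz(local)))   # size in bytes of the gzipped local copy
```

With this approach the client only needs to remember the newest timestamp
it has seen, and an unchanged word is never transferred again.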

>         Secondly, decoding large JSON-encoded objects takes a lot of
>         memory, and most parsers aren't very optimised. I think a real
>         database would definitely suit the job better. The data files
>         aren't much larger, and hard disk space doesn't matter that
>         much nowadays.
> 
> Is SQLite usable without software dependencies (or loadable as a
> module so we can distribute it along) and so also platform-independent
> (so usable as a python software library)? If so, then I agree.
There's a module for it in every default Python installation. Maybe it's
possible to disable it if you compile Python manually, but I've never
seen it disabled, so it should be fine:

http://docs.python.org/library/sqlite3.html 
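
As a rough sketch of what a local database could look like with only the
standard sqlite3 module: the table layout and the function names below
are assumptions made up for this example, not an agreed design.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the synonym database; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS synonyms ("
        " word TEXT NOT NULL,"
        " synonym TEXT NOT NULL,"
        " updated INTEGER NOT NULL,"
        " PRIMARY KEY (word, synonym))"
    )
    return conn


def store(conn, word, synonyms, updated):
    """Replace all synonyms of a word in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DELETE FROM synonyms WHERE word = ?", (word,))
        conn.executemany(
            "INSERT INTO synonyms (word, synonym, updated) VALUES (?, ?, ?)",
            [(word, synonym, updated) for synonym in synonyms],
        )


def lookup(conn, word):
    """Return the synonyms of a word, sorted alphabetically."""
    rows = conn.execute(
        "SELECT synonym FROM synonyms WHERE word = ? ORDER BY synonym",
        (word,),
    )
    return [row[0] for row in rows]


def last_update(conn):
    """Newest timestamp in the database, or 0 if it is empty."""
    row = conn.execute(
        "SELECT COALESCE(MAX(updated), 0) FROM synonyms"
    ).fetchone()
    return row[0]


conn = open_db()
store(conn, "big", ["large", "huge"], 250)
print(lookup(conn, "big"))
print(last_update(conn))
```

A lookup only reads the rows for one word, so the whole list never has to
be decoded into memory, and last_update() gives the value to send along
when asking the service for newer words.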

Unless you (or Cas) disagree with the things above, let's move on:

- What's going to be the source for the synonyms? (Even if we use our
own web service, we'll need the words.)
- What's going to be the interface for our service? (REST I think, but
what pages are going to be available?) Maybe a new project on
launchpad.net is a good idea for the service, so we separate the code,
blueprints and bugs for it. We don't need new 'Maintainer' and 'Driver'
teams, but can reuse the OpenTeacher ones for a start, I think. But we
can also wait with this until we've got some more concrete plans. What
do you think?

I think discussing how to add this to the OpenTeacher UI isn't very
useful at the moment, because we first need to discuss the
OpenTeacher 3 GUI in general before moving to the details.

- Marten de Vries

P.S. I'm thinking about building new binaries this weekend for a second
beta. I think that's useful because there have been some serious
bugs/changes for a beta (just look at the commit history of
lp:openteacher).



