openteachermaintainers team mailing list archive
Message #00007
Re: Synonyms as answers
Hi,
> Secondly, we don't know how much this button is actually used, and if
> it's mostly used for synonyms, or more for e.g. typing errors and
> non-obligatory supplements of words.
>
> Finally, I think there are better sources for synonyms; there must be
> web services specialised in this... I'll have a look (but not now,
> I'm a little busy.)
>
> I looked, and didn't find any, and applied the rule: "if it doesn't
> exist, make it". But if you can find one, that would indeed solve a lot
> of the problems.
OK, then let's say we build our own service, unless we can find a better
one.
> > About the offline database, this is exactly what CouchDB is designed
> > for, and would be the nicest way of implementing something like this.
> > It also partly solves the hosting problem if implemented smartly,
> > because other CouchDB servers could just host an alternative database
> > when our server goes down. It's a huge dependency however, so maybe
> > a sqlite DB would do too... (sqlite is supported via the sqlite3
> > module in python itself.)
> >
> > I don't really see why that is necessary, as you can just save the
> > database to a gzipped JSON-encoded file. This is very small and
> > doesn't require extra software. See the attachment for an example of
> > the file it creates from
> > http://www.milanboers.nl/py-synonyms/synonymlist.php?token=163dc7cb5c492d3cc903e724b6594ea52fc1eb08&lang=en&entiredb=yes .
> > It can then be read from the file just like the online database, using
> > JSON decoding.
>
> It works well for the current size of the database, but keep in mind
> real lists are much larger. I would prefer, for example, to download
> only the added words instead of the full list, if only to keep network
> traffic lower.
>
> That is why there is a timestamp for every word. That is when the word
> was last updated, so you can download only the words that are newer
> than the local database. That way you'll probably never have to
> download more than a few hundred bytes, so this won't take much
> time.
> Also I have 50 GB of traffic per month, and currently use about 500 MB.
> If you divide the other 49.5 GB into parts of a few hundred bytes,
> you get hundreds of millions of requests per month. OpenTeacher has to
> become very popular to run out of traffic. So on the server side this
> is not a problem.
OK, I didn't realise the number in the JSON was a timestamp; I thought it
was an ID or something similar. In that case I agree with the basic idea.
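To make the timestamp idea concrete, here is a rough sketch of what the client side could look like. It is only an illustration: the layout of the data (a mapping from each word to its synonyms and a "timestamp" field) and all function names are my assumptions, not the actual format the service returns.

```python
import gzip
import json
import os


def load_db(path):
    """Read the local gzipped JSON database; a missing file means an empty one."""
    if not os.path.exists(path):
        return {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def save_db(path, db):
    """Write the database back to disk as gzipped JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(db, f)


def newest_timestamp(db):
    """Timestamp to send to the server; 0 when nothing is stored yet."""
    return max((entry["timestamp"] for entry in db.values()), default=0)


def merge_updates(db, updates):
    """Apply only entries that are new or newer than the local copy.

    Assumed (hypothetical) layout of both mappings:
    {"word": {"synonyms": [...], "timestamp": 1234567890}}
    Returns the number of entries that were added or replaced.
    """
    changed = 0
    for word, entry in updates.items():
        local = db.get(word)
        if local is None or entry["timestamp"] > local["timestamp"]:
            db[word] = entry
            changed += 1
    return changed
```

The client would ask the server for everything newer than newest_timestamp(db), pass the decoded response to merge_updates() and then call save_db(). How the server is asked for "newer than X" is still open.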
> Secondly, decoding large JSON-encoded objects takes a lot of memory, and
> most parsers aren't very optimised. I think a real database would
> definitely suit the job better. The data files aren't much larger, and
> hard disk space doesn't matter that much any more nowadays.
>
> Is SQLite usable without software dependencies (or loadable as a
> module so we can distribute it along) and so also platform-independent
> (so usable as a python software library)? If so, then I agree.
There's a module for it in every default python installation. Maybe it's
possible to disable it if you compile python manually, but I've never
seen it disabled, so it should be fine:
http://docs.python.org/library/sqlite3.html
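As a rough idea of how little code the sqlite3 route needs, here is a minimal sketch. The table layout (one row per word/synonym pair with an "updated" timestamp) and the function names are just assumptions for illustration, nothing we have agreed on.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the synonym database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS synonyms ("
        " word TEXT NOT NULL,"
        " synonym TEXT NOT NULL,"
        " updated INTEGER NOT NULL,"
        " PRIMARY KEY (word, synonym))"
    )
    return conn


def add_synonym(conn, word, synonym, updated):
    """Insert a pair, or refresh its timestamp if it is already stored."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO synonyms (word, synonym, updated)"
            " VALUES (?, ?, ?)",
            (word, synonym, updated),
        )


def synonyms_for(conn, word):
    """Return the stored synonyms for a word, sorted alphabetically."""
    rows = conn.execute(
        "SELECT synonym FROM synonyms WHERE word = ? ORDER BY synonym",
        (word,),
    )
    return [row[0] for row in rows]
```

Looking up a word then only reads the matching rows from disk, instead of decoding the whole list into memory first.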
Unless you (or Cas) disagree with the things above, let's move on:
- What's going to be the source for the synonyms? (Even if we use our
own web service, we'll need the words.)
- What's going to be the interface for our service? (REST I think, but
what pages are going to be available?) Maybe a new project on
launchpad.net is a good idea for the service, so we separate the code,
blueprints and bugs for it. We don't need new 'Maintainer' and 'Driver'
teams, but can reuse the OpenTeacher ones for a start I think. But we
can also wait with this until we've got some more concrete plans. What
do you think?
I think discussing how to add this to the OpenTeacher UI isn't very
useful at the moment, because I think we first need to discuss the
OpenTeacher 3 GUI in general before moving to the details.
- Marten de Vries
P.S. I'm thinking about building new binaries this weekend for a second
beta. I think that's useful because there have been some serious
bugs/changes for a beta (just look at the commit history of
lp:openteacher).