u1db-discuss team mailing list archive

Re: API Design

To: u1db-discuss@xxxxxxxxxxxxxxxxxxx
From: Stuart Langridge <stuart.langridge@xxxxxxxxxxxxx>
Date: Tue, 15 Nov 2011 19:04:35 +0000
In-reply-to: <4EBD8ADB.2080502@canonical.com>
Organization: Canonical

On Fri, 2011-11-11 at 21:51 +0100, Mikkel Kamstrup Erlandsen wrote:
> On 11/11/2011 04:32 PM, Rodney Dawes wrote:
> > On Fri, 2011-11-11 at 08:52 +0000, Stuart Langridge wrote:
> >> I think you might be under a bit of a misapprehension here. The thing
> >> that you pass to the Python functions as a "doc" is a JSON string. It's
> >> not a Python dictionary or some other complex type. Our basic "document"
> >> is a string containing a JSON serialisation of the document; it's not an
> >> object.
> > That is exactly what my complaint is. That 'doc' and every bit of data
> > associated with it (id, revision, etc…) must be maintained and passed
> > around as separate things. I am suggesting we should have a Document
> > class, which contains all of these things in one place. A simple class,
> > with properties for all these bits of data, which can be set/get in
> > accordance with the conventions of the language for each implementation.

> I second Rodney's opinion here. Passing around JSON is *very* 
> inefficient (not to mention inconvenient). And this is something that I 
> am not making up; I've spent lots of time profiling apps and libs with
> exactly this problem.
> 

I know jam has been profiling this stuff, to actually confirm whether
JSON is slower or not. For now, I don't want to address speed; I just
want to summarise both approaches with pros and cons of each to make
sure we're all talking about the same thing.

The functions in the API that return or receive a document can either
use a JSON string or use some native type. A "native type" will depend
on the u1db implementation language; for example, in a Python
implementation, create_doc would take either a Document object
(provided by the u1db library) or a Python dictionary. In other
languages, either a library-provided object or a native mapping type
would be passed and returned.
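To make the native-type option concrete, here is a minimal Python sketch of the kind of Document class Rodney describes, bundling id, revision and content in one object, plus a toy create_doc that accepts either a Document or a plain dict. Everything here (Document, InMemoryDatabase, doc_id, rev, get_json, and the placeholder revision "1") is a hypothetical illustration, not the u1db API; a real implementation would track revisions with something richer than a fixed string.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Document:
    """Hypothetical Document type bundling id, revision and content."""
    doc_id: Optional[str] = None
    rev: Optional[str] = None
    content: dict = field(default_factory=dict)

    def get_json(self) -> str:
        # Serialise only the content, e.g. for a JSON-string API or for sync.
        return json.dumps(self.content, sort_keys=True)


class InMemoryDatabase:
    """Toy store illustrating a native-type create_doc (all names hypothetical)."""

    def __init__(self) -> None:
        self._docs: dict = {}
        self._counter = 0

    def create_doc(self, doc: Union[Document, dict]) -> Document:
        # Accept either a Document or a plain dict, as described above.
        if isinstance(doc, dict):
            doc = Document(content=doc)
        self._counter += 1
        doc.doc_id = doc.doc_id or f"doc-{self._counter}"
        doc.rev = "1"  # placeholder revision, purely illustrative
        # Store a serialised copy so later edits to the caller's object don't leak in.
        self._docs[doc.doc_id] = (doc.rev, doc.get_json())
        return doc

    def get_doc(self, doc_id: str) -> Document:
        # Rebuild a fresh Document from the stored text.
        rev, raw = self._docs[doc_id]
        return Document(doc_id=doc_id, rev=rev, content=json.loads(raw))
```

The caller never has to carry the id and revision around separately; they travel with the content in one object.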

The obvious advantage of a native type is that you can work directly
with a Document object (or Python dict, etc.); you can retrieve
individual field values with documentobject['field1']['subfield1'] or
similar. If your application code has a JSON string then the first thing
you'll do with it is parse it into a native object *anyway*, so why
shouldn't u1db do this for you? This is a very compelling argument.
However, there are a few things which militate against it, which I
present here so that we all understand both sides of the argument.
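A small sketch of the field-access argument, assuming a hypothetical document with a nested field1/subfield1: with a JSON-string API the app has to parse before it can index, while a native API hands over something indexable straight away. The helper names are illustrative only.

```python
import json

# The same hypothetical document, once as a JSON string and once as a native dict.
doc_json = '{"field1": {"subfield1": "value"}}'
doc_native = {"field1": {"subfield1": "value"}}


def subfield_from_json(raw: str) -> str:
    # JSON-string API: the app must parse before it can index.
    return json.loads(raw)["field1"]["subfield1"]


def subfield_from_native(doc: dict) -> str:
    # Native-type API: the parse has already been done by the library.
    return doc["field1"]["subfield1"]
```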

1. Functions that return or take JSON strings mean that the API
documentation is basically identical across platforms. This makes
documenting u1db easier (the documentation applies whether you're
using a Python port, a Ruby port, a C port, etc.).
2. The sync function uses JSON to talk between u1db servers[1]. This
means that a u1db implementation needs to be able to serialise a stored
document to JSON and deserialise it from JSON into storage anyway.
Adding a separate native document type means extra code to write and
maintain.
3. The test suite will test compliance of a u1db implementation by
talking to the sync server. If syncing uses JSON but in-app use relies
on something else (native objects), then the test suite will not
actually be testing a common code path, which makes it harder to
establish whether an implementation is compliant.
4. (A minor objection at best.) It's not clear that there is a sensible
"native object" for some implementations (e.g., C). However, a "pure C"
implementation might be useless for this reason, suggesting that there'd
be a "C with glib" implementation, and this would have an obvious native
object type.
5. How data is actually stored in a backend is backend-specific. The
get_doc function has to return *something*: if it's returning a native
object, then the u1db layer still has to read data out of the back end
and turn it into that native object (a back end is unlikely to store an
*actual native object* for retrieval), so there's always some sort of
parsing step involved.
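Points 2 and 5 can be illustrated with a toy backend. This sketch assumes, purely for illustration, an in-memory SQLite table that stores each document's content as JSON text; the function names (put_doc, get_doc_json, get_doc_native, sync_payload) are hypothetical and not part of u1db. Whichever in-app API is chosen, the stored text is the same, and the native variant simply adds a parse step.

```python
import json
import sqlite3

# Hypothetical SQLite-backed store: documents are persisted as JSON text,
# because the backend cannot hold a live Python object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc_id TEXT PRIMARY KEY, content TEXT)")


def put_doc(doc_id: str, content: dict) -> None:
    # Serialise on the way in (point 2: this is needed for sync anyway).
    conn.execute(
        "INSERT OR REPLACE INTO docs (doc_id, content) VALUES (?, ?)",
        (doc_id, json.dumps(content, sort_keys=True)),
    )


def get_doc_json(doc_id: str) -> str:
    # JSON-string API: hand back the stored text unchanged.
    row = conn.execute(
        "SELECT content FROM docs WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0]


def get_doc_native(doc_id: str) -> dict:
    # Native API: the same read, plus a parse step (point 5).
    return json.loads(get_doc_json(doc_id))


def sync_payload(doc_id: str) -> str:
    # Sync sends JSON regardless of which in-app API is chosen.
    return get_doc_json(doc_id)
```

Both getters read the same stored string; only get_doc_native pays for json.loads.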

Discuss. :-)

sil

[1] Don't get hung up on the word "JSON" here. Yes, the sync function
might actually use BSON or something else on the wire. The point is
that the wire format we use for syncing isn't going to be what you
want in an app anyway; if the get_doc() function returns BSON, you're
still going to have to deserialise that into a native object in your
app, just like you would with JSON. The wire format and the in-app
object format are not the same, regardless of what each of them
actually is.
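To illustrate the footnote's point that the wire format and the in-app format are independent, here is a sketch with two interchangeable wire codecs. zlib-compressed JSON is only a stand-in for a binary format such as BSON (which is not in Python's standard library), and the CODECS table and receive() helper are hypothetical. The app ends up with the same dict either way.

```python
import json
import zlib
from typing import Callable, Dict, Tuple

# (encode, decode) pair for a hypothetical wire format.
Codec = Tuple[Callable[[dict], bytes], Callable[[bytes], dict]]

CODECS: Dict[str, Codec] = {
    # Plain JSON on the wire.
    "json": (
        lambda doc: json.dumps(doc).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
    # Compressed JSON: a stand-in for a binary format like BSON.
    "compressed-json": (
        lambda doc: zlib.compress(json.dumps(doc).encode("utf-8")),
        lambda data: json.loads(zlib.decompress(data).decode("utf-8")),
    ),
}


def receive(wire_bytes: bytes, codec_name: str) -> dict:
    # Whatever arrives on the wire, the app is handed a native dict.
    _, decode = CODECS[codec_name]
    return decode(wire_bytes)
```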

Follow ups

Re: API Design
From: Rodney Dawes, 2011-11-15
Re: API Design
From: Mikkel Kamstrup Erlandsen, 2011-11-15

References

API Design
From: Rodney Dawes, 2011-11-11
Re: API Design
From: Stuart Langridge, 2011-11-11
Re: API Design
From: Rodney Dawes, 2011-11-11
Re: API Design
From: Mikkel Kamstrup Erlandsen, 2011-11-11