u1db-discuss team mailing list archive

Re: API Design

 

On 11/15/2011 08:04 PM, Stuart Langridge wrote:
> On Fri, 2011-11-11 at 21:51 +0100, Mikkel Kamstrup Erlandsen wrote:
>> On 11/11/2011 04:32 PM, Rodney Dawes wrote:
>>> On Fri, 2011-11-11 at 08:52 +0000, Stuart Langridge wrote:
>>>> I think you might be under a bit of a misapprehension here. The thing
>>>> that you pass to the Python functions as a "doc" is a JSON string. It's
>>>> not a Python dictionary or some other complex type. Our basic "document"
>>>> is a string containing a JSON serialisation of the document; it's not an
>>>> object.
>>> That is exactly my complaint: the 'doc' and every bit of data
>>> associated with it (id, revision, etc.) must be maintained and passed
>>> around as separate things. I am suggesting we should have a Document
>>> class which contains all of these things in one place: a simple class
>>> with properties for all these bits of data, which can be set/get in
>>> accordance with the conventions of the language for each implementation.
>> I second Rodney's opinion here. Passing around JSON is *very*
>> inefficient (not to mention inconvenient). And this is not something I
>> am making up; I've spent lots of time profiling apps and libs with
>> exactly this problem.

> I know jam has been profiling this stuff to confirm whether JSON is
> actually slower or not. For now, I don't want to address speed; I just
> want to summarise both approaches, with the pros and cons of each, to
> make sure we're all talking about the same thing.
>
> The functions in the API that return, or receive, a document can either
> use a JSON string or some native type. A "native type" will depend on
> the u1db implementation language; for example, in a Python
> implementation, create_doc would take either a Document object
> (provided by the u1db library) or a Python dictionary. In other
> languages, either a library-provided object or a native mapping type
> would be returned/passed.
>
> The obvious advantage of a native type is that it's possible to work
> with a Document object (or Python dict, etc.); you can retrieve
> individual field values with documentobject['field1']['subfield1'] or
> similar. If your application code has a JSON string, then the first
> thing you'll do with it is parse it into a native object *anyway*, so
> why shouldn't u1db do this for you? This is a very compelling argument.
> However, there are a few things which militate against it, which I
> present here so that we all understand both sides of the argument.
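To make the "you'll parse it anyway" point concrete, here is a minimal Python sketch contrasting the two call styles. The in-memory store and the functions `get_doc_json` and `get_doc_native` are hypothetical illustrations, not part of u1db; the only point is that with a JSON-string API every caller ends up running `json.loads` itself.

```python
import json

# Hypothetical in-memory store: doc_id -> serialised JSON text.
_STORE = {"doc-1": '{"field1": {"subfield1": 42}}'}


def get_doc_json(doc_id):
    """JSON-string style (hypothetical): return the raw serialisation."""
    return _STORE[doc_id]


def get_doc_native(doc_id):
    """Native style (hypothetical): the library parses once, returns a dict."""
    return json.loads(_STORE[doc_id])


# JSON-string API: the application must parse before reading any field.
value_a = json.loads(get_doc_json("doc-1"))["field1"]["subfield1"]

# Native API: the parsed mapping is usable directly.
value_b = get_doc_native("doc-1")["field1"]["subfield1"]

print(value_a, value_b)  # 42 42
```

The parsing cost is the same in both cases; what changes is whether it is written once in the library or repeated in every application.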

> 1. Functions returning/taking JSON strings mean that the API
> documentation is basically identical across platforms. This makes
> documentation of u1db easier (the documentation applies whether you're
> using a Python port, a Ruby port, a C port, etc.)
I am not sure I agree. If there is a Document class, the APIs could be largely equivalent on different runtimes:

  my_int = doc.get_int32("bar")                           # Python
  var my_int = doc.get_int32("bar");                      // JS
  var my_int = doc.get_int32 ("bar");                     // Vala
  int32_t my_int = u1db_document_get_int32 (doc, "bar");  // C
  int32_t my_int = doc->get_int32 ("bar");                // C++

(And yes, Python, JS, and C++ may use map comprehensions and the like, but something like doc.get_int32("bar") is still useful for clarity and type safety.)
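A rough Python sketch of the kind of Document class Rodney and Mikkel describe, bundling id, revision and content in one object. The class name, the `doc_id`/`rev`/`content` attributes and the `get_int32` accessor are assumptions modelled on the examples in this thread, not an existing u1db API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document type: id, revision and parsed content together."""
    doc_id: str
    rev: str
    content: dict = field(default_factory=dict)

    def get_int32(self, key):
        """Typed accessor: return content[key] if it is a 32-bit int, else raise."""
        value = self.content[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key!r} is not an integer")
        if not -2**31 <= value < 2**31:
            raise OverflowError(f"{key!r} does not fit in 32 bits")
        return value


doc = Document("doc-1", "rev-1", json.loads('{"bar": 7, "name": "x"}'))
my_int = doc.get_int32("bar")
print(doc.doc_id, doc.rev, my_int)  # doc-1 rev-1 7
```

The typed getter is where the type-safety argument comes from: asking for doc.get_int32("name") fails loudly at the access site instead of somewhere downstream.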

> 2. The sync function uses JSON to talk between u1db servers[1]. This
> means that a u1db implementation needs to be able to serialise a stored
> document to JSON and deserialise it from JSON into storage anyway.
> Adding a separate native document type means that there's extra coding
Extra coding for u1db, yes, but without it there is more coding for everyone else.
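A native document type does not remove JSON from sync; it moves the (de)serialisation inside the library. A minimal sketch, assuming a hypothetical wire layout of one JSON object carrying `doc_id`, `rev` and `content` (the actual u1db sync format is not described in this thread, and `to_wire`/`from_wire` are illustrative names):

```python
import json
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical native document type."""
    doc_id: str
    rev: str
    content: dict


def to_wire(doc):
    """Serialise a document for sync (hypothetical wire layout)."""
    return json.dumps(
        {"doc_id": doc.doc_id, "rev": doc.rev, "content": doc.content},
        sort_keys=True,
    )


def from_wire(text):
    """Rebuild a Document from the hypothetical wire layout."""
    data = json.loads(text)
    return Document(data["doc_id"], data["rev"], data["content"])


local = Document("doc-1", "rev-3", {"title": "hello", "tags": ["a", "b"]})
payload = to_wire(local)     # what would travel between servers
remote = from_wire(payload)  # what the receiving side hands to its application
print(remote == local)  # True
```

The "extra coding" amounts to a pair of functions like these inside the library; applications never call them directly.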

> 3. The test suite will test compliance of a u1db version by talking to
> the sync server. If syncing uses JSON but in-app use relies on
> something else (native objects), then the test suite will not actually
> be testing a common code path, which makes it harder to establish
> whether an implementation is compliant.
I am not sure I understand what the problem is here. If there is a generic compliance suite, then by its very design it cannot test the internal behaviour of individual libs.
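What a generic suite can see is the JSON boundary, so one option is to check that native documents survive that boundary unchanged. A minimal sketch; the helper `roundtrip_ok` and the sample documents are hypothetical, not part of any existing u1db test suite.

```python
import json


def roundtrip_ok(content):
    """True if content comes back unchanged after a JSON round trip."""
    return json.loads(json.dumps(content)) == content


samples = [
    {"a": 1, "b": [1, 2, 3]},
    {"nested": {"x": None, "y": True}},
    {"text": "unicode: \u00e6\u00f8\u00e5"},
]
results = [roundtrip_ok(s) for s in samples]
print(all(results))  # True

# Python tuples are not JSON types: they come back as lists, so this fails.
print(roundtrip_ok({"t": (1, 2)}))  # False
```

The second check shows the kind of mismatch such a suite would catch: a native value with no exact JSON equivalent.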

> 4. (A minor objection at best) It's not clear that there is a sensible
> "native object" for some implementations (e.g., C). However, a "pure C"
> implementation might be useless for this reason, suggesting that there'd
> be a "C with glib" implementation, and this would have an obvious native
> object type.
I actually wrote a small pure-C stub implementation of a u1db API back when u1db was first mentioned, to try out whether gobject-introspection could work without glib. Working with GObject-style method naming and opaque pointers to instance structs is, I'd say, very clean. And it worked pretty well with GI, as long as one sticks to the glib coding style and uses gtk-doc annotations for the functions and types.

(Although coding C without glib sucks... we could maybe still use glib, but not GObject...)

> 5. How data is actually stored in a backend is backend-specific. The
> get_doc function has to return *something*: if it's returning a native
> object, then the u1db layer still has to read data out of the back end
> and turn it into that native object (a back end is unlikely to store an
> *actual native object* for retrieval), so there's always some sort of
> parsing step involved.

All the more reason to hide the storage format from the API as well, I'd say.
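A minimal sketch of hiding the storage format behind get_doc, assuming a SQLite back end that stores each document's content as JSON text. The `SQLiteBackend` class and the table layout are hypothetical. The parsing step Stuart mentions happens once, inside the library, and callers only ever see a dict.

```python
import json
import sqlite3


class SQLiteBackend:
    """Hypothetical backend: content is stored as JSON text, never exposed raw."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE docs (doc_id TEXT PRIMARY KEY, rev TEXT, content TEXT)")

    def put_doc(self, doc_id, rev, content):
        """Serialise content to JSON text before storing it."""
        self._conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
            (doc_id, rev, json.dumps(content)))

    def get_doc(self, doc_id):
        """Return (rev, content_dict), or None; JSON parsing stays internal."""
        row = self._conn.execute(
            "SELECT rev, content FROM docs WHERE doc_id = ?", (doc_id,)).fetchone()
        if row is None:
            return None
        rev, text = row
        return rev, json.loads(text)


db = SQLiteBackend()
db.put_doc("doc-1", "rev-1", {"field1": {"subfield1": "value"}})
rev, content = db.get_doc("doc-1")
print(rev, content["field1"]["subfield1"])  # rev-1 value
```

Swapping the JSON column for some other encoding would change only put_doc and get_doc, not any caller.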

The only case where I'd say one could need access to the raw storage format is implementing optimized parsing routines that do not require allocating full objects all over the place, à la SAX parsing, for example. That could be added to the API at a later point, though, using a visitor pattern or similar, so it's not really an argument.
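A visitor-style API for that later optimisation might look roughly like the sketch below. The `DocVisitor` interface and `walk` function are hypothetical. Python's standard json module has no streaming parser, so this sketch parses the whole document first and only illustrates the API shape; a genuinely allocation-light, SAX-style version would need an incremental parser underneath.

```python
import json


class DocVisitor:
    """Hypothetical callback interface; override only what you need."""

    def visit_value(self, path, value):
        pass


def walk(node, visitor, path=()):
    """Depth-first walk, reporting each leaf value together with its key path."""
    if isinstance(node, dict):
        for key, child in node.items():
            walk(child, visitor, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            walk(child, visitor, path + (index,))
    else:
        visitor.visit_value(path, node)


class SumInts(DocVisitor):
    """Example visitor: sums integer leaves without collecting the values."""

    def __init__(self):
        self.total = 0

    def visit_value(self, path, value):
        # Skip booleans, which are ints in Python.
        if isinstance(value, int) and not isinstance(value, bool):
            self.total += value


raw = '{"a": 1, "b": {"c": 2, "d": [3, "x", true]}}'
summer = SumInts()
walk(json.loads(raw), summer)
print(summer.total)  # 6
```

Because callers only implement callbacks, the walk could later be backed by a streaming parser without changing the visitor interface.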

