
u1db-discuss team mailing list archive

Re: API Design

 

It would be nice to see real benchmarks before we decide it is necessary.
The DB is intended to be size limited (~20MB per db per user). I have a
dataset of 50k docs that totals 33MB in JSON, and ~60MB in sqlite. I can
deserialize all docs in <1s (I'll try to get a more accurate timing) in
Python.
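A rough way to get that timing, as a sketch only: the synthetic documents and the helper names make_docs and time_deserialize below are made up for illustration and are not the real dataset or any u1db code. It assumes Python 3 (for time.perf_counter) and only measures json.loads over a list of JSON strings.

```python
import json
import time


def make_docs(count):
    """Build `count` small JSON strings. The field names are invented."""
    return [
        json.dumps({
            "id": "doc-%d" % i,
            "title": "Document %d" % i,
            "tags": ["a", "b", "c"],
            "size": i,
        })
        for i in range(count)
    ]


def time_deserialize(docs):
    """Parse every JSON string and return (parsed_docs, elapsed_seconds)."""
    start = time.perf_counter()
    parsed = [json.loads(doc) for doc in docs]
    elapsed = time.perf_counter() - start
    return parsed, elapsed


if __name__ == "__main__":
    docs = make_docs(50000)
    total_bytes = sum(len(d) for d in docs)
    parsed, elapsed = time_deserialize(docs)
    print("%d docs, %.1f MB, parsed in %.3f s"
          % (len(parsed), total_bytes / 1e6, elapsed))
```

The synthetic docs are much smaller than the real ones, so the number printed is only a lower bound on what the real dataset would take.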

That doesn't seem particularly slow, and if we are exposing some data over
HTTP, it seems like a nice format.
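For reference, keeping JSON as the wire format would not rule out the Document class proposed in the quoted messages below. A minimal sketch, assuming a hypothetical Document class with doc_id, rev and content attributes (none of these names come from the actual u1db code): the JSON string is kept as-is and only parsed when the content is first requested.

```python
import json

_UNPARSED = object()  # sentinel: the JSON text has not been parsed yet


class Document(object):
    """Hypothetical container for a doc and its metadata (not the u1db API)."""

    def __init__(self, doc_id, rev, json_text):
        self.doc_id = doc_id
        self.rev = rev
        self._json = json_text
        self._content = _UNPARSED

    @property
    def content(self):
        # Parse lazily, so callers that only move documents around
        # never pay for deserialization.
        if self._content is _UNPARSED:
            self._content = json.loads(self._json)
        return self._content

    @content.setter
    def content(self, value):
        self._content = value

    def get_json(self):
        # If the content was never touched, return the original string;
        # otherwise re-serialize so in-place changes are picked up.
        if self._content is _UNPARSED:
            return self._json
        return json.dumps(self._content)
```

Usage would look something like doc = Document("doc-1", "rev-1", '{"title": "hello"}'), then doc.content["title"] to read a field and doc.get_json() to get the string to send over HTTP.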

John
=:->
On Nov 11, 2011 9:51 PM, "Mikkel Kamstrup Erlandsen" <
mikkel.kamstrup@xxxxxxxxxxxxx> wrote:

> On 11/11/2011 04:32 PM, Rodney Dawes wrote:
>
>> On Fri, 2011-11-11 at 08:52 +0000, Stuart Langridge wrote:
>>
>>> I think you might be under a bit of a misapprehension here. The thing
>>> that you pass to the Python functions as a "doc" is a JSON string. It's
>>> not a Python dictionary or some other complex type. Our basic "document"
>>> is a string containing a JSON serialisation of the document; it's not an
>>> object.
>>>
>> That is exactly what my complaint is. That 'doc' and every bit of data
>> associated with it (id, revision, etc…) must be maintained and passed
>> around as separate things. I am suggesting we should have a Document
>> class, which contains all of these things in one place. A simple class,
>> with properties for all these bits of data, which can be set/get in
>> accordance with the conventions of the language for each implementation.
>>
> (sorry if I lack context here, I just joined the list and this thread is
> not in the archives on https://lists.launchpad.net/u1db-discuss/ for some
> reason)
>
> I second Rodney's opinion here. Passing around JSON is *very* inefficient
> (not to mention inconvenient). And this is something that I am not making
> up, I've spent lots of time profiling apps and libs with exactly this
> problem.
>
> I haven't actually looked at the Python code yet, but I saw the same
> behaviour in the C version.
>
> Having a Document class also separates the wire format from the
> programmatic representation which I think is a big plus as well,
> architecture wise.
>
> And while I am ranting - any chance we can use a binary format instead of
> JSON, maybe BSON or GVariant, for example? Parsing JSON is super slow
> compared to these[1]. Also in C. Again from bitter experience :-)
>
> Other than that, let me just use my first mail here to make clear that I
> am super hyped about the idea of u1db! Let's make this rock! :-D
>
> Cheers,
> Mikkel
>
> [1] (ok, you got me, I haven't profiled BSON vs JSON, but I have done it
> for GVariant)
>
> --
> Mailing list: https://launchpad.net/~u1db-discuss
> Post to     : u1db-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~u1db-discuss
> More help   : https://help.launchpad.net/ListHelp
>
