u1db-discuss team mailing list archive

Re: API Design

 

On 11/15/2011 04:10 PM, John Meinel wrote:

It would be nice to see real benchmarks before we decide it is necessary. The DB is intended to be size limited (~20MB per db per user). I have a dataset of 50k docs that totals 33MB in JSON, and ~60MB in sqlite. I can deserialize all docs in <1s (I'll try to get a more accurate timing) in python.


I'll whip up a tool to compare plain column-field mapping, JSON, and GVariant extracted from an SQLite db, with datasets of different sizes. Column-field mapping will probably be the baseline for everything else.

Where I come from, 1s is an eternity. I am interested in how much data you can pull out or insert without skipping too many frames (also on embedded devices). I realize that one probably wouldn't ever try to deserialize 50k docs on an embedded device, so I'll make sure we can test with different sized corpora.
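A minimal sketch of what such a comparison tool could look like in Python, assuming an in-memory SQLite database and small synthetic documents. The table names, field names, and helper functions are hypothetical, and GVariant is left out because it needs GLib bindings; only column-field mapping and JSON are timed here.

```python
import json
import sqlite3
import time


def build_db(n_docs):
    """Create an in-memory database holding the same records in two layouts."""
    conn = sqlite3.connect(":memory:")
    # Layout 1: one column per field (hypothetical schema).
    conn.execute(
        "CREATE TABLE docs_columns (doc_id TEXT PRIMARY KEY, title TEXT, size INTEGER)")
    # Layout 2: the whole document stored as one JSON string.
    conn.execute(
        "CREATE TABLE docs_json (doc_id TEXT PRIMARY KEY, content TEXT)")
    for i in range(n_docs):
        doc_id = "doc-%d" % i
        title = "Title %d" % i
        conn.execute("INSERT INTO docs_columns VALUES (?, ?, ?)", (doc_id, title, i))
        conn.execute("INSERT INTO docs_json VALUES (?, ?)",
                     (doc_id, json.dumps({"title": title, "size": i})))
    conn.commit()
    return conn


def read_columns(conn):
    """Build dicts directly from the column values."""
    query = "SELECT title, size FROM docs_columns ORDER BY doc_id"
    return [{"title": title, "size": size} for title, size in conn.execute(query)]


def read_json(conn):
    """Build dicts by parsing the stored JSON strings."""
    query = "SELECT content FROM docs_json ORDER BY doc_id"
    return [json.loads(content) for (content,) in conn.execute(query)]


def timed(func, conn):
    """Return the function result and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(conn)
    return result, time.perf_counter() - start


def run_benchmark(sizes=(100, 1000, 10000)):
    """Time both read paths for each corpus size."""
    rows = []
    for n in sizes:
        conn = build_db(n)
        _, t_columns = timed(read_columns, conn)
        _, t_json = timed(read_json, conn)
        conn.close()
        rows.append((n, t_columns, t_json))
    return rows


if __name__ == "__main__":
    for n, t_columns, t_json in run_benchmark():
        print("%6d docs: columns %.4fs, json %.4fs" % (n, t_columns, t_json))
```

The numbers will depend heavily on the machine and on document shape, so a sketch like this only gives a relative comparison between the two read paths, not a verdict on either format.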

That doesn't seem particularly slow, and if we are exposing some data over HTTP, it seems like a nice format.

That's a quantum statement - is it still a nice format if no one is around to use it? ;-)

Cheers,
Mikkel

On Nov 11, 2011 9:51 PM, "Mikkel Kamstrup Erlandsen" <mikkel.kamstrup@xxxxxxxxxxxxx> wrote:

    On 11/11/2011 04:32 PM, Rodney Dawes wrote:

        On Fri, 2011-11-11 at 08:52 +0000, Stuart Langridge wrote:

            I think you might be under a bit of a misapprehension here. The
            thing that you pass to the Python functions as a "doc" is a JSON
            string. It's not a Python dictionary or some other complex type.
            Our basic "document" is a string containing a JSON serialisation
            of the document; it's not an object.

        That is exactly what my complaint is. That 'doc' and every bit of
        data associated with it (id, revision, etc…) must be maintained and
        passed around as separate things. I am suggesting we should have a
        Document class, which contains all of these things in one place. A
        simple class, with properties for all these bits of data, which can
        be set/get in accordance with the conventions of the language for
        each implementation.

    (sorry if I lack context here, I just joined the list and this
    thread is not in the archives on
    https://lists.launchpad.net/u1db-discuss/ for some reason)

    I second Rodney's opinion here. Passing around JSON is *very*
    inefficient (not to mention inconvenient). And this is something
    that I am not making up, I've spent lots of time profiling apps
    and libs with exactly this problem.

    I haven't actually looked at the Python code yet, but I saw the
    same behaviour in the C version.

    Having a Document class also separates the wire format from the
    programmatic representation which I think is a big plus as well,
    architecture wise.
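A minimal sketch of the kind of Document class being proposed, assuming Python and JSON as the wire format. The class, attribute, and method names are hypothetical illustrations, not u1db's API; the point is only that id, revision, and content travel together, and that serialisation happens at the boundary.

```python
import json


class Document:
    """Hypothetical container keeping a document's id, revision and content together."""

    def __init__(self, doc_id, rev, content):
        self.doc_id = doc_id
        self.rev = rev
        # Programmatic representation: a plain dict, not a JSON string.
        self.content = content

    @classmethod
    def from_json(cls, doc_id, rev, json_text):
        """Parse the wire format once, when the document enters the program."""
        return cls(doc_id, rev, json.loads(json_text))

    def to_json(self):
        """Serialise back to the wire format only when it is needed."""
        return json.dumps(self.content, sort_keys=True)


if __name__ == "__main__":
    doc = Document.from_json("doc-1", "rev-1", '{"title": "hello"}')
    doc.content["title"] = "bye"  # work on the dict, no re-parsing
    print(doc.doc_id, doc.rev, doc.to_json())
```

With this shape, swapping JSON for another wire format would only touch `from_json` and `to_json` (or their equivalents), while callers keep working with the same object.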

    And while I am ranting - any chance we can use a binary format
    instead of JSON, maybe BSON or GVariant, for example? Parsing JSON
    is super slow compared to these[1]. Also in C. Again from bitter
    experience :-)

    Other than that, let me just use my first mail here to make clear
    that I am super hyped about the idea of u1db! Let's make this
    rock! :-D

    Cheers,
    Mikkel

    [1] (ok, you got me, I haven't profiled BSON vs JSON, but I have
    done it for GVariant)
