u1db-discuss team mailing list archive

Re: API Design

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/15/2011 4:10 PM, John Meinel wrote:
> It would be nice to see real benchmarks before we decide it is 
> necessary. The DB is intended to be size limited (~20MB per db per 
> user). I have a dataset of 50k docs that totals 33MB in JSON, and
> ~60MB in sqlite. I can deserialize all docs in <1s (I'll try to get
> a more accurate timing) in python.
> 
> That doesn't seem particularly slow, and if we are exposing some
> data over HTTP, it seems like a nice format.
> 
> John =:->

I grabbed 'bson' from http://pypi.python.org/pypi/bson. As a very
first thing, it turns out that 'bson.dumps' doesn't support a list as
the top-level object, while that is supported by simplejson. I don't
know if this is specific to BSON the encoding, or bson the python library.

Anyway, it means that while I was able to hand simplejson the whole
list-of-objects in one call, I had to switch to encoding and decoding
one-by-one to get comparable numbers for these tests.

Some more exact timings:
  0.865s  simplejson.loads(one_big_string)
  0.874s  simplejson.dumps(one_big_array)

  1.386s  simplejson.loads(one_by_one)
  1.445s  simplejson.dumps(one_by_one)

  8.480s  bson.loads(one_by_one)
 10.068s  bson.dumps(one_by_one)
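
For anyone who wants to reproduce the shape of this comparison, here is a
minimal sketch of a timing harness. It is not the script used for the numbers
above. It assumes the standard-library json module as a stand-in for
simplejson, and make_docs() is a hypothetical generator of synthetic
documents, since the 50k-doc dataset is not part of this thread.

```python
import json
import time


def make_docs(count):
    # Hypothetical synthetic documents; NOT the 50k-doc dataset from the thread.
    return [{"id": i, "title": "doc-%d" % i, "tags": ["a", "b"], "size": i * 3}
            for i in range(count)]


def timed(func):
    # Run func once and return (elapsed_seconds, result).
    start = time.perf_counter()
    result = func()
    return time.perf_counter() - start, result


def run_benchmark(docs):
    timings = {}
    # Whole list in a single call.
    timings["dumps(one_big_array)"], big_string = timed(lambda: json.dumps(docs))
    timings["loads(one_big_string)"], decoded = timed(lambda: json.loads(big_string))
    # One document per call, as required for the bson comparison.
    timings["dumps(one_by_one)"], strings = timed(
        lambda: [json.dumps(doc) for doc in docs])
    timings["loads(one_by_one)"], decoded_each = timed(
        lambda: [json.loads(text) for text in strings])
    # Sanity check: both paths must round-trip to the original documents.
    assert decoded == docs and decoded_each == docs
    return timings


if __name__ == "__main__":
    for label, seconds in run_benchmark(make_docs(50000)).items():
        print("%8.3fs  %s" % (seconds, label))
```

A single run like this is noisy; repeating it a few times and taking the
minimum would give steadier numbers.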


Now I'm guessing 'bson' is a pure python module, while 'simplejson'
has a compiled extension to make it faster. I did track down pymongo,
but was a bit confused because the documentation is now out of date.
  http://www.mongodb.org/display/DOCS/BSON#BSON-Python

Specifically, there isn't a pymongo.bson module. Instead pymongo ships
its own top-level bson module, which acts nothing like simplejson's
model or what you get doing 'pip install bson'. You have to use
bson.BSON.encode() and bson.BSON(data).decode() (python-bson installs
under the same 'bson' name, but is used as bson.loads/dumps). You still
can't pass a list, though; it only supports object-at-a-time.
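
As a rough sketch of the object-at-a-time usage described above: this
assumes the 'bson' on the import path is the one shipped with pymongo. The
helper name roundtrip_one_by_one is made up for illustration, and the import
guard is there only because the standalone python-bson package uses the same
module name without a BSON class.

```python
try:
    # Provided by pymongo's bundled bson package. The standalone
    # 'pip install bson' package has no BSON class, so this import fails there.
    from bson import BSON
except ImportError:
    BSON = None


def roundtrip_one_by_one(docs):
    # Hypothetical helper: encode and decode each dict separately,
    # because a bare list is not accepted as the top-level object.
    if BSON is None:
        return None  # pymongo's bson is not installed
    encoded = [BSON.encode(doc) for doc in docs]
    return [BSON(data).decode() for data in encoded]
```

With pymongo installed, roundtrip_one_by_one([{"id": 1}]) should hand back
an equal list of dicts; without it, the helper returns None.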

The times there are:
 1.444s bson.BSON(one_by_one).decode()
 1.111s bson.BSON.encode(one_by_one)

So pymongo's bson is slightly slower than simplejson to go from
string => dict [1.444s vs 1.386s], and 1.3:1 faster for going from
dict => string [1.111s vs 1.445s].

IOW, at best, BSON is 30% faster than JSON in python, and only when
encoding. That doesn't seem to justify switching. If it were several
times faster (like simplejson is vs the original bson, at roughly 6-7x)
that would be worth something.
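
The ratios behind that conclusion can be rechecked from the one-by-one
timings reported earlier in this message. The snippet only does the
arithmetic; the dictionary keys are labels chosen here for illustration.

```python
# One-by-one timings in seconds, copied from the listings in this message.
timings = {
    "simplejson.loads": 1.386,
    "simplejson.dumps": 1.445,
    "python-bson.loads": 8.480,
    "python-bson.dumps": 10.068,
    "pymongo-bson.decode": 1.444,
    "pymongo-bson.encode": 1.111,
}


def ratio(slower, faster):
    # How many times longer 'slower' took than 'faster'.
    return timings[slower] / timings[faster]


encode_gain = ratio("simplejson.dumps", "pymongo-bson.encode")
decode_loss = ratio("pymongo-bson.decode", "simplejson.loads")
pure_python_loads = ratio("python-bson.loads", "simplejson.loads")
pure_python_dumps = ratio("python-bson.dumps", "simplejson.dumps")

if __name__ == "__main__":
    print("pymongo bson encode vs simplejson dumps: %.2fx faster" % encode_gain)
    print("pymongo bson decode vs simplejson loads: %.2fx slower" % decode_loss)
    print("python-bson loads vs simplejson loads:   %.1fx slower" % pure_python_loads)
    print("python-bson dumps vs simplejson dumps:   %.1fx slower" % pure_python_dumps)
```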

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7CiWEACgkQJdeBCYSNAAOYlwCfXB5gQkTdpFnUYVGOLedaePaL
YSsAmwaWMKQci+McBW9V//w1CcWNn6ik
=dpK5
-----END PGP SIGNATURE-----

