u1db-discuss team mailing list archive
Message #00012
Re: API Design
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 11/15/2011 4:10 PM, John Meinel wrote:
> It would be nice to see real benchmarks before we decide it is
> necessary. The DB is intended to be size limited (~20MB per db per
> user). I have a dataset of 50k docs that totals 33MB in JSON, and
> ~60MB in sqlite. I can deserialize all docs in <1s (I'll try to get
> a more accurate timing) in python.
>
> That doesn't seem particularly slow, and if we are exposing some
> data over HTTP, it seems like a nice format.
>
> John =:->
I grabbed 'bson' from http://pypi.python.org/pypi/bson. The very first
thing I found is that 'bson.dumps' doesn't support a list as the
top-level object, while simplejson does. I don't know whether this is
specific to BSON the encoding or to bson the Python library.
Either way, while simplejson let me pass a whole list of objects in a
single call, I had to switch to one object per call for these tests.
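On the encoding-vs-library question: the BSON spec (bsonspec.org) defines the top-level value as a document, and arrays appear only as embedded documents whose keys are "0", "1", and so on. The minimal sketch below assumes that spec layout and covers only a handful of types. `encode_document` is a hypothetical helper written for illustration; it is not the API of either bson package.

```python
import struct


def _cstring(s: str) -> bytes:
    # e_name/cstring: UTF-8 bytes followed by a NUL terminator
    b = s.encode("utf-8")
    if b"\x00" in b:
        raise ValueError("cstring may not contain NUL")
    return b + b"\x00"


def _element(name: str, value) -> bytes:
    key = _cstring(name)
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)  # int32
        return b"\x12" + key + struct.pack("<q", value)  # int64
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)  # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # string: int32 byte length (including trailing NUL), then the bytes
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if value is None:
        return b"\x0a" + key
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)  # embedded document
    if isinstance(value, (list, tuple)):
        # arrays are embedded documents keyed "0", "1", ...
        return b"\x04" + key + encode_document({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    if not isinstance(doc, dict):
        # the top-level BSON value must be a document, never a bare list
        raise TypeError("top-level BSON value must be a document (dict)")
    body = b"".join(_element(k, v) for k, v in doc.items())
    # int32 total size counts the size field itself, the elements and the final NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
```

For example, `encode_document({"hello": "world"})` yields the 22-byte document from the spec's own example, while `encode_document([{"a": 1}])` raises `TypeError`. Storing a list therefore means wrapping it, e.g. `{"docs": [...]}`, or encoding one document at a time as in the tests here.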
Some more exact timings:
0.865s simplejson.loads(one_big_string)
0.874s simplejson.dumps(one_big_array)
1.386s simplejson.loads(one_by_one)
1.445s simplejson.dumps(one_by_one)
8.480s bson.loads(one_by_one)
10.068s bson.dumps(one_by_one)
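The timings above compare one call over the whole collection against one call per document. A minimal harness for that comparison might look like the following. It assumes the standard-library `json` module as a stand-in for `simplejson` (both expose `loads`/`dumps`), and `make_docs` produces hypothetical synthetic documents, not the 50k-doc dataset mentioned earlier, so its numbers will not match those above.

```python
import json
import time


def make_docs(n: int) -> list:
    # Hypothetical synthetic documents, not the real dataset
    return [
        {"_id": f"doc-{i}", "rev": i, "title": f"Title {i}", "tags": ["a", "b", str(i % 7)]}
        for i in range(n)
    ]


def timed(fn, *args):
    # Return (elapsed seconds, result) for a single call
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


def bench(docs: list) -> dict:
    timings = {}
    # Whole collection as one top-level JSON array
    timings["dumps(one_big_array)"], big = timed(json.dumps, docs)
    timings["loads(one_big_string)"], back = timed(json.loads, big)
    assert back == docs
    # One document per call, the only mode the bson libraries support
    timings["dumps(one_by_one)"], pieces = timed(lambda ds: [json.dumps(d) for d in ds], docs)
    timings["loads(one_by_one)"], back = timed(lambda ps: [json.loads(p) for p in ps], pieces)
    assert back == docs
    return timings


if __name__ == "__main__":
    for name, secs in bench(make_docs(50_000)).items():
        print(f"{secs:.3f}s {name}")
```

Swapping a different encoder into the two one-by-one lambdas gives a like-for-like comparison, since per-call overhead is what separates the "one_big" and "one_by_one" rows.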
Now, I'm guessing that 'bson' is a pure-Python module, while 'simplejson'
has a C extension to make it faster. I did track down pymongo, but was
a bit confused because its documentation is now out of date:
http://www.mongodb.org/display/DOCS/BSON#BSON-Python
Specifically, there isn't a pymongo.bson module; instead, pymongo ships
its own top-level bson module, which works nothing like simplejson's
loads/dumps model or like what you get from 'pip install bson'. You have
to use bson.BSON.encode() (confusingly, python-bson installs a module
with the same name, used as bson.loads/dumps). And you still can't
encode a list; it only handles one object at a time.
The times there are:
1.444s bson.BSON(one_by_one).decode()
1.111s bson.BSON.encode(one_by_one)
So, compared with simplejson one-by-one, that is slightly slower for
string => dict [1.444s vs 1.386s] and about 1.3x faster for
dict => string [1.111s vs 1.445s].
IOW, at best, BSON is about 30% faster than JSON in Python, and only
for encoding, which doesn't seem to justify switching. If it were closer
to an order of magnitude faster (like the roughly 6-7x gap between
simplejson and the pure-Python bson), that would be worth something.
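The ratios behind this conclusion can be checked directly from the one-by-one timings posted in this message. Only the numbers come from the message; the `TIMINGS` table and the `speedup` helper are illustrative names.

```python
# One-by-one timings (seconds) as reported in this message
TIMINGS = {
    ("simplejson", "loads"): 1.386,
    ("simplejson", "dumps"): 1.445,
    ("bson (PyPI)", "loads"): 8.480,
    ("bson (PyPI)", "dumps"): 10.068,
    ("pymongo bson", "loads"): 1.444,
    ("pymongo bson", "dumps"): 1.111,
}


def speedup(baseline: str, other: str, op: str) -> float:
    """How many times faster `other` is than `baseline` for `op` (>1 means faster)."""
    return TIMINGS[(baseline, op)] / TIMINGS[(other, op)]


if __name__ == "__main__":
    print(f"pymongo vs simplejson, dumps: {speedup('simplejson', 'pymongo bson', 'dumps'):.2f}x")
    print(f"pymongo vs simplejson, loads: {speedup('simplejson', 'pymongo bson', 'loads'):.2f}x")
    print(f"simplejson vs PyPI bson, loads: {speedup('bson (PyPI)', 'simplejson', 'loads'):.2f}x")
    print(f"simplejson vs PyPI bson, dumps: {speedup('bson (PyPI)', 'simplejson', 'dumps'):.2f}x")
```

This prints 1.30x and 0.96x for pymongo's encoder and decoder relative to simplejson, and 6.12x and 6.97x for simplejson relative to the pure-Python bson package.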
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk7CiWEACgkQJdeBCYSNAAOYlwCfXB5gQkTdpFnUYVGOLedaePaL
YSsAmwaWMKQci+McBW9V//w1CcWNn6ik
=dpK5
-----END PGP SIGNATURE-----