
u1db-discuss team mailing list archive

Re: API Design

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/16/2011 10:42 AM, John Rowland Lenton wrote:
> On Tue, 15 Nov 2011 16:46:41 +0100, John Arbash Meinel
> <john@xxxxxxxxxxxxxxxxx> wrote:
>> 
>> I grabbed 'bson' from http://pypi.python.org/pypi/bson. As a
>> very first thing, it turns out that 'bson.dumps' doesn't support
>> a list as the top-level object, while that is supported by
>> simplejson. I don't know if this is specific to BSON the
>> encoding, or bson the python library.
> 
> for it to be a realistic production case comparison, could you use 
> 'cjson' instead of (or as well as) simplejson?

cjson appears to be less maintained, and simplejson has had a C
extension for a while. However, for completeness, I'll include it.

> 
> Thanks.

Summary table (seconds per loop, best of 3; "-" = not measured):

            encode_all  encode_by_1  decode_all  decode_by_1
simplejson  0.766       1.130        0.511       0.795
cjson       1.000       0.965        0.511       0.526
bson        -           0.873        -           0.515


So for simplejson, encoding or decoding everything in one batch is
about 1.5:1 faster than going one-by-one. cjson's numbers are
surprisingly close between batch and one-by-one mode.

bson is 1.25-1.5:1 faster than simplejson in 1-by-1 mode, but
simplejson in batch mode is faster still at encoding (0.766 vs 0.873)
and about even at decoding (0.511 vs 0.515).
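
The ratios quoted above follow directly from the summary table. A short Python sketch of the arithmetic, with the timings copied from the table (the dictionary layout and variable names are only for illustration):

```python
# Timings in seconds per loop, copied from the summary table.
# None marks the batch cases that were not measured for bson.
timings = {
    "simplejson": {"encode_all": 0.766, "encode_by_1": 1.130,
                   "decode_all": 0.511, "decode_by_1": 0.795},
    "cjson": {"encode_all": 1.000, "encode_by_1": 0.965,
              "decode_all": 0.511, "decode_by_1": 0.526},
    "bson": {"encode_all": None, "encode_by_1": 0.873,
             "decode_all": None, "decode_by_1": 0.515},
}

sj, cj, bs = timings["simplejson"], timings["cjson"], timings["bson"]

# One-by-one time divided by batch time, per library.
simplejson_encode = sj["encode_by_1"] / sj["encode_all"]
simplejson_decode = sj["decode_by_1"] / sj["decode_all"]
cjson_encode = cj["encode_by_1"] / cj["encode_all"]
cjson_decode = cj["decode_by_1"] / cj["decode_all"]

# simplejson one-by-one time divided by bson one-by-one time.
bson_vs_simplejson_encode = sj["encode_by_1"] / bs["encode_by_1"]
bson_vs_simplejson_decode = sj["decode_by_1"] / bs["decode_by_1"]

for name, value in [
    ("simplejson encode, by_1/all", simplejson_encode),
    ("simplejson decode, by_1/all", simplejson_decode),
    ("cjson encode, by_1/all", cjson_encode),
    ("cjson decode, by_1/all", cjson_decode),
    ("encode by_1, simplejson/bson", bson_vs_simplejson_encode),
    ("decode by_1, simplejson/bson", bson_vs_simplejson_decode),
]:
    print("%-30s %.2f" % (name, value))
```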

IMO, the choice of encoding should be determined by something other
than encode/decode speed, as these numbers are too close to really be
considered meaningful.

Since you can't make the top-level object a list, I'm not 100% sure
how you encode a list of objects to a file in bson. For expediency I did:

>>> with open('music_metadata.bson', 'wb') as f:
...     f.writelines(b + '\n' for b in as_bson)
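
One possible alternative to newline separators: every BSON document starts with a little-endian int32 holding its own total length, so documents can be written back to back and split apart again without any delimiter. (A newline separator can also collide with a 0x0a byte inside a document.) The sketch below uses only the standard library and is written for Python 3. `iter_bson_documents` is a hypothetical helper name, not part of any bson package, and it only slices out the raw bytes of each document without decoding them.

```python
import io
import struct

def iter_bson_documents(stream):
    """Yield the raw bytes of each BSON document in a binary stream."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        # The length counts the 4 prefix bytes and the trailing NUL.
        (total_length,) = struct.unpack("<i", header)
        if total_length < 5:
            raise ValueError("invalid BSON document length: %d" % total_length)
        body = stream.read(total_length - 4)
        if len(body) != total_length - 4 or body[-1:] != b"\x00":
            raise ValueError("truncated or malformed BSON document")
        yield header + body

# Two hand-built documents: {} and {"hello": "world"}.
empty_doc = b"\x05\x00\x00\x00\x00"
hello_doc = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

docs = list(iter_bson_documents(io.BytesIO(empty_doc + hello_doc)))
print(len(docs))
```

Each yielded chunk could then be handed to whichever BSON decoder is in use.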

After writing the file that way, I have:
$ \ls -s music_metadata.*
33744 music_metadata.bson  33464 music_metadata.json

So not only is json about as fast to parse, it is actually slightly
smaller on disk as well. (I'm guessing bson's type byte plus length
prefix consumes more than the two bytes of json's "".)
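
To make that guess a little more concrete, here is a rough byte count for a flat document of string values. It follows the BSON element layout (type byte, NUL-terminated key, int32 length, NUL-terminated value) and compares the result with compact JSON from the standard library `json` module. `bson_string_doc_size` is a hypothetical helper for illustration only; it handles string values and nothing else, and it says nothing about the actual contents of music_metadata.json.

```python
import json

def bson_string_doc_size(doc):
    """Size in bytes of a BSON document whose values are all strings."""
    size = 4 + 1  # int32 total length + trailing NUL of the document
    for key, value in doc.items():
        size += 1                                   # element type byte (0x02 = string)
        size += len(key.encode("utf-8")) + 1        # key as NUL-terminated cstring
        size += 4 + len(value.encode("utf-8")) + 1  # int32 length + bytes + NUL
    return size

doc = {"hello": "world"}
bson_size = bson_string_doc_size(doc)
json_size = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
print(bson_size, json_size)
```

For this one-field document the count is 22 bytes for BSON against 17 for compact JSON. Numbers are another place where the two differ: BSON stores an int32 in 4 bytes and a double in 8, while JSON writes a small integer as one or two digits.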

John
=:->



Explicit steps for the record:

$ cat load_script.py
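import bson
import cjson
import simplejson

# Wrap the file contents in brackets so they parse as one top-level list.
with open('music_metadata.json', 'r') as f:
    data = '[' + f.read() + ']'
objs = simplejson.loads(data)
# Per-object encodings, used for the one-by-one decode timings.
as_bson = [bson.BSON.encode(o) for o in objs]
as_json = [simplejson.dumps(o) for o in objs]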

$ alias TIMEIT="python -m timeit -s 'from load_script import bson, cjson, simplejson, data, objs, as_bson, as_json'"

$ TIMEIT "simplejson.loads(data)"
10 loops, best of 3: 511 msec per loop

$ TIMEIT "simplejson.dumps(objs)"
10 loops, best of 3: 766 msec per loop

$ TIMEIT "cjson.decode(data)"
10 loops, best of 3: 511 msec per loop

$ TIMEIT "cjson.encode(objs)"
10 loops, best of 3: 1 sec per loop

$ TIMEIT "[simplejson.loads(d) for d in as_json]"
10 loops, best of 3: 795 msec per loop

$ TIMEIT "[simplejson.dumps(o) for o in objs]"
10 loops, best of 3: 1.13 sec per loop

$ TIMEIT "[cjson.decode(d) for d in as_json]"
10 loops, best of 3: 526 msec per loop

$ TIMEIT "[cjson.encode(o) for o in objs]"
10 loops, best of 3: 965 msec per loop

$ TIMEIT "[bson.BSON.decode(d) for d in as_bson]"
10 loops, best of 3: 515 msec per loop

$ TIMEIT "[bson.BSON.encode(o) for o in objs]"
10 loops, best of 3: 873 msec per loop
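
For rerunning this kind of comparison without the shell alias, roughly the same batch versus one-by-one measurement can be driven from Python with the `timeit` module. This sketch assumes the standard library `json` module as a stand-in for the three libraries above and uses small synthetic records (the field names are made up), so its numbers are not comparable to the ones in this message.

```python
import json
import timeit

# Synthetic stand-in records; the real test used music_metadata.json.
objs = [
    {"title": "track %d" % i, "artist": "artist %d" % (i % 50), "length": 180 + i % 120}
    for i in range(1000)
]
data = json.dumps(objs)
as_json = [json.dumps(o) for o in objs]

cases = {
    "encode_all": lambda: json.dumps(objs),
    "encode_by_1": lambda: [json.dumps(o) for o in objs],
    "decode_all": lambda: json.loads(data),
    "decode_by_1": lambda: [json.loads(d) for d in as_json],
}

results = {}
for name, func in cases.items():
    # Best of 3 repeats of 10 loops each, reported per loop like `python -m timeit`.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    results[name] = best
    print("%-12s %.3f msec per loop" % (name, best * 1000))
```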

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7DllcACgkQJdeBCYSNAAPSagCgxgzD/E6feG/tnLQ7Zxw7kCCU
NmYAnidNrNW2moZojqwAmkR144f6RjyA
=Cuid
-----END PGP SIGNATURE-----

