u1db-discuss team mailing list archive
Message #00025
Re: API Design
On 11/16/2011 10:42 AM, John Rowland Lenton wrote:
> On Tue, 15 Nov 2011 16:46:41 +0100, John Arbash Meinel
> <john@xxxxxxxxxxxxxxxxx> wrote:
>>
>> I grabbed 'bson' from http://pypi.python.org/pypi/bson. As a
>> very first thing, it turns out that 'bson.dumps' doesn't support
>> a list as the top-level object, while that is supported by
>> simplejson. I don't know if this is specific to BSON the
>> encoding, or bson the python library.
>
> for it to be a realistic production case comparison, could you use
> 'cjson' instead of (or as well as) simplejson?
cjson appears to be less maintained, and simplejson has had a C
extension for a while. However, for completeness, I'll include it.
>
> Thanks.
Summary table (seconds per loop, lower is better):

            encode_all  encode_by_1  decode_all  decode_by_1
simple      0.766       1.130        0.511       0.795
cjson       1.000       0.965        0.511       0.526
bson        -           0.873        -           0.515
So for simplejson, encoding/decoding them all in one batch is roughly
1.5:1 faster than doing it one by one. cjson's numbers are surprisingly
close between batch and one-by-one mode.
bson is 1.25-1.5:1 faster than simplejson in one-by-one mode, but
simplejson in batch mode is faster still.
IMO, the choice of encoding should be determined by something other
than encode/decode speed, as these numbers are too close to really be
considered meaningful.
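
The ratios quoted above can be re-derived from the summary table. This
is only a sketch of the arithmetic: the numbers are copied from the
table, while the dictionary layout and the `ratio` helper are made up
for illustration.

```python
# Timings in seconds per loop, copied from the summary table.
timings = {
    "simplejson": {"encode_all": 0.766, "encode_by_1": 1.130,
                   "decode_all": 0.511, "decode_by_1": 0.795},
    "cjson": {"encode_all": 1.000, "encode_by_1": 0.965,
              "decode_all": 0.511, "decode_by_1": 0.526},
    # bson was only measured one object at a time.
    "bson": {"encode_by_1": 0.873, "decode_by_1": 0.515},
}


def ratio(slower, faster):
    """Return how many times longer `slower` takes than `faster`."""
    return round(slower / faster, 2)


simple = timings["simplejson"]
bson_times = timings["bson"]

# Batch vs one-by-one for simplejson.
batch_speedup = {
    "encode": ratio(simple["encode_by_1"], simple["encode_all"]),
    "decode": ratio(simple["decode_by_1"], simple["decode_all"]),
}

# bson vs simplejson, both one object at a time.
bson_speedup = {
    "encode": ratio(simple["encode_by_1"], bson_times["encode_by_1"]),
    "decode": ratio(simple["decode_by_1"], bson_times["decode_by_1"]),
}

print("simplejson batch vs by-1:", batch_speedup)
print("bson vs simplejson by-1: ", bson_speedup)
```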
Since you can't make the top-level object a list, I'm not 100% sure
how you encode a list of objects to a file in bson. For expediency I did:
>>> with open('music_metadata.bson', 'wb') as f:
...     f.writelines(b + '\n' for b in as_bson)
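
One possible alternative to newline separators, sketched under the
assumption that every encoded object follows the BSON spec (each
document starts with a little-endian int32 holding its total length):
the documents can simply be concatenated and split again by reading
the length prefix. This also sidesteps the fact that binary BSON may
itself contain a 0x0A byte. The function names and the hand-built
sample bytes below are made up for illustration, so the sketch runs
on Python 3 without the bson package.

```python
import io
import struct


def write_documents(stream, encoded_docs):
    """Write already-encoded BSON documents back to back."""
    for doc in encoded_docs:
        stream.write(doc)


def read_documents(stream):
    """Yield encoded documents by following each int32 length prefix."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("<i", header)
        if length < 5:
            raise ValueError("invalid document length: %d" % length)
        body = stream.read(length - 4)
        if len(body) != length - 4:
            raise ValueError("truncated document")
        yield header + body


# Hand-built sample documents (hypothetical data).
EMPTY_DOC = b"\x05\x00\x00\x00" + b"\x00"            # {}
INT_DOC = (b"\x0c\x00\x00\x00"                       # total length: 12
           + b"\x10" + b"a\x00" + b"\x01\x00\x00\x00"  # int32 "a" = 1
           + b"\x00")                                # end of document

buf = io.BytesIO()
write_documents(buf, [EMPTY_DOC, INT_DOC, EMPTY_DOC])
buf.seek(0)
recovered = list(read_documents(buf))
print(len(recovered), "documents recovered")
```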
Afterwards I have:
$ \ls -s music_metadata.*
33744 music_metadata.bson 33464 music_metadata.json
So not only is json about as fast to parse, it is actually smaller on
disk as well. (I'm guessing the type and length prefixes consume more
than the two bytes of json's "" quoting.)
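
The guess about per-field overhead can be checked on a toy record. The
sketch below hand-encodes string fields as laid out in the BSON spec
(type byte, NUL-terminated key, int32 length, NUL-terminated value)
and compares the framing bytes with the default output of the stdlib
json module, used here as a stand-in for simplejson. The record and
the helper names are made up, and real data with non-string fields
will behave differently.

```python
import json
import struct


def bson_string_field(key, value):
    """Encode one UTF-8 string element: 0x02, cstring key, string value."""
    key_bytes = key.encode("utf-8") + b"\x00"
    value_bytes = value.encode("utf-8") + b"\x00"
    # The int32 length counts the value bytes plus the trailing NUL.
    return (b"\x02" + key_bytes
            + struct.pack("<i", len(value_bytes)) + value_bytes)


def bson_document(fields):
    """Encode a dict of str -> str as a BSON document."""
    body = b"".join(bson_string_field(k, v) for k, v in fields.items())
    body += b"\x00"  # document terminator
    # The leading int32 counts the whole document, itself included.
    return struct.pack("<i", len(body) + 4) + body


# Hypothetical sample record, not taken from music_metadata.json.
record = {"artist": "Some Artist", "title": "Some Title"}

as_json = json.dumps(record).encode("utf-8")
as_bson = bson_document(record)

# Bytes of actual key and value text, excluding any framing.
payload = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
              for k, v in record.items())

json_overhead = len(as_json) - payload
bson_overhead = len(as_bson) - payload
print("json framing bytes:", json_overhead)
print("bson framing bytes:", bson_overhead)
```

For string-only records like this one, each BSON field carries seven
framing bytes against six for json's quotes and separators, which is
at least consistent with the file sizes above.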
John
=:->
Explicit steps for the record:
$ cat load_script.py
import bson
import cjson
import simplejson
data = open('music_metadata.json', 'r').read()
data = '[' + data + ']'
objs = simplejson.loads(data)
as_bson = [bson.BSON.encode(o) for o in objs]
as_json = [simplejson.dumps(o) for o in objs]
$ alias TIMEIT="python -m timeit -s 'from load_script import bson, cjson, simplejson, data, objs, as_bson, as_json'"
$ TIMEIT "simplejson.loads(data)"
10 loops, best of 3: 511 msec per loop
$ TIMEIT "simplejson.dumps(objs)"
10 loops, best of 3: 766 msec per loop
$ TIMEIT "cjson.decode(data)"
10 loops, best of 3: 511 msec per loop
$ TIMEIT "cjson.encode(objs)"
10 loops, best of 3: 1 sec per loop
$ TIMEIT "[simplejson.loads(d) for d in as_json]"
10 loops, best of 3: 795 msec per loop
$ TIMEIT "[simplejson.dumps(o) for o in objs]"
10 loops, best of 3: 1.13 sec per loop
$ TIMEIT "[cjson.decode(d) for d in as_json]"
10 loops, best of 3: 526 msec per loop
$ TIMEIT "[cjson.encode(o) for o in objs]"
10 loops, best of 3: 965 msec per loop
$ TIMEIT "[bson.BSON.decode(d) for d in as_bson]"
10 loops, best of 3: 515 msec per loop
$ TIMEIT "[bson.BSON.encode(o) for o in objs]"
10 loops, best of 3: 873 msec per loop
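
The same batch vs one-by-one measurement can be scripted with the
timeit module instead of a shell alias. This is a minimal sketch, not
the run above: it uses the stdlib json module and made-up records so
that it runs on Python 3 without simplejson, cjson, bson or
music_metadata.json, and `best_of` is a hypothetical helper. Absolute
numbers will differ from the ones reported here.

```python
import json
import timeit

# Made-up stand-in records; the original run loaded music_metadata.json.
objs = [{"artist": "artist %d" % i, "title": "title %d" % i, "track": i}
        for i in range(1000)]
data = json.dumps(objs)                  # one batch document
as_json = [json.dumps(o) for o in objs]  # one document per object


def best_of(func, repeat=3, number=10):
    """Best per-loop time in seconds, as 'python -m timeit' reports it."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number


results = {
    "encode_all": best_of(lambda: json.dumps(objs)),
    "encode_by_1": best_of(lambda: [json.dumps(o) for o in objs]),
    "decode_all": best_of(lambda: json.loads(data)),
    "decode_by_1": best_of(lambda: [json.loads(d) for d in as_json]),
}

for name in ("encode_all", "encode_by_1", "decode_all", "decode_by_1"):
    print("%-12s %.6f sec per loop" % (name, results[name]))
```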