dulwich-users team mailing list archive

Re: Tree entries

 

On Thu, 2010-10-07 at 14:59 -0700, David Borowitz wrote:
> I have some benchmark results for namedtuples vs. tuples. First, the
> microbenchmark results:
> $ python -m timeit -s 'from dulwich.objects import TreeEntry; name =
> "foo/bar"; mode = 0100644; sha = "a" * 20' 'x = TreeEntry(name, mode,
> sha)'
> 1000000 loops, best of 3: 0.583 usec per loop
> $ python -m timeit -s 'name = "foo/bar"; mode = 0100644; sha = "a" *
> 20' 'x = (name, mode, sha)'
> 10000000 loops, best of 3: 0.0753 usec per loop

> Obviously the tuple constructor should win over TreeEntry constructor,
> since the latter is a wrapper around the former, and there's
> significant Python function call overhead. But hey, 0.5us is still
> pretty fast.

> Then I ran a much bigger macrobenchmark (attached). Basically, I
> cloned git.git, ran git unpack-objects to explode the repo into loose
> files, found all the tree SHAs (without parsing the objects), then
> measured the time to parse all those trees and iterate all their
> entries. In the inner loop I also assigned all of the tuple/namedtuple
> values to locals to check for overhead there.

Thanks very much for doing those benchmarks. With these in mind, I would
be fine with either solution.
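
For anyone who wants to rerun the microbenchmark without a dulwich
checkout, here is a rough sketch. It assumes a plain
collections.namedtuple as a stand-in for TreeEntry (the field names and
the time_construction helper are made up for illustration, and the real
class may well differ), and it is written for Python 3, hence the 0o
octal literal:

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for dulwich's TreeEntry; the real class may differ.
TreeEntry = namedtuple("TreeEntry", ["path", "mode", "sha"])

name = "foo/bar"
mode = 0o100644
sha = b"a" * 20


def time_construction(number=100000):
    """Return (namedtuple_seconds, tuple_seconds) for `number` constructions."""
    env = {"TreeEntry": TreeEntry, "name": name, "mode": mode, "sha": sha}
    nt = timeit.timeit("x = TreeEntry(name, mode, sha)", globals=env, number=number)
    tp = timeit.timeit("x = (name, mode, sha)", globals=env, number=number)
    return nt, tp


if __name__ == "__main__":
    loops = 100000
    nt, tp = time_construction(loops)
    # Convert total seconds to microseconds per construction.
    print("namedtuple: %.3f usec per loop" % (nt / loops * 1e6))
    print("tuple:      %.3f usec per loop" % (tp / loops * 1e6))
```

The absolute numbers will depend on the interpreter and the machine, so
they should not be compared directly with the figures quoted above.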

If we really did need to improve on this in the future, we could always
look into writing a C version of TreeEntry and having the C implementation
of sorted_tree_items return that directly. That said, it doesn't look
like there's a need for that (and micro-optimization is bad).

Cheers,

Jelmer
