dulwich-users team mailing list archive

Thread
Date

Re: Tree entries

To: Jelmer Vernooij <jelmer@xxxxxxxxx>
From: David Borowitz <dborowitz@xxxxxxxxxx>
Date: Thu, 7 Oct 2010 08:46:24 -0700
Cc: dulwich-users <dulwich-users@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1286459835.3117.247.camel@gwenhwyvar>

On Thu, Oct 7, 2010 at 06:57, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:

> On Wed, 2010-10-06 at 14:19 -0700, David Borowitz wrote:
> > In the stuff I'm doing on rename detection, I've been bitten a few
> > times by the weird ordering of (name, mode, sha) tuples coming out of
> > various tree iteration methods.
>
> > Currently, we have:
> > Tree.entries() yields (mode, name, sha)
> > Tree.iteritems() and BaseObjectStore.iter_tree_contents() yield (name,
> > mode, sha)
> > index.commit_tree() takes (name, sha, mode)
> This is all for hysterical raisins, unfortunately. I'm all for fixing
> it :-)
>
> > We should really standardize this, preferably in one of three ways:
> > 1. Use (name, mode, sha) everywhere.
> > 2. Use a namedtuple of (name, mode, sha).
> > 2. Use a new data object.
> > I would prefer using a namedtuple would make some of the code I'm
> > about to write cleaner. It would also be backwards-compatible with the
> > appropriate type. The downside is that it's another feature that's new
> > in 2.6.
> It doesn't really help that you have two options 2 ! :-)
>
> This is a really performance-sensitive area (at least for me), so if we
> go for anything else than a tuple we should make sure it doesn't have a
> (significant) impact on performance.
>
> I don't think we should drop support for pre-2.6 Python yet; we could of
> course provide a custom implementation for those who run 2.4 or 2.5, but
> that seems like more trouble than (3).
>

Augie has told me from experience with hg that data objects even with slots
have worse performance memory-wise than namedtuples. This is presumably
because there's a per-object overhead for the slots, whereas namedtuples
have properties that live in the class dict. I'm happy to run some
benchmarks

Another thing about the namedtuple function is that you can make it print
out the class definition it creates, which provides an easy way to backport
specific namedtuple types to previous versions.


> 3 seems like the best solution if it doesn't make things too much
> slower. Of course, we could give it semantics similar to what namedtuple
> would give us.


I don't mind much either way. It sounds like benchmark results will be the
deciding factor.


>  > In my ideal world we would get rid of Tree.entries() and change
> > index.commit_tree() to use the standard format, but I don't have a
> > sense of how widely either of those are used. Thoughts on how to
> > proceed?
> Changing commit_tree() is probably possible as it's not very heavily
> used. Removing Tree.entries() is not possible I think, at least not
> without deprecating it first.
>

+1 to officially deprecating it. I thought it was kind of unofficially
deprecated already. (There's two functions that are almost identical, one is
implemented in terms of the other, and it's only for "historical
reasons"--that's pretty much what a deprecated function looks like :)


> Cheers,
>
> Jelmer
>

Follow ups

Re: Tree entries
From: Jelmer Vernooij, 2010-10-16
Re: Tree entries
From: David Borowitz, 2010-10-07

References

Tree entries
From: David Borowitz, 2010-10-06
Re: Tree entries
From: Jelmer Vernooij, 2010-10-07