
dulwich-users team mailing list archive

Re: Tree entries

 

On Sat, Oct 16, 2010 at 14:58, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:

> Hi Dave,
>
> On Thu, 2010-10-07 at 08:46 -0700, David Borowitz wrote:
> > On Thu, Oct 7, 2010 at 06:57, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> > > On Wed, 2010-10-06 at 14:19 -0700, David Borowitz wrote:
> > > 3 seems like the best solution if it doesn't make things too much
> > > slower. Of course, we could give it semantics similar to what
> > > namedtuple would give us.
>
> > I don't mind much either way. It sounds like benchmark results will be
> > the deciding factor.
>
> Yeah. I also think it's important that whatever we do keeps working
> with older versions of Python. I'm quite surprised the namedtuple
> approach is faster than slotted objects.


namedtuples are indeed slower than slotted objects for general attribute
lookups. On the other hand, unpacking a namedtuple into N variables in a
single statement is faster than N separate assignment statements that each
look up one attribute.
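A minimal microbenchmark sketch for checking this on your own interpreter. `TreeEntry`, `SlottedEntry`, and the field names are hypothetical stand-ins, not dulwich's classes, and timings vary across Python versions (CPython 3.8 sped up namedtuple field access), so no particular numbers are implied.

```python
import timeit
from collections import namedtuple

# Hypothetical entry types for illustration; field names are assumptions.
TreeEntry = namedtuple("TreeEntry", ["path", "mode", "sha"])


class SlottedEntry(object):
    __slots__ = ("path", "mode", "sha")

    def __init__(self, path, mode, sha):
        self.path = path
        self.mode = mode
        self.sha = sha


nt = TreeEntry(b"README", 0o100644, b"a" * 40)
so = SlottedEntry(b"README", 0o100644, b"a" * 40)


def attr_lookups(entry):
    # N separate assignment statements, one attribute lookup each.
    path = entry.path
    mode = entry.mode
    sha = entry.sha
    return path, mode, sha


def unpack(entry):
    # One statement unpacking the tuple into N variables.
    path, mode, sha = entry
    return path, mode, sha


def bench(func, arg, number=100000):
    # Total seconds for `number` calls of func(arg).
    return timeit.timeit(lambda: func(arg), number=number)


if __name__ == "__main__":
    print("namedtuple attribute lookups: %.3fs" % bench(attr_lookups, nt))
    print("slotted attribute lookups:    %.3fs" % bench(attr_lookups, so))
    print("namedtuple unpacking:         %.3fs" % bench(unpack, nt))
```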

I haven't looked at the actual C implementations, but here's how I
understand it:
-Slotted objects store references to the attribute values in their attribute
slots. This means that once you have the slot, you have the attribute value,
but it comes at the cost of per-instance memory overhead for storing the
slots.
-namedtuples expose their fields through class attributes (properties
wrapping itemgetter callables). This adds indirection on attribute lookups,
but saves memory since the accessors live in the class dict rather than in
each object.
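One way to poke at this layout, assuming CPython 3.x; `TreeEntry`, `SlottedEntry`, and the helper `layout` are hypothetical. Both types keep a per-field descriptor in the class dict, and neither gives instances a `__dict__`. The accessor type printed depends on the version (`property` wrapping `itemgetter` before CPython 3.8, a C-level `_tuplegetter` from 3.8 on), and `sys.getsizeof` results vary by build, so treat the output as a way to check the reasoning rather than a verdict.

```python
import sys
from collections import namedtuple

TreeEntry = namedtuple("TreeEntry", ["path", "mode", "sha"])


class SlottedEntry(object):
    __slots__ = ("path", "mode", "sha")

    def __init__(self, path, mode, sha):
        self.path = path
        self.mode = mode
        self.sha = sha


def layout(cls, instance):
    """Report where field access is resolved and how big one instance is."""
    return {
        # Both types keep a per-field descriptor in the class dict.
        "accessor": type(vars(cls)["path"]).__name__,
        # Neither type gives its instances a per-object __dict__.
        "instance_dict": hasattr(instance, "__dict__"),
        # Shallow size of the instance itself, excluding referenced values.
        "size": sys.getsizeof(instance),
    }


if __name__ == "__main__":
    args = (b"README", 0o100644, b"a" * 40)
    print("namedtuple:", layout(TreeEntry, TreeEntry(*args)))
    print("slotted:   ", layout(SlottedEntry, SlottedEntry(*args)))
```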

The reasons to go with namedtuples:
-Memory usage, as mentioned above.
-Source compatibility with code that expects Tree.iteritems() to return
tuples. (We could fudge this by implementing __iter__ and __getitem__ on
the slotted objects, but that would be slow and probably ugly.)
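A sketch of that compatibility point. The hypothetical `legacy_caller` stands in for code written against a tuple-returning `Tree.iteritems()`; `TreeEntry` and the "fudged" `SlottedEntry` are illustrative, not dulwich's classes. The fudge makes unpacking and indexing work, but each `__getitem__` call builds a temporary tuple.

```python
from collections import namedtuple

TreeEntry = namedtuple("TreeEntry", ["path", "mode", "sha"])


class SlottedEntry(object):
    """Hypothetical slotted entry with the tuple-compatibility fudge."""

    __slots__ = ("path", "mode", "sha")

    def __init__(self, path, mode, sha):
        self.path = path
        self.mode = mode
        self.sha = sha

    def __iter__(self):
        # Lets callers write `path, mode, sha = entry`.
        return iter((self.path, self.mode, self.sha))

    def __getitem__(self, index):
        # Lets callers write `entry[1]`; builds a temporary tuple per call.
        return (self.path, self.mode, self.sha)[index]


def legacy_caller(entries):
    # Written against an API that yields plain (path, mode, sha) tuples.
    pairs = []
    for entry in entries:
        path, _mode, _sha = entry        # tuple unpacking
        pairs.append((path, entry[1]))   # positional indexing
    return pairs
```

Even with the fudge, the slotted object still fails `isinstance(entry, tuple)` and compares unequal to an equivalent plain tuple, both of which a namedtuple passes for free because it is a tuple subclass.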


> > > > In my ideal world we would get rid of Tree.entries() and change
> > > > index.commit_tree() to use the standard format, but I don't have a
> > > > sense of how widely either of those are used. Thoughts on how to
> > > > proceed?
> > >
> > > Changing commit_tree() is probably possible as it's not very heavily
> > > used. Removing Tree.entries() is not possible I think, at least not
> > > without deprecating it first.
>
> > +1 to officially deprecating it. I thought it was kind of unofficially
> > deprecated already. (There's two functions that are almost identical,
> > one is implemented in terms of the other, and it's only for
> > "historical reasons"--that's pretty much what a deprecated function
> > looks like :)
> It looks like we didn't actually have an "items" method though. I'm
> adding one now.
>
> Cheers,
>
> Jelmer
>
>
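A sketch of the deprecation path discussed above, using a toy `Tree` rather than dulwich's actual class. `entries()` keeps working for old callers but emits a `DeprecationWarning` and is implemented in terms of the standard-format iterator. The `(mode, path, sha)` ordering for `entries()` is an assumption chosen to illustrate two "almost identical" methods kept apart for historical reasons.

```python
import warnings
from collections import namedtuple

TreeEntry = namedtuple("TreeEntry", ["path", "mode", "sha"])


class Tree(object):
    """Toy stand-in for a tree object; not dulwich's actual class."""

    def __init__(self):
        self._entries = {}  # path -> (mode, sha)

    def add(self, path, mode, sha):
        self._entries[path] = (mode, sha)

    def iteritems(self):
        # Standard format: (path, mode, sha), sorted by path. Simplified:
        # real git tree ordering treats directories as if suffixed with "/".
        for path in sorted(self._entries):
            mode, sha = self._entries[path]
            yield TreeEntry(path, mode, sha)

    def items(self):
        return list(self.iteritems())

    def entries(self):
        """Deprecated: old (mode, path, sha) ordering kept for old callers."""
        warnings.warn("Tree.entries() is deprecated; use Tree.items()",
                      DeprecationWarning, stacklevel=2)
        return [(mode, path, sha) for path, mode, sha in self.iteritems()]
```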