dulwich-users team mailing list archive

Thread
Date

Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging

To: David Borowitz <dborowitz@xxxxxxxxxx>
From: Jelmer Vernooij <jelmer@xxxxxxxxx>
Date: Sat, 12 Jun 2010 02:28:58 +0200
Cc: dulwich-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTinujX3if8imXrtcqroIERsEOCC5sqbiiCWJdnMY@mail.gmail.com>
User-agent: Mutt/1.5.20 (2009-06-14)

On Fri, Jun 11, 2010 at 07:26:31PM -0400, David Borowitz wrote:
> I'm a bit torn on this issue. On the one hand, I agree that there are some
> things like excludes that we can provide well-defined APIs for rather than
> depending on specific filenames. On the other hand, there are two cases that
> I think merit keeping get_named_file around:
> 1. Allow users to read and parse files on their own that Dulwich does not
> (yet) have support for. (Presumably we would encourage such users to
> contribute their parsing code back to Dulwich and/or transition to the
> official API once one exists.)
I don't have any objections against making it possible for people to do 
this but I'd rather keep it optional, not a requirement for the correct 
or full functioning.

> 2. It's necessary for the implementation of dumb HTTP serving.
Is there any reason for this to go through the Repo API rather than through
the standard os module ? 

It doesn't seem right to make up these files on the fly for repository 
implementations that do not use the standard git disk representation.

> I'm still slightly for using '/' to be consistent with refnames and URL
> representations in the HTTP server. Note that both of these points are
> orthogonal to the choice of path separator.
I'm not opposed to making that change but I would prefer to make that change 
at the same time as any other changes we make to get_named_file.

Cheers,

Jelmer

> 
> On Wed, Jun 9, 2010 at 17:46, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> 
> > On Wed, Jun 09, 2010 at 02:48:18PM -0500, Augie Fackler wrote:
> > > On Mon, Jun 7, 2010 at 5:32 PM, Jelmer Vernooij <jelmer@xxxxxxxxx>
> > wrote:
> > > > On Mon, 2010-06-07 at 15:59 -0500, Augie Fackler wrote:
> > > >> On May 30, 2010, at 9:18 AM, Jelmer Vernooij wrote:
> > > >> > On Mon, May 24, 2010 at 11:08:56AM -0700, David Borowitz wrote:
> > > >> > > This is quite the grab-bag of changes :). Available as usual at my
> > > >> > > github
> > > >> > > fork:
> > > >> > > http://github.com/dborowitz/dulwich
> > > >> > >
> > > >> > > I think Augie has requested that we do code reviews in some place
> > > >> > > public,
> > > >> > > like this mailing list. If others agree I'm happy to oblige by
> > > >> > > mailing
> > > >> > > patches.
> > > >> > That'd be great - mailed patches are a bit easier to comment on for
> > > >> > me, and this
> > > >> > mailing list seems like the most appropriate place.
> > > >> >
> > > >> > > b9677d0 Add tests for sorted_tree_items and C implementation.
> > > >> > > 186fb3a Fix memory leak in C implementation of sorted_tree_items.
> > > >> > > 0994d03 Clean up file headers.
> > > >> > > 4a42aad Fix copyright in dul-web.
> > > >> > I've merged these, thanks.
> > > >> >
> > > >> > > e146b26 Move named file initilization to BaseRepo.
> > > >> > Thanks, this makes sense.
> > > >> >
> > > >> > We should've really started out by naming BaseRepo Repo and naming
> > > >> > Repo
> > > >> > DiskRepo or something, but it seems hard to change that now without
> > > >> > breaking
> > > >> > everybody's existing code. :-(
> > > >> >
> > > >> > > 9f08973 Use / as separator for named files on all platforms.
> > > >> > I'm not sure about this change - other than the use of info/excludes
> > > >> > we don't
> > > >> > actually appear to use / in calls to {get,put}_named_file so it
> > > >> > seems like a
> > > >> > silly thing to break the API for.
> > > >> Is it an API break then? It also means that you can always request
> > > >> 'info/excludes' without respect to the platform you're on.
> > > > It changes the behaviour of this particular call on some platforms
> > > > (specifying info\excludes on Windows will break), so users will have to
> > > > change their callsites.
> > > >
> > > >> > Alternatively, perhaps we can make get_named_file private/protected?
> > > >> > I'd be
> > > >> > less hesitant about making this sort of change if we do that.
> > > >
> > > >> -1, it feels like clients of Repo will need to load files within .git
> > > >> and get_named_file feels like the correct call for this.
> > > > Do they need to do that directly though, and for what sort of
> > > > functionality? I'd rather see higher-level function calls on Repo, in
> > > > particular because that makes it easier to e.g. implement the Repo API
> > > > on top of other databases with a different on-disk format.
> > >
> > > It seems reasonable that we'll need access to (for example)
> > > info/excludes so we can construct the appropriate matcher for
> > > Mercurial, Bazaar or whatever else. It seems kind of odd to my mind to
> > > have a function per special file on the repo object.
> > I think it's reasonable to have a separate function, the fact that the
> > exclude info can be found in info/excludes is a representation issue.
> >
> > I'd rather have a function that returns a list of exclude patterns for
> > which we can put in a stub that e.g. returns [] or uses some other magic.
> > The alternative is to have a get_named_file() implementation that returns
> > custom file contents (the disk format rather than the content) based on the
> > filename that gets passed in. The implementation for a bzr repo would
> > look something like this:
> >
> > def get_named_file(self, name):
> >        if name == "info/excludes":
> >                return StringIO()
> >        elif name == "description":
> >                return "A Bazaar repository"
> >        elif name =="config":
> >                # Construct a ConfigObj, serialize it and return it as a
> >                # file-like object
> >        elif name.startswith("refs/heads/"):
> >                # Some magic involving a RefsContainer and reserializing
> >        else:
> >                ...
> >
> > Making the disk format part of the API is something I'd like to avoid as
> > much
> > as possible as it makes it hard (and slow) to implement the API on top of
> > different disk representations (not just bzr or mercurial but also on top
> > of
> > say a highly scalable database system). I don't think having a function for
> > getting at the excludes would make the API that much more complicated,
> > there
> > aren't much similar files like this.
> >
> > Cheers,
> >
> > Jelmer
> >

--

Follow ups

Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: David Borowitz, 2010-06-12

References

Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: David Borowitz, 2010-05-24
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-05-30
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Augie Fackler, 2010-06-07
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-07
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Augie Fackler, 2010-06-09
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-09
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: David Borowitz, 2010-06-11