dulwich-users team mailing list archive

Thread
Date

Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging

To: Jelmer Vernooij <jelmer@xxxxxxxxx>
From: David Borowitz <dborowitz@xxxxxxxxxx>
Date: Fri, 11 Jun 2010 20:55:04 -0400
Cc: dulwich-users@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20100612002857.GA13981@vernstok.nl>

On Fri, Jun 11, 2010 at 20:28, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:

> On Fri, Jun 11, 2010 at 07:26:31PM -0400, David Borowitz wrote:
> > I'm a bit torn on this issue. On the one hand, I agree that there are
> some
> > things like excludes that we can provide well-defined APIs for rather
> than
> > depending on specific filenames. On the other hand, there are two cases
> that
> > I think merit keeping get_named_file around:
> > 1. Allow users to read and parse files on their own that Dulwich does not
> > (yet) have support for. (Presumably we would encourage such users to
> > contribute their parsing code back to Dulwich and/or transition to the
> > official API once one exists.)
> I don't have any objections against making it possible for people to do
> this but I'd rather keep it optional, not a requirement for the correct
> or full functioning.


What do you mean by the correct or full functioning? I was referring to the
fact that there are a lot of miscellaneous files that can already live under
.git (*_HEAD, RENAMED-REF, branches/*, hooks/*, logs/*). If users of dulwich
want to use these files before we've developed APIs for them, I think we
should let them. Yes, it wouldn't work for non-disk-based Repos, but at
least it would work in the most common case. This is just about the
transitional period, until we provide first-class support for those
features.


> > 2. It's necessary for the implementation of dumb HTTP serving.
> Is there any reason for this to go through the Repo API rather than through
> the standard os module ?
>

Well, the os module wouldn't work in the case when a repo doesn't have a
controldir. I suppose it would be possible to implement all of the services
in web.py:HttpGitApplication using functions that don't rely on
get_named_file. It would be a bit of work, however.


> It doesn't seem right to make up these files on the fly for repository
> implementations that do not use the standard git disk representation.


The problem is that the dumb HTTP protocol itself relies on the standard
disk representation--files named HEAD, info/refs, packed-refs,
objects/pack/pack-*.pack, and so on. So if we get rid of get_named_file in
the manner I described above, you'd in fact end up with *more* on-the-fly
file generation.

Actually, I take that back. Any non-disk-based implementation has to do all
sorts of on-the-fly file generation, to implement dumb HTTP serving, period.
It's just a question of whether it goes in get_named_file or some
higher-level API.

So I guess I'm for making more higher-level functions. I'll try to take a
crack at that in web.py in the next week.

 > I'm still slightly for using '/' to be consistent with refnames and URL
> > representations in the HTTP server. Note that both of these points are
> > orthogonal to the choice of path separator.
> I'm not opposed to making that change but I would prefer to make that
> change
> at the same time as any other changes we make to get_named_file.
>
> Cheers,
>
> Jelmer
>
> >
> > On Wed, Jun 9, 2010 at 17:46, Jelmer Vernooij <jelmer@xxxxxxxxx> wrote:
> >
> > > On Wed, Jun 09, 2010 at 02:48:18PM -0500, Augie Fackler wrote:
> > > > On Mon, Jun 7, 2010 at 5:32 PM, Jelmer Vernooij <jelmer@xxxxxxxxx>
> > > wrote:
> > > > > On Mon, 2010-06-07 at 15:59 -0500, Augie Fackler wrote:
> > > > >> On May 30, 2010, at 9:18 AM, Jelmer Vernooij wrote:
> > > > >> > On Mon, May 24, 2010 at 11:08:56AM -0700, David Borowitz wrote:
> > > > >> > > This is quite the grab-bag of changes :). Available as usual
> at my
> > > > >> > > github
> > > > >> > > fork:
> > > > >> > > http://github.com/dborowitz/dulwich
> > > > >> > >
> > > > >> > > I think Augie has requested that we do code reviews in some
> place
> > > > >> > > public,
> > > > >> > > like this mailing list. If others agree I'm happy to oblige by
> > > > >> > > mailing
> > > > >> > > patches.
> > > > >> > That'd be great - mailed patches are a bit easier to comment on
> for
> > > > >> > me, and this
> > > > >> > mailing list seems like the most appropriate place.
> > > > >> >
> > > > >> > > b9677d0 Add tests for sorted_tree_items and C implementation.
> > > > >> > > 186fb3a Fix memory leak in C implementation of
> sorted_tree_items.
> > > > >> > > 0994d03 Clean up file headers.
> > > > >> > > 4a42aad Fix copyright in dul-web.
> > > > >> > I've merged these, thanks.
> > > > >> >
> > > > >> > > e146b26 Move named file initilization to BaseRepo.
> > > > >> > Thanks, this makes sense.
> > > > >> >
> > > > >> > We should've really started out by naming BaseRepo Repo and
> naming
> > > > >> > Repo
> > > > >> > DiskRepo or something, but it seems hard to change that now
> without
> > > > >> > breaking
> > > > >> > everybody's existing code. :-(
> > > > >> >
> > > > >> > > 9f08973 Use / as separator for named files on all platforms.
> > > > >> > I'm not sure about this change - other than the use of
> info/excludes
> > > > >> > we don't
> > > > >> > actually appear to use / in calls to {get,put}_named_file so it
> > > > >> > seems like a
> > > > >> > silly thing to break the API for.
> > > > >> Is it an API break then? It also means that you can always request
> > > > >> 'info/excludes' without respect to the platform you're on.
> > > > > It changes the behaviour of this particular call on some platforms
> > > > > (specifying info\excludes on Windows will break), so users will
> have to
> > > > > change their callsites.
> > > > >
> > > > >> > Alternatively, perhaps we can make get_named_file
> private/protected?
> > > > >> > I'd be
> > > > >> > less hesitant about making this sort of change if we do that.
> > > > >
> > > > >> -1, it feels like clients of Repo will need to load files within
> .git
> > > > >> and get_named_file feels like the correct call for this.
> > > > > Do they need to do that directly though, and for what sort of
> > > > > functionality? I'd rather see higher-level function calls on Repo,
> in
> > > > > particular because that makes it easier to e.g. implement the Repo
> API
> > > > > on top of other databases with a different on-disk format.
> > > >
> > > > It seems reasonable that we'll need access to (for example)
> > > > info/excludes so we can construct the appropriate matcher for
> > > > Mercurial, Bazaar or whatever else. It seems kind of odd to my mind
> to
> > > > have a function per special file on the repo object.
> > > I think it's reasonable to have a separate function, the fact that the
> > > exclude info can be found in info/excludes is a representation issue.
> > >
> > > I'd rather have a function that returns a list of exclude patterns for
> > > which we can put in a stub that e.g. returns [] or uses some other
> magic.
> > > The alternative is to have a get_named_file() implementation that
> returns
> > > custom file contents (the disk format rather than the content) based on
> the
> > > filename that gets passed in. The implementation for a bzr repo would
> > > look something like this:
> > >
> > > def get_named_file(self, name):
> > >        if name == "info/excludes":
> > >                return StringIO()
> > >        elif name == "description":
> > >                return "A Bazaar repository"
> > >        elif name =="config":
> > >                # Construct a ConfigObj, serialize it and return it as a
> > >                # file-like object
> > >        elif name.startswith("refs/heads/"):
> > >                # Some magic involving a RefsContainer and reserializing
> > >        else:
> > >                ...
> > >
> > > Making the disk format part of the API is something I'd like to avoid
> as
> > > much
> > > as possible as it makes it hard (and slow) to implement the API on top
> of
> > > different disk representations (not just bzr or mercurial but also on
> top
> > > of
> > > say a highly scalable database system). I don't think having a function
> for
> > > getting at the excludes would make the API that much more complicated,
> > > there
> > > aren't much similar files like this.
> > >
> > > Cheers,
> > >
> > > Jelmer
> > >
>
> --
>

Follow ups

Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-23

References

Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: David Borowitz, 2010-05-24
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-05-30
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Augie Fackler, 2010-06-07
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-07
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Augie Fackler, 2010-06-09
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-09
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: David Borowitz, 2010-06-11
Re: Patches: sorted_tree_items, header cleanup, MemoryRepo, logging
From: Jelmer Vernooij, 2010-06-12