← Back to team overview

dulwich-users team mailing list archive

Re: RFC: Closing resources and controlling cache life times

 

On Tue, Apr 28, 2015 at 02:58:18PM +0200, Gary van der Merwe wrote:
>  Jelmer commented on 4 Jun 2014
> <https://github.com/jelmer/dulwich/pull/186#issuecomment-45040116>
> 
> > Can you split this up?I agree that this the correct thing to do for proto.
> >
> > For repositories I am less certain. We should definitely do *something*,
> > but another approach would be to use a "read locks"-style approach like in
> > bzrlib. That would have the added benefit of not having to stat so much
> > when looking for objects, to see if loose object files are present or if
> > the pack file directory has changed.
> >
> 
> This was in response to a patch that, amongst other things, made
> ObjectStore and Repo objects into context managers, where the __exit__
> would close any open resources (aka pack files). I understood what you were
> saying to be: these context managers should also be used to allow the user
> of the library to control when we check if it is necessary to update our
> caches.

Actually, I mostly meant to say that treating them as context managers
is one approach, but there are other options. This is the kind of
thing we can't easily change once it's been introduced as it it's a
significant chunk of the public API, so we should be careful.

What kind of operations are we trying to optimize exactly? Which
Dulwich users (with what access patterns?) are we trying to help?

I'm not convinced statting for existing (but missing) loose objects is
actually a significant problem at the moment. In what cases would you
stat for the same object SHA multiple times *and* want avoid a stat?

Not reading/statting the packlist multiple times seems to me like it
would definitely be a win. Do we need API changes for that though, or
could we e.g. just use inotify to watch it?

> 
> I have had a go at implementing this, and have come to the conclusion that
> having these 2 aspects controlled by a single context manager is not a good
> idea.
> 
> Once you call close() on a Repo/ObjectStore/Pack, that object becomes
> unusable when mmap is being used for packs. Hence having a single context
> manager not work one wanted to keep a Repo object open for a long time,
> while still picking up writes happening in other processes. e.g. if we
> writing a web server that used the same Repo object for multiple requests.
> 
> So I would like to propose the following:
> 
> * We add a Repo.close method, as well as making Repo and ObjectStore
> objects into context managers, where the __exit__ would call .close()
+1

(And possibly also from __del__?)

> * We add a new context manager Repo.cache_lock() that updates caches on
> enter, but then prevents further cache updates for it's lifetime. If you
> don't call this, the code will work as before, i.e. the pack cache will
> update every time ObjectStore.packs is accessed.
I'm not sure about this. Caching adds complications; let's start a
separate thread about what our goals are with this, and the pros/cons
of the various options.

Cheers,

Jelmer

-- 
Jelmer Vernooij <jelmer@xxxxxxxxx> - https://jelmer.uk/

Attachment: signature.asc
Description: Digital signature


Follow ups

References