← Back to team overview

dulwich-users team mailing list archive

RFC: Closing resources and controlling cache life times

 

 Jelmer commented on 4 Jun 2014
<https://github.com/jelmer/dulwich/pull/186#issuecomment-45040116>

> Can you split this up?I agree that this the correct thing to do for proto.
>
> For repositories I am less certain. We should definitely do *something*,
> but another approach would be to use a "read locks"-style approach like in
> bzrlib. That would have the added benefit of not having to stat so much
> when looking for objects, to see if loose object files are present or if
> the pack file directory has changed.
>

This was in response to a patch that, amongst other things, made
ObjectStore and Repo objects into context managers, where the __exit__
would close any open resources (aka pack files). I understood what you were
saying to be: these context managers should also be used to allow the user
of the library to control when we check if it is necessary to update our
caches.

I have had a go at implementing this, and have come to the conclusion that
having these 2 aspects controlled by a single context manager is not a good
idea.

Once you call close() on a Repo/ObjectStore/Pack, that object becomes
unusable when mmap is being used for packs. Hence having a single context
manager not work one wanted to keep a Repo object open for a long time,
while still picking up writes happening in other processes. e.g. if we
writing a web server that used the same Repo object for multiple requests.

So I would like to propose the following:

* We add a Repo.close method, as well as making Repo and ObjectStore
objects into context managers, where the __exit__ would call .close()
* We add a new context manager Repo.cache_lock() that updates caches on
enter, but then prevents further cache updates for it's lifetime. If you
don't call this, the code will work as before, i.e. the pack cache will
update every time ObjectStore.packs is accessed.
* We make other information that is loaded into memory (e.g. alternatives,
packed refs) update in a similar fashion to packs.

This would mean that for a library user who wants to keeps a Repo object
around for a long time, their code might look like:

    repo = Repo(path)

    with repo.cache_lock():
        # access repo

    # wait some time, with possible outside writes

    with repo.cache_lock():
        # access repo

    repo.close()

For something that only wants to use a repo for a short time, and then
close it, their code might look like:

    with Repo(path) as repo:
        with repo.cache_lock():
            # access repo

It is unfortunate that you need 2 with statements. Of course if you do
this, it will still work, but you might have unnecessary stat calls:

    with Repo(path) as repo:
        # access repo



How do you feel about this?


Regards,

Gary

Follow ups