fenics team mailing list archive

Re: CMake 2.8.11: ExternalData

 

On 11/04/13 02:53, Florian Rathgeber wrote:
> On 10/04/13 15:54, Anders Logg wrote:
>> On Wed, Apr 10, 2013 at 12:42:13PM +0100, Florian Rathgeber wrote:
>>> On 09/04/13 22:47, Florian Rathgeber wrote:
>>>> On 09/04/13 20:14, Anders Logg wrote:
>>>>> Another option would be git submodules. Florian suggested this to me
>>>>> earlier.
>>>>
>>>> That's what I think would have been a good option for outsourcing the
>>>> references. They are by far the biggest chunk of the FFC repository (in
>>>> size) and only developers care about them, while everyone else has a
>>>> much larger repository to clone which also takes up considerable disk
>>>> space (51M at the moment).
>>>>
>>>> Having the references be a submodule means the
>>>> test/regression/references directory would be a pointer to a particular
>>>> revision (SHA1) of another repository. Each FFC revision would have a
>>>> particular revision of the ffc-references repository associated with it,
>>>> so there is no ambiguity. It would also have the advantage that if we
>>>> would completely redesign the FFC testing infrastructure and wouldn't
>>>> need the references any more we could simply get rid of the submodule
>>>> and wouldn't have to carry around their burden in history forever.
>>>>
>>>> There are a few caveats though:
>>>>
>>>> 1) If we were doing this now we would need to rewrite the history again,
>>>> completely strip the references folder and replace it by the submodule.
>>>>
>>>> 2) Syncing a git repository over to launchpad for automatic package
>>>> building with the bzr builder is not possible if the repository has
>>>> *ever* included a submodule in its history [1], but there are
>>>> workarounds [2] (which can't be run as a BitBucket hook however).
>>>>
>>>> 3) Pull requests would be a bit more tricky since ffc-references and ffc
>>>> would have to be always merged as a pair. For core developers with push
>>>> access to the repositories this could probably be handled with a
>>>> pre-commit hook.
>>>>
>>>> [1]: https://bugs.launchpad.net/bzr-git/+bug/402814
>>>> [2]:
>>>> https://bazaar.launchpad.net/~videolan/vlc/manual-bzr-import/view/head:/manual-bzr-import
>>>
>>> It appears we can't get anyone excited about a discussion of these issues.
>>> Have we scared everyone away?
>>>
>>> What are your thoughts on the submodule for FFC references? If we decide
>>> to rewrite again we should do it asap before people actually start
>>> basing work off the new FFC repo.
>>
>> I think we should rewrite now and do the submodule thing. Then the
>> references won't clutter the history and we are free to later move
>> them somewhere else (like automatic CMake fetch if we decide to do
>> that).
> 
> I've done some research and there seem to be some options for splicing a
> subdirectory into a submodule while keeping the correct association
> throughout history, i.e. every revision of the main repo points to the
> correct revision of the submodule:
> http://thread.gmane.org/gmane.comp.version-control.git/109805/

Couldn't get this working even after some fiddling.

> http://thread.gmane.org/gmane.comp.version-control.git/164489/
> http://thread.gmane.org/gmane.comp.version-control.git/164463/

The full thread is at
http://thread.gmane.org/gmane.comp.version-control.git/164386/

I could get this to work, and it seems to do pretty much what we want:
splits the subdirectory into a submodule (within the same repository!)
and maintains the correct association by storing the submodule revision
in the parent's index. It does not, however, create (or update) a
.gitmodules file, so you have to know where the submodule is linked into
the parent, and it's slightly awkward to put it in place:

$ git clone . test/regression/references
$ rev=`git rev-parse :test/regression/references`
$ ( cd test/regression/references && git reset --hard $rev )
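
To illustrate what "storing the submodule revision in the parent's
index" means, here is a minimal sketch in a throwaway repository. The
scratch paths, the demo identity and the empty stand-in commit are all
assumptions made for the example; only the path
test/regression/references is taken from the commands above.

```shell
set -eu
# Hypothetical scratch repository, not the real FFC repo.
tmp=$(mktemp -d)
git init -q "$tmp/parent"
cd "$tmp/parent"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "stand-in for a references commit"
sub_rev=$(git rev-parse HEAD)

# Record a gitlink (mode 160000) in the index. This is how a parent
# repository pins a submodule path to one specific commit.
git update-index --add --cacheinfo 160000,"$sub_rev",test/regression/references

# ":<path>" resolves the index entry, so this yields the pinned commit.
pinned=$(git rev-parse :test/regression/references)
mode=$(git ls-files --stage test/regression/references | cut -d' ' -f1)
echo "$mode $pinned"
```

The `git rev-parse :test/regression/references` call is the same lookup
used in the commands above: it reads the commit recorded for that path
from the index, with no .gitmodules file involved.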

However it should be possible to add a .gitmodules file and then treat
it in the normal way. To be able to push/pull the submodule tree it's
also necessary to create a ref to it e.g.:

$ git update-ref refs/test/references <sha>

Note that this is deliberately *not* a branch ref (those live in
refs/heads/), which means it won't be fetched by default. So even though
the references tree is in the repository, users don't spend the
bandwidth to fetch it unless they explicitly configure git to do so
(which developers who want to run regression tests will need to do).
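
As a rough sketch of what that opt-in could look like: the script below
builds a throwaway "upstream" with a ref under refs/test/, clones it,
and only gets the extra ref after adding a fetch refspec. The scratch
repositories and the demo identity are assumptions; the refspec
'+refs/test/*:refs/test/*' is just one possible way to configure it.

```shell
set -eu
# Hypothetical scratch repositories standing in for the real ones.
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
# A ref outside refs/heads/ and refs/tags/ is not part of a default clone.
git update-ref refs/test/references HEAD

git clone -q "$tmp/upstream" "$tmp/clone"
cd "$tmp/clone"
before=$(git for-each-ref refs/test | wc -l | tr -d ' ')

# Developers opt in by adding an extra fetch refspec for the remote.
git config --add remote.origin.fetch '+refs/test/*:refs/test/*'
git fetch -q origin
after=$(git for-each-ref refs/test | wc -l | tr -d ' ')
echo "refs under refs/test: before=$before after=$after"
```

A one-off alternative that leaves the configuration untouched would be
`git fetch origin refs/test/references:refs/test/references`.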

> Regarding the caveats from above: we're willing to accept 1), 2) I think
> is not a big deal (I'm not even sure Johannes is using bzr builder?), so
> the main thing is 3). Given that the history of the references isn't
> really important, only the association, it's maybe not so scary. It's just
> a bit more work maintaining 2 repositories, though most of it could be
> scripted, at least for the benefit of the core devs.
> 
> I've had another chat with Jed and he suggested using git-fat. He's the
> author and it was specifically written for that use case: keeping a
> unified repository/history but storing large (optional) files outside of
> .git/objects to keep the repository slim. The downside is that you then
> need a separate central location where these files are kept. git-fat
> manages them for you, so running an rsync daemon on the FEniCS web
> server might already do the trick.

After a closer look at git-fat I think it's not perfect for our use
case: the actual files on disk are only stubs (which only contain the 40
byte SHA1) and are replaced by the actual big blobs by a smudge/clean
filter, but *only for certain operations*. Unfortunately diff is not one
of them and I think it's the one we care about: being able to view the
diff between the output and the old reference before updating. If we
don't care about the diff we could just as well only store a hash of the
reference.

> We then went on to discuss whether we could in fact leverage git in the
> regression test suite itself: there is no inherent reason why the
> references actually need to exist as files in the work tree. An
> identifiable loose object in the repository would be sufficient. I'll
> forward the log so you can get the idea.
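
For what it's worth, one way this could look in practice is sketched
below. This is only my guess at the idea, not what was actually
discussed: the file names, the ref name refs/test/reference.h and the
sample contents are all made up for the example.

```shell
set -eu
# Hypothetical scratch repository and file names.
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

# Stand-in for a reference output that the test suite would generate.
printf 'double f(double x) { return 2*x; }\n' > reference.h
# -w writes the blob into .git/objects without adding it to any tree.
ref_sha=$(git hash-object -w reference.h)
# Point a ref at the blob so it stays reachable and can be found by name.
git update-ref refs/test/reference.h "$ref_sha"
rm reference.h

# Later: fresh output from a test run, compared against the stored blob.
printf 'double f(double x) { return 3*x; }\n' > output.h
if git cat-file blob refs/test/reference.h | diff -u - output.h > changes.diff; then
    status=unchanged
else
    status=changed
fi
echo "$status"
```

The reference never exists as a file in the work tree after the `rm`,
yet the full diff against new output is still available, which is the
property a plain hash would not give us.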

Are there any plans for changing the FFC testing infrastructure?

Florian

> Florian
> 
>> --
>> Anders
