fenics team mailing list archive

Re: Cleanup of repositories

 

On 25 March 2013 08:31, Florian Rathgeber <florian.rathgeber@xxxxxxxxx> wrote:
> On 22/03/13 09:59, Johan Hake wrote:
>> On 03/22/2013 10:57 AM, Anders Logg wrote:
>>> On Fri, Mar 22, 2013 at 10:52:25AM +0100, Johan Hake wrote:
>>>> On 03/22/2013 10:36 AM, Anders Logg wrote:
>>>>> On Fri, Mar 22, 2013 at 10:32:50AM +0100, Johan Hake wrote:
>>>>>>>
>>>>>>>
>>>>>>> Not exactly:
>>>>>>>
>>>>>>> - Meshes in demos --> remove (already done)
>>>>>> I suggest we keep these. There aren't any big files
>>>>>> anyhow, are there?
>>>>>
>>>>> They have already been removed and there's a good system in
>>>>> place for handling them. Keeping the meshes elsewhere will
>>>>> encourage use of the mesh gallery and keeping better track
>>>>> of which meshes to use. There were lots of meshes named
>>>>> 'mesh.xml' or 'mesh2d.xml' which were really copies of other
>>>>> meshes used in other demos, some of them were gzipped, some
>>>>> not etc. That's all very clean now. Take a look at how it's
>>>>> done in trunk. I think it looks quite nice.
>>>>
>>>> Nice and clean, but it really is just 30 meshes. The duplicates
>>>> are mostly dolfin_fine.xml.gz, of which there are 7 copies,
>>>> and that file is 86K.
>
> If they're bit-for-bit identical, git will only store a single copy in
> the repository anyway, regardless of how many copies you happen to
> have in the working tree.
>

Clever.
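
For context, a minimal sketch of why identical files share storage: git names the contents of each file (a blob) by the SHA-1 of a short header plus the raw bytes, so equal contents map to one object. The file contents below are made up for illustration, and the sketch assumes a repository using the default SHA-1 object format.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object id git assigns to a blob with these contents."""
    # A blob is hashed as b"blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Two hypothetical copies of the same mesh in different demo directories,
# plus one edited version.
copy_a = b"<dolfin><mesh/></dolfin>\n"
copy_b = b"<dolfin><mesh/></dolfin>\n"
edited = b"<dolfin><mesh celltype='triangle'/></dolfin>\n"

ids = {git_blob_id(copy_a), git_blob_id(copy_b), git_blob_id(edited)}
print(len(ids))  # the identical copies collapse to a single id
```

The path of a file is recorded in the tree objects, not in the blob, which is why the same contents under seven different names still cost one blob.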

> On the note of storing gzipped meshes: Do they change frequently?

No.

> Why
> are they stored gzipped?

Habit. It's not good for version control.

> Compressed files have a few issues:
> 1) they're treated as binary, i.e. any change requires a new copy of
> the entire file to be stored
> 2) they can't be diffed
> 3) git compresses its packfiles anyway, so there is little (if any)
> space gain through compression
>
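
A small sketch of points 1) and 2), using a made-up mesh-like XML file rather than one of the real demo meshes: a one-line edit shows up as a one-line change in the plain text, while the gzipped versions are just two different byte strings.

```python
import difflib
import gzip

# Made-up mesh-like XML; the real demo meshes are not reproduced here.
lines = [f'<vertex index="{i}" x="{i * 0.5}" y="0.0"/>\n' for i in range(200)]
edited = list(lines)
edited[100] = '<vertex index="100" x="50.0" y="1.0"/>\n'

# Plain text: a line-based diff isolates the single changed line
# (one "-" line for the old version, one "+" line for the new one).
diff = list(difflib.unified_diff(lines, edited))
changed = [d for d in diff
           if d[:1] in ("+", "-") and not d.startswith(("+++", "---"))]

# Gzipped: a line-based diff has nothing to work with, only opaque bytes.
# mtime=0 keeps the gzip header identical between runs.
gz_old = gzip.compress("".join(lines).encode(), mtime=0)
gz_new = gzip.compress("".join(edited).encode(), mtime=0)

print(len(changed), "changed diff lines in the plain text")
print("gzipped versions identical:", gz_old == gz_new)
```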
>>>>> Most of the example meshes are not that big, but multiply
>>>>> that by 30 and then some when meshes are moved around or
>>>>> renamed.
>>>>
>>>> I just question if it is worth it. Seems convenient to just
>>>> have the meshes there.
>>>
>>> Keeping the meshes there will put a limit on which demos we can
>>> add. I think it would be good to allow for more complex demos
>>> requiring bigger meshes (not necessarily run on the buildbot
>>> every day).
>>
>> Ok.
>>
>>>> If we keep them out of the repo I think we should include some
>>>> automagic downloading when building the demos.
>>>
>>> Yes, or at least a message stating: "You have not downloaded demo
>>> data. Please run the script foo."
>>>
>>>> Also should we rename the script to download-demo-meshes, or
>>>> something more descriptive, as this is what that script now
>>>> basically does?
>>>
>>> It is not only meshes, but also markers and velocity fields.
>>> Perhaps it can be renamed download-demo-data?
>>
>> Sounds good.
>>
>> Johan
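
One way the suggested message could look, as a rough sketch only. The directory layout and the function name are hypothetical; the script name is the one proposed above.

```python
import os
import tempfile

# "download-demo-data" is the name proposed in this thread; everything
# else here (function name, directory layout) is hypothetical.
DOWNLOAD_SCRIPT = "download-demo-data"


def demo_data_message(data_dir: str) -> str:
    """Return a hint if data_dir is missing or empty, else an empty string."""
    if os.path.isdir(data_dir) and os.listdir(data_dir):
        return ""
    return ("You have not downloaded demo data. "
            f"Please run the script {DOWNLOAD_SCRIPT}.")


# Example: a directory that does not exist yet triggers the hint.
with tempfile.TemporaryDirectory() as tmp:
    print(demo_data_message(os.path.join(tmp, "data")))
```

A build step could call such a check and either print the hint or trigger the download itself, depending on which of the two options above is preferred.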
>
> I did some more experimenting:
>
> 1) Repository size: there is quite some mileage in repacking the repos with
> the following steps:
> $ git reflog expire --expire=now --all
> $ git gc --aggressive --prune=now
> $ git repack -ad
> e.g. DOLFIN: 372MiB -> 94MiB
>

Wow. What do these commands do?
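
For reference, a sketch of what each of the three steps does, written as a small Python wrapper so the notes sit next to the commands. The wrapper and its names are only illustrative. Expiring the reflog removes git's usual safety net for recovering dropped commits, so this is best tried on a copy of the repository first.

```python
import subprocess

# The three steps quoted above, with a note on what each one does.
REPACK_STEPS = [
    # Expire all reflog entries right away, so commits that are only
    # remembered by the reflog no longer count as "in use".
    ["git", "reflog", "expire", "--expire=now", "--all"],
    # Garbage-collect: drop unreachable objects immediately (--prune=now)
    # and spend extra time searching for better deltas (--aggressive).
    ["git", "gc", "--aggressive", "--prune=now"],
    # Put all objects into a single pack (-a) and delete the packs
    # made redundant by it (-d).
    ["git", "repack", "-ad"],
]


def repack(repo_dir: str) -> None:
    """Run the steps inside repo_dir, stopping at the first failure."""
    for argv in REPACK_STEPS:
        subprocess.run(argv, cwd=repo_dir, check=True)
```

Calling `repack("/path/to/a/copy/of/the/repo")` would run the same sequence as the shell commands above.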

> 2) Stripping out the files suggested by Anders
> (https://gist.github.com/alogg/5213171#file-files_to_strip-txt) brings
> the repo size down to 172MiB and 24MiB after repacking.
>

I like this. It will make cloning on a slow connection much better.

> 3) I haven't yet found a reliable way to migrate feature branches to
> the filtered repository. Filtering the repository rewrites its history,
> which changes all commit ids (SHA1s) and thereby invalidates
> the marks files created when initially converting the repository.
> There are 2 possible options for filtering the repository during
> conversion:
>
> a) bzr fast-import-filter: seems to be a pain to use with many files
> (need to pass each path individually as an argument) and seems not to
> support writing marks files, so I haven't tried it.
>
> b) git_fast_filter: when using it to filter the converted git repo, the
> exported marks file in the last step contains 83932 marks instead of
> the expected 14399 - I can't say why. Unfortunately I haven't been
> able to use it directly in the conversion pipeline, since it's not
> compatible with a bzr fast-export stream. That's probably fixable, but I
> can't estimate how much work it would be to fix it since I'm not
> familiar enough with the details of the fast-import format.
>
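
A minimal sketch for inspecting a marks file when chasing a count mismatch like the one in b). It assumes the usual fast-import marks layout of one `:<mark> <sha1>` pair per line. The sample ids are made up and the helper names are hypothetical.

```python
def read_marks(text: str) -> dict:
    """Parse marks lines of the form ':<mark> <sha1>' into {mark: sha1}."""
    marks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split()
        if not mark.startswith(":"):
            raise ValueError(f"not a marks line: {line!r}")
        marks[int(mark[1:])] = sha
    return marks


def summarize(marks: dict) -> tuple:
    """Return (number of marks, number of distinct object ids)."""
    return len(marks), len(set(marks.values()))


# Made-up sample: three marks, two of which point at the same object id.
sample = (
    ":1 " + "a" * 40 + "\n"
    ":2 " + "b" * 40 + "\n"
    ":3 " + "a" * 40 + "\n"
)
n_marks, n_ids = summarize(read_marks(sample))
print(n_marks, n_ids)
```

Comparing the two numbers against the expected revision count at least shows whether the extra marks are repeats of the same ids or point at additional objects.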
> TL;DR: Repacking repos already saves a lot of space without stripping
> large files. Stripping files is easy to do and saves considerably
> more space still, but I haven't been able to reliably import feature
> branches into a filtered repository.
>

How about we give everyone a period within which to merge code on
Launchpad, then we don't worry about feature branches and marks in
the conversion? Small changes can always come later in the form of
patches.

Garth

> Florian
>
>

