
fenics team mailing list archive

Re: Cleanup of repositories

 

On 25/03/13 10:50, Garth N. Wells wrote:
> On 25 March 2013 08:31, Florian Rathgeber
> <florian.rathgeber@xxxxxxxxx> wrote:
>> On 22/03/13 09:59, Johan Hake wrote:
>>> On 03/22/2013 10:57 AM, Anders Logg wrote:
>>>> On Fri, Mar 22, 2013 at 10:52:25AM +0100, Johan Hake wrote:
>>>>> On 03/22/2013 10:36 AM, Anders Logg wrote:
>>>>>> On Fri, Mar 22, 2013 at 10:32:50AM +0100, Johan Hake
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Not exactly:
>>>>>>>> 
>>>>>>>> - Meshes in demos --> remove (already done)
>>>>>>> I suggest we keep these. There aren't any big files 
>>>>>>> anyhow, are there?
>>>>>> 
>>>>>> They have already been removed and there's a good system
>>>>>> in place for handling them. Keeping the meshes elsewhere
>>>>>> will encourage use of the mesh gallery and keeping better
>>>>>> track of which meshes to use. There were lots of meshes
>>>>>> named 'mesh.xml' or 'mesh2d.xml' which were really copies
>>>>>> of other meshes used in other demos, some of them were
>>>>>> gzipped, some not etc. That's all very clean now. Take a
>>>>>> look at how it's done in trunk. I think it looks quite
>>>>>> nice.
>>>>> 
>>>>> Nice and clean, but it really is just 30 meshes.
>>>>> Duplications are mostly related to dolfin_fine.xml.gz,
>>>>> which there are 7 copies of, and that file is 86K.
>> 
>> If they're bit-for-bit identical, git will only store a single copy
>> in the repository anyway, regardless of how many copies you
>> happen to have in the working tree.
> 
> Clever.
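A quick way to check this, as a small sketch (the file names are made up): git names every blob by a hash of its content, never its path, so identical files get the same object id and are stored once. `git hash-object` prints that id without writing anything to a repository.

```shell
#!/bin/sh
# Count the distinct git blob ids among the given files.
# git hashes only the content (plus a small "blob <size>" header), not the
# file name, so bit-for-bit identical files share one id and one stored object.
count_distinct_blobs() {
  for f in "$@"; do
    git hash-object "$f"   # compute the id only; nothing is written
  done | sort -u | wc -l | tr -d ' '
}

# Usage (hypothetical paths):
#   count_distinct_blobs demo/*/dolfin_fine.xml.gz
```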
> 
>> On the note of storing gzipped meshes: Do they change
>> frequently?
> 
> No.
> 
>> Why are they stored gzipped?
> 
> Habit. It's not good for version control.

With a bit of trickery we might even be able to convert all those
gzipped meshes, i.e. unzip them in each revision and only keep the
XML in the repo (retroactively, for the entire history).
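A sketch of what that trickery could look like, assuming every gzipped mesh matches `*.xml.gz`: `git filter-branch --tree-filter` checks out each commit, runs a command in the checkout and records the result, adding and removing files as needed. Because it checks out every commit it is slow on long histories, and it rewrites every commit id, so it belongs in a throwaway clone.

```shell
#!/bin/sh
# Hypothetical sketch: rewrite every commit on all branches so that each
# tracked *.xml.gz file is replaced by its decompressed *.xml counterpart.
# Must be run from the top level of a clean working tree.
unzip_meshes_in_history() {
  # filter-branch prints a warning and pauses unless this is set.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --tree-filter \
    'find . -name "*.xml.gz" -exec gunzip -f {} +' \
    -- --all
}
```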

>> Compressed files have a few issues: 1) they're treated as binary
>> i.e. any change requires a new copy of the entire file to be
>> stored 2) they can't be diffed 3) git compresses its packfiles
>> anyway, so there is little (if any) space gain through
>> compression
>> 
>>>>>> Most of the example meshes are not that big, but
>>>>>> multiply that by 30 and then some when meshes are moved
>>>>>> around or renamed.
>>>>> 
>>>>> I just question if it is worth it. Seems convenient to
>>>>> just have the meshes there.
>>>> 
>>>> Keeping the meshes there will put a limit on which demos we
>>>> can add. I think it would be good to allow for more complex
>>>> demos requiring bigger meshes (not necessarily run on the
>>>> buildbot every day).
>>> 
>>> Ok.
>>> 
>>>>> If we keep them out of the repo I think we should include
>>>>> some automagic downloading when building the demos.
>>>> 
>>>> Yes, or at least a message stating: "You have not downloaded
>>>> demo data. Please run the script foo."
>>>> 
>>>>> Also should we rename the script to download-demo-meshes,
>>>>> or something more descriptive, as this is what that script
>>>>> now basically does?
>>>> 
>>>> It is not only meshes, but also markers and velocity fields. 
>>>> Perhaps it can be renamed download-demo-data?
>>> 
>>> Sounds good.
>>> 
>>> Johan
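Such a check could be as simple as the following sketch. The data directory argument and the `download-demo-data` script name (proposed above) are assumptions, not an existing interface.

```shell
#!/bin/sh
# Hypothetical guard for a demo build: if the demo data directory is missing
# or empty, print the suggested message and return non-zero.
check_demo_data() {
  data_dir=$1
  if [ ! -d "$data_dir" ] || [ -z "$(ls -A "$data_dir")" ]; then
    echo "You have not downloaded demo data. Please run the script download-demo-data." >&2
    return 1
  fi
  return 0
}
```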
>> 
>> I did some more experimenting:
>> 
>> 1) Repository size: there is quite some mileage in repacking the
>> repos with the following steps:
>> $ git reflog expire --expire=now --all

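git keeps track of how branch tips (and HEAD) move in the reflog and does
not garbage collect the revisions recorded there. These entries are kept
for 90 days by default (30 days for entries no longer reachable from the
branch's current tip). Tell git to clear this history and "release" those
revisions for garbage collection.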

>> $ git gc --aggressive --prune=now

Invoke git's garbage collection, telling it to recompute deltas more
aggressively (--aggressive) and to prune all unreachable objects right
away (--prune=now) instead of only those older than two weeks.

>> $ git repack -ad

Rewrite the packfiles and remove all redundant packs.
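The three steps can be wrapped up as below, a sketch that also prints the pack statistics from `git count-objects -vH` before and after so the saving is visible. Expiring the reflog makes unreachable commits unrecoverable, so run it in a clone you can afford to lose.

```shell
#!/bin/sh
# Run the repacking steps in the current repository and report pack sizes.
repack_repo() {
  echo "before:"
  git count-objects -vH | grep -E '^(size-pack|packs):'
  git reflog expire --expire=now --all   # drop reflog entries so old tips become unreachable
  git gc --aggressive --prune=now        # recompute deltas, prune unreachable objects now
  git repack -ad                         # one pack with everything; delete redundant packs
  echo "after:"
  git count-objects -vH | grep -E '^(size-pack|packs):'
}
```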

>> e.g. DOLFIN: 372MiB -> 94MiB
> 
> Wow. What do these commands do?
> 
>> 2) Stripping out the files suggested by Anders 
>> (https://gist.github.com/alogg/5213171#file-files_to_strip-txt)
>> brings the repo size down to 172MiB and 24MiB after repacking.
> 
> I like this. It will make cloning on slow connection much better.
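Stripping can be done with `git filter-branch --index-filter`, which edits only the index of each commit and is much faster than a tree filter. A minimal sketch, assuming a text file with one path per line (like the files_to_strip list in the gist above); `--prune-empty` drops commits that touched only stripped files. Like any history rewrite, it changes all commit ids.

```shell
#!/bin/sh
# Hypothetical sketch: remove every path listed in $1 (one per line) from all
# commits on all branches. Run from the top level of a clean working tree.
strip_files() {
  # Filters run inside a temporary directory, so use an absolute path.
  list=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
  # Skip blank lines, NUL-separate the paths, untrack them in each commit.
  filter="grep -v '^\$' '$list' | tr '\\n' '\\0' | xargs -0 git rm -r -q --cached --ignore-unmatch --"
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
    --index-filter "$filter" -- --all
}
```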
> 
>> 3) I haven't yet found a reliable way to migrate feature branches
>> to the filtered repository. Filtering the repository rewrites its
>> history and therefore changes/invalidates all commit ids (SHA1s)
>> and with them the marks files created when initially converting
>> the repository. There are 2 possible options for filtering the
>> repository during conversion:
>> 
>> a) bzr fast-import-filter: seems to be a pain to use with many
>> files (need to pass each path individually as an argument) and
>> seems not to support writing marks files, so I haven't tried
>> it.
>> 
>> b) git_fast_filter: when using it to filter the converted git repo,
>> the exported marks file in the last step contains 83932 marks
>> instead of the expected 14399 - I can't say why. Unfortunately I
>> haven't been able to use it directly in the conversion pipeline;
>> it's not compatible with a bzr fast-export stream. That's probably
>> fixable, but I can't estimate how much work it would be to fix it
>> since I'm not familiar enough with the details of the fast-import
>> format.
>> 
>> TL;DR: Repacking repos already saves a lot of space without
>> stripping large files. Stripping files is easy to do and saves
>> considerably more space, but I haven't been able to reliably
>> import feature branches into a filtered repository.
> 
> How about we give everyone a period within which to merge code
> on Launchpad, then we don't worry about feature branches and marks
> in the conversion? Small changes can always come later in the form
> of patches.

Yes, that's an option. Git has very good support for importing patch
series, maybe bzr can export patch series in the git am format. The
other alternative is importing the feature branch into the
non-filtered git repository and transplant it to the filtered one via
interactive rebase. It's just a bit more work than I had hoped
for.
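For the patch route on the git side, a sketch: export the feature commits from the unfiltered repository with `git format-patch --stdout` and apply them to the filtered one with `git am`, which keeps the author and commit message. This swaps in format-patch/am for the interactive rebase mentioned above; patches that touch stripped files will fail to apply and need manual attention.

```shell
#!/bin/sh
# Replay base..branch from one repository onto the checked-out branch of another.
#   $1 unfiltered repo, $2 base commit (exclusive), $3 feature branch,
#   $4 filtered repo (its current branch receives the commits)
transplant_branch() {
  from_repo=$1 base=$2 branch=$3 to_repo=$4
  git -C "$from_repo" format-patch --stdout "$base..$branch" |
    git -C "$to_repo" am -q --3way
}
```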

Florian

> Garth
> 
>> Florian
