fenics team mailing list archive, Message #02013
Re: Cleanup of repositories
On 22/03/13 09:59, Johan Hake wrote:
> On 03/22/2013 10:57 AM, Anders Logg wrote:
>> On Fri, Mar 22, 2013 at 10:52:25AM +0100, Johan Hake wrote:
>>> On 03/22/2013 10:36 AM, Anders Logg wrote:
>>>> On Fri, Mar 22, 2013 at 10:32:50AM +0100, Johan Hake wrote:
>>>>>> Not exactly:
>>>>>>
>>>>>> - Meshes in demos --> remove (already done)
>>>>> I suggest we keep these. There aren't any big files
>>>>> anyhow, are there?
>>>>
>>>> They have already been removed and there's a good system in
>>>> place for handling them. Keeping the meshes elsewhere will
>>>> encourage use of the mesh gallery and keeping better track
>>>> of which meshes to use. There were lots of meshes named
>>>> 'mesh.xml' or 'mesh2d.xml' which were really copies of other
>>>> meshes used in other demos, some of them were gzipped, some
>>>> not etc. That's all very clean now. Take a look at how it's
>>>> done in trunk. I think it looks quite nice.
>>>
>>> Nice and clean, but it really is just 30 meshes. Duplication is
>>> mostly dolfin_fine.xml.gz, of which there are 7 copies, and that
>>> file is 86K.
If they're bit-for-bit identical, git will only store a single copy in
the repository anyway, regardless of how many copies you happen to
have in the working tree.
On the note of storing gzipped meshes: Do they change frequently? Why
are they stored gzipped? Compressed files have a few issues:
1) they're treated as binary, i.e. any change requires storing a new
copy of the entire file
2) they can't be diffed
3) git compresses its packfiles anyway, so there is little (if any)
space gained by compressing the files yourself
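Point 2 is easy to demonstrate (an illustrative example of mine, not from the thread): once a tracked file is gzipped, git can only report that the binary content changed.

```shell
# Illustration of point 2 (my example, not from the thread): git treats
# gzipped content as binary, so a change cannot be reviewed as a diff.
work=$(mktemp -d)
cd "$work"
git init -q .
printf '<mesh version="1"/>\n' | gzip -c > mesh.xml.gz
git add mesh.xml.gz
git -c user.name=demo -c user.email=demo@example.com commit -qm 'add gzipped mesh'
printf '<mesh version="2"/>\n' | gzip -c > mesh.xml.gz
git diff -- mesh.xml.gz    # reports "Binary files ... differ", no line diff
```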
>>>> Most of the example meshes are not that big, but multiply
>>>> that by 30 and then some when meshes are moved around or
>>>> renamed.
>>>
>>> I just question whether it's worth it. It seems convenient to
>>> just have the meshes there.
>>
>> Keeping the meshes there will put a limit on which demos we can
>> add. I think it would be good to allow for more complex demos
>> requiring bigger meshes (not necessarily run on the buildbot
>> every day).
>
> Ok.
>
>>> If we keep them out of the repo I think we should include some
>>> automagic downloading when building the demos.
>>
>> Yes, or at least a message stating: "You have not downloaded demo
>> data. Please run the script foo."
>>
>>> Also should we rename the script to download-demo-meshes, or
>>> something more descriptive, as this is what that script now
>>> basically does?
>>
>> It is not only meshes, but also markers and velocity fields.
>> Perhaps it can be renamed download-demo-data?
>
> Sounds good.
>
> Johan
I did some more experimenting:
1) Repository size: there is a lot to be gained by repacking the
repos with the following steps:
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git repack -ad
e.g. DOLFIN: 372MiB -> 94MiB
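The three steps above can be wrapped in a small helper that also reports the size change (the function name is mine, and it assumes a non-bare repository, since the size is measured on its .git directory):

```shell
# Hypothetical wrapper (my naming) around the three commands above.
# Assumes a non-bare repository: sizes are measured on its .git dir.
repack_repo() (
  set -e
  cd "$1"
  before=$(du -sh .git | cut -f1)
  git reflog expire --expire=now --all   # drop reflog entries keeping old objects alive
  git gc --aggressive --prune=now        # garbage-collect and recompute deltas
  git repack -ad                         # rewrite everything into a single packfile
  after=$(du -sh .git | cut -f1)
  echo "repacked: $before -> $after"
)
```

For DOLFIN this corresponds to the 372MiB -> 94MiB figure quoted above.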
2) Stripping out the files suggested by Anders
(https://gist.github.com/alogg/5213171#file-files_to_strip-txt) brings
the repo size down to 172MiB and 24MiB after repacking.
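For reference, a sketch of how such a strip could be done with git filter-branch, the stock history-rewriting tool at the time (git-filter-repo is the modern replacement). The helper name is hypothetical; the strip list is assumed to contain one path per line with no spaces, as in the gist linked above, and should be passed as an absolute path.

```shell
# Hypothetical sketch (my naming) of the stripping step, using
# git filter-branch. The strip list is assumed to hold one space-free
# path per line, as in the gist linked above.
strip_history() (
  set -e
  cd "$1"
  strip_file=$2
  # Recent git versions warn and pause before filter-branch without this:
  export FILTER_BRANCH_SQUELCH_WARNING=1
  git filter-branch --index-filter \
    "git rm -r --cached --ignore-unmatch \$(cat '$strip_file')" \
    --prune-empty -- --all
)
# Follow this with the repacking steps from point 1 to actually
# reclaim the space.
```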
3) I haven't yet found a reliable way to migrate feature branches to
the filtered repository. Filtering the repository rewrites its
history, which changes all commit ids (SHA-1s) and therefore
invalidates the marks files created when initially converting the
repository.
There are 2 possible options for filtering the repository during
conversion:
a) bzr fast-import-filter: seems to be a pain to use with many files
(each path needs to be passed individually as an argument) and
apparently doesn't support writing marks files, so I haven't tried
it.
b) git_fast_filter: when using it to filter the converted git repo,
the marks file exported in the last step contains 83932 marks instead
of the expected 14399 - I can't say why. Unfortunately I haven't been
able to use it directly in the conversion pipeline, since it's not
compatible with a bzr fast-export stream. That's probably fixable,
but I can't estimate how much work it would be since I'm not familiar
enough with the details of the fast-import format.
TL;DR: Repacking the repos already saves a lot of space without
stripping large files. Stripping files is easy to do and saves
considerably more space, but I haven't been able to reliably import
feature branches into a filtered repository.
Florian
Follow ups
References:
- Re: Cleanup of repositories (Anders Logg, 2013-03-21)
- Re: Cleanup of repositories (Martin Sandve Alnæs, 2013-03-21)
- Re: Cleanup of repositories (Anders Logg, 2013-03-21)
- Re: Cleanup of repositories (Florian Rathgeber, 2013-03-22)
- Re: Cleanup of repositories (Anders Logg, 2013-03-22)
- Re: Cleanup of repositories (Johan Hake, 2013-03-22)
- Re: Cleanup of repositories (Anders Logg, 2013-03-22)
- Re: Cleanup of repositories (Johan Hake, 2013-03-22)
- Re: Cleanup of repositories (Anders Logg, 2013-03-22)
- Re: Cleanup of repositories (Johan Hake, 2013-03-22)
- Re: Cleanup of repositories (Anders Logg, 2013-03-22)
- Re: Cleanup of repositories (Johan Hake, 2013-03-22)