duplicity-team team mailing list archive

Re: Listing old files

 

The sig sizes are not really an issue to work around, just a thought for
a future enhancement, secondary to the question of keeping sig files.

What happens in duplicity is that an incremental is made up of the
changed pieces of files, not the entire changed files.  In order to
detect changes, we hash files in blocks and store those hashes in a sig
file.  For example, a file of 4 blocks would have 4 hashes.  If only one
of those blocks changed, then we would store just that block and the
associated new signature (all 4 hashes).  So the size of a sig file
grows with the size of the files it covers.  Simple, but efficient.
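
As a rough illustration of the idea (a simplified sketch in Python, not
duplicity's or librsync's actual code), the snippet below hashes a file
in fixed-size blocks and compares the hashes position by position.  The
function names, the SHA-1 hash, and the 1024-byte block size are all
assumptions made for the example.  The real rdiff algorithm also uses a
rolling checksum so it can match blocks that have shifted, which this
sketch leaves out.

```python
import hashlib
from typing import List


def block_signature(data: bytes, block_size: int) -> List[str]:
    """Return one hash per fixed-size block of data (hypothetical helper)."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_sig: List[str], new_data: bytes, block_size: int) -> List[int]:
    """Return indexes of blocks whose hash differs from the old signature.

    Simplification: blocks are compared at fixed offsets only, so an
    insertion that shifts the data would mark every later block as changed.
    """
    new_sig = block_signature(new_data, block_size)
    return [
        i for i, h in enumerate(new_sig)
        if i >= len(old_sig) or h != old_sig[i]
    ]


# A file of 4 blocks has a signature of 4 hashes.
BLOCK = 1024  # assumed block size, for illustration only
original = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK + b"D" * BLOCK
sig = block_signature(original, BLOCK)

# Change only the third block: just that block needs to be stored,
# together with a new 4-hash signature for the whole file.
modified = original[:2 * BLOCK] + b"X" * BLOCK + original[3 * BLOCK:]
to_store = changed_blocks(sig, modified, BLOCK)
new_sig = block_signature(modified, BLOCK)
```

With a larger block size the same file yields fewer hashes (2 instead
of 4 at 2048 bytes), so the sig shrinks, but any change then costs a
larger block in the incremental.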

...Ken

edgar.soldin@xxxxxx wrote:
> Before going further I'd rather find out what sig sizes we are really
> talking about and whether this is really an issue to work around.
> What sizes do sig files end up at? What explains the differences
> mentioned below?
> 
> Btw. I am very enthusiastic about this, as it combines closed (e.g.
> the 50th week of 2009) backup chains for archival with a way to list
> their contents, which is currently impossible. Actually, I think this
> is another step toward feature completeness.
> 
> ... ede
> 
> 
> On 22.10.2009 03:01, Michael Terry wrote:
>> Ken, I'm not feeling much enthusiasm here for this from you.  :)  So
>> as I understand it, these are the suggested options:
>>
>> Solution A: Always keep sig files.  Con: Takes up space.  (Do we have
>> any data for that for big backups?)
>>
>> Solution B: Keep sig files if an option is passed (--keep-sigs?).
>> Cons: There is concern that it could be forgotten and sigs would be
>> accidentally lost.
>>
>> Solution C: Have signatures cover larger blocks.  I don't quite
>> understand this option.
>>
>> What about Solution D?:  Always keep sig files under normal operation,
>> but delete old ones on a cleanup unless --keep-sigs is passed
>> (currently, they are also deleted after a new full backup).  This way,
>> you don't have to keep passing the arg every time (and thus risk
>> forgetting it), but you still have an easy solution for recapturing
>> space.
>>
>> Eh?
>>
>> -mt
>>
>> 2009/10/20 <edgar.soldin@xxxxxx>:
>>> On 20.10.2009 22:13, Kenneth Loafman wrote:
>>>>
>>>> edgar.soldin@xxxxxx wrote
>>>>>>
>>>>>> Not sure that keeping the sig files is the way to go as a default
>>>>>> option, and we'd run into the same problems as --archive-dir if we
>>>>>> make it optional.
>>>>>>
>>>>>
>>>>> Currently, keeping sig files is not an option at all, is it? So
>>>>> it's a whole different scenario, isn't it?
>>>>>
>>>>
>>>> Not really all that different, but the root problem with options
>>>> like this is that we don't store them in a configuration file for
>>>> reuse.  If we did, then a lot of problems would be solved.
>>>>
>>>
>>> That's why there are frontends to the mighty magic of duplicity that
>>> store the options. Again, right now simply enabling the keeping of
>>> the sig files would do no harm. It would take up more space, while
>>> enabling the user to list the contents of the chain much faster.
>>> Right? So why would the user keep a chain if he/she doesn't want to
>>> use it at some point?
>>>
>>> I just looked and found a sigtar of 69MB for a 4,5TB full. Small
>>> incrementals of 40 KB seem to have equally sized sigtars. What's the
>>> logic behind these sizes?
>>>
>>> And if the sig is nearly the same size as the incremental, why keep
>>> it separate? What information is in the sig that couldn't be put in
>>> the data tar as well? AFAIU the significant advantage is the sigtar's
>>> size, which doesn't seem to hold for small incrementals.
>>>
>>>>>> The first run without the option and the previous sigs
>>>>>> would just disappear.  Sure to generate lots of complaints.
>>>>>>
>>>>>
>>>>> What's the downside of keeping the small sig files for each chain?
>>>>>
>>>>
>>>> For one, they aren't that small, and some folks pay a bunch for
>>>> offsite storage, so they want as little overhead as possible.  It's
>>>> all a tradeoff.
>>>>
>>>
>>> It always is ... but again, what use are old chains if they cannot
>>> be listed efficiently and are therefore not usable at all?
>>>
>>>> One thing to think of is allowing the signatures to cover a larger
>>>> block.  Right now, the blocksize that a sig covers is fairly small,
>>>> matching what rdiff does, but we could make that tunable.  If you make
>>>> the coverage larger, you get fewer sigs per file and smaller sigtar
>>>> files, but larger blocks are backed up.  The opposite produces larger
>>>> sigtar files.  For now, it's a good mix, but it would be better if you
>>>> could tune it with knowledge of your backup needs.
>>>>
>>>
>>> Sorry, I was under the impression that sig files simply hold
>>> checksums for the data vol files and a list of their contents. Is
>>> there more to it? Could you explain?
>>>
>>>
>>> ...ede
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~duplicity-team
>>> Post to     : duplicity-team@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~duplicity-team
>>> More help   : https://help.launchpad.net/ListHelp
>>>
