duplicity-team team mailing list archive

Re: Coalescing of Incremental Backups/Snapshots

 

Rob,

Your plan sounds good. Most of the code you will need to look at is in the restore portion of duplicity, most notably the nested iterators that it uses. I'd like to review the algorithm you use once you get it developed. I'm fairly certain Python sets could be used for a lot of the coalescing, but I may be wrong.
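
To make the sets idea a bit more concrete, here is a minimal sketch. It assumes each increment can be reduced to a name plus the set of paths it changed. That layout, and the function names, are made up for illustration and are not duplicity's actual data structures. It only identifies which paths were changed in more than one increment and which increment touched each path last.

```python
def latest_increment_per_path(increments):
    """Map each path to the name of the last increment that touched it.

    increments: list of (name, set_of_paths), oldest first.
    This layout is hypothetical, not duplicity's internal format.
    """
    latest = {}
    for name, paths in increments:
        for path in paths:
            latest[path] = name  # later increments overwrite earlier ones
    return latest


def touched_more_than_once(increments):
    """Return the set of paths that appear in two or more increments."""
    seen = set()
    repeated = set()
    for _name, paths in increments:
        repeated |= seen & paths  # seen earlier and changed again here
        seen |= paths
    return repeated


# Example data, invented for illustration.
increments = [
    ("inc1", {"a.txt", "b.txt"}),
    ("inc2", {"b.txt", "c.txt"}),
    ("inc3", {"b.txt"}),
]
repeated = touched_more_than_once(increments)
latest = latest_increment_per_path(increments)
```

If each increment stores a delta against the previous state, the paths in `repeated` are the ones whose deltas would have to be merged rather than simply dropped, so the sets only narrow down where the real work is.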

I'm also fairly certain that a lot of the code in restore could be re-purposed for coalescing purposes. That would be my first approach.

As one of the efforts in your work, could you add some Epydoc-style (http://epydoc.sourceforge.net/index.html) comments to the code as you study it? This will help us all understand it better. I started to do it a while back, but got bogged down in other stuff.
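
For reference, an Epydoc-style docstring looks roughly like the following. The function itself is a made-up helper, not something that exists in duplicity. Only the `@param`, `@type`, `@return` and `@rtype` fields and the `C{...}` inline markup are the point here.

```python
def sort_by_end_time(backup_sets):
    """
    Return backup sets ordered from oldest to newest.

    Hypothetical helper used only to show the docstring layout.

    @param backup_sets: C{(name, end_time)} pairs, in any order
    @type backup_sets: list of tuple
    @return: a new list sorted by C{end_time}, oldest first
    @rtype: list of tuple
    """
    return sorted(backup_sets, key=lambda item: item[1])
```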

...Thanks,
...Ken

Rob Oakes wrote:
Dear Duplicity Team,

If any of you are following the development of Time Drive, you might be interested to know that we just released a second version (http://www.oak-tree.us/blog/index.php/2009/09/24/timedrive-02). At the same time, the project has a new home on Launchpad (https://launchpad.net/time-drive). In the recently released version of Time Drive, the big push was to clean up the code base and make it easier to customize the settings for individual backup folders, which is now largely complete. I now feel like we have a (relatively) well organized code base that can grow.

Which means it's time to start thinking about future features. One of the big ones that we'd like to see is a better way to manage backups and snapshots, with the combination of incremental backups/snapshots (as discussed a few weeks ago on this list) being integral to that goal.

Which is really the purpose of this email. In the next few weeks, I'm going to have some time to devote to development and would like to spend that time working on the incremental backup combination feature. And because it is always better to share, if that code were written with the hope of eventual inclusion in duplicity, it might save others a bit of work in the future.

Before diving in and generally making a mess, though, I had a couple of questions:

    * What is the overall workflow that this code would need to follow?

          o This is what I've drafted so far: Locate and download .tar
            archives -> extract diff files -> compare signatures to
            available diff versions -> discard previous diffs that aren't
            needed to maintain chain continuity ->
            re-compress/re-encrypt to new .tar archive(s) -> Upload to
            storage -> Modify manifest and other files to reflect the
            changes

    * Are there any important steps that I am missing?

    * In the rough workflow above, is there already existing code that
      handles certain components?  Or will all of the code need to be
      written from scratch?

    * Are there existing methods that do similar tasks that might serve
      as examples in working with the duplicity classes?  (I've been trying
      to sort my way through restore files methods to better understand
      how to work with individual backup sets and the manifest.)

    * How does one go about modifying the existing manifest so that the
      backup chain isn't broken beyond all hope of repair?
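
To pin down what "chain continuity" means in the workflow above, here is a rough sketch. It assumes each increment can be modeled as a `(start_time, end_time)` pair and that a chain is continuous when each increment starts where the previous one ended. The function names and the tuple layout are placeholders, not duplicity's manifest format.

```python
def is_continuous(chain):
    """True if each increment starts where the previous one ended."""
    return all(prev[1] == cur[0] for prev, cur in zip(chain, chain[1:]))


def coalesce(chain, first, last):
    """Replace chain[first..last] with one increment spanning the same range.

    chain: list of (start_time, end_time) tuples, oldest first
    (a hypothetical model of a backup chain).
    """
    if not 0 <= first <= last < len(chain):
        raise IndexError("first/last out of range")
    if not is_continuous(chain):
        raise ValueError("chain is already broken")
    merged = (chain[first][0], chain[last][1])
    return chain[:first] + [merged] + chain[last + 1:]


# Example data, invented for illustration.
chain = [(0, 10), (10, 20), (20, 30), (30, 40)]
new_chain = coalesce(chain, 1, 2)
```

The check I would want after any manifest rewrite is that `is_continuous(new_chain)` still holds, since the merged increment has to cover exactly the time range of the ones it replaces.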

My apologies for two rather long emails in as many days, but this is a feature that I think would make an excellent addition to both Time Drive and to duplicity, and I've learned that free time should be taken advantage of when it appears.

Cheers,

Rob


