duplicity-team team mailing list archive

Introductions All Round

To: duplicity-team@xxxxxxxxxxxxxxxxxxx, Kenneth Loafman <kenneth@xxxxxxxxxxx>
From: Rob Oakes <lyx-devel@xxxxxxxxxxx>
Date: Sun, 23 Aug 2009 22:50:26 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.1) Gecko/20090715 Lightning/1.0pre Thunderbird/3.0b3

Dear Duplicity Developers,

By way of a quick introduction, my name is Rob Oakes. I've been lurking on the mailing list for a while, and I thought that it might be a good idea to come out of hiding and say hi. I am excited to be involved with the duplicity development efforts and look forward to helping in any way I can.

As part of this introduction, allow me to explain my interest in Duplicity's development. About a month ago, I started work on a new Duplicity-based GUI, called Time-Drive. For those who are curious, the project overview can be found at:


http://www.oak-tree.us/blog/index.php/science-and-technology/time-drive

*Current Progress*

At the time that I started writing it, the goals for Time Drive were fairly simple:


  1. I wanted to have an easy to use Linux backup program that would
     allow off-site backup through FTP, SSH, SMB and other protocols.
  2. Backups needed to be automated, regular, and client side encrypted.
  3. I wanted to approximate the "Time Machine" experience of Mac OS X
     as closely as possible.

I had already looked at a variety of other backup programs for Linux (including Flyback and Back In Time; unfortunately, because I didn't thoroughly research the alternatives, Deja Dup did not make it onto my radar) and hadn't found anything that met all of my criteria. I found this rather aggravating, since Linux has access to some of the most powerful backup utilities in existence (duplicity, rsync, and rdiff-backup, to name just a few). And given the open source nature of the code, I didn't think that it would be much work to extend one of the existing programs so that it would better meet my stated goals.

So, with that thought in mind, I started the work of modifying Back In Time to work with a different backend. Because of previous exposure to duplicity (as an offsite backup method for my company's server), I opted to use it for the engine. It was written in Python (a programming language I am moderately comfortable with) and appeared to have a mature codebase with ongoing development efforts, which rdiff-backup, my second choice, did not. After some work, though, it became fairly obvious that rather than modifying Back In Time to work with duplicity, it would be easier to start from scratch with Back In Time as an "inspiration." And Time Drive was born.

So far, I've been pleased with the overall progress. At present, Time Drive already fulfills most of my needs. It manages my backup settings and then feeds that information to duplicity for the actual heavy lifting. I've also finished work on an archive browser that allows me to search my archives for files and queue them for restoration. This includes the ability to browse between different "snapshots" of the archive, based on the time stamp of the incremental backup. To accomplish these goals, Time Drive leverages the existing command line functionality of duplicity. But in some instances I found it necessary/more convenient/better to leverage the duplicity classes directly. You can find a more comprehensive overview of the state of development at:


http://www.oak-tree.us/blog/index.php/2009/08/07/time-drive2

with an update on development at:

http://www.oak-tree.us/blog/index.php/2009/08/14/time-drive3

The actual source code can be found on Launchpad:

https://launchpad.net/time-drive

I understand that this project is extremely young, but nevertheless, I would love to hear the community's feedback and thoughts, particularly in regard to the archive browser, since my next major project is to port it over as an extension for Nautilus, for Deja-Dup. (But that is a topic for another e-mail.)


*Future Work*

Here, I'd like to solicit your feedback on the code I've already written and ask a few questions about the underlying architecture of Duplicity. The next major feature that I would like to add to Time Drive is the "smart management" of snapshots, greatly "inspired" by how Time Machine on Mac OS X and Back In Time archive their old snapshot data. For those who might not be aware, both programs keep their data on a "logarithmic" type scale. For the recent past, they keep all available data, with progressively fewer snapshots the further back in time you go. (It works out to be one snapshot per hour for the past day, one snapshot per day for the past week, one per week for the past month, and one per month for the past year.) I think it would be wonderful to implement this same sort of management strategy for Duplicity. But in doing so, I do not want to wipe snapshots required by more recent incremental backups.
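
The thinning schedule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from Time Drive, Back In Time, or duplicity: the names `TIERS` and `snapshots_to_keep` are hypothetical, a "month" is assumed to be 30 days and a "year" 365 days, and snapshots older than a year are simply dropped because the schedule does not say what happens to them. It also ignores any dependencies between snapshots.

```python
from datetime import datetime, timedelta

# (maximum age, bucket width) pairs for the schedule described above.
# The 30-day month and 365-day year are assumptions made for illustration.
TIERS = [
    (timedelta(days=1), timedelta(hours=1)),     # hourly for the past day
    (timedelta(days=7), timedelta(days=1)),      # daily for the past week
    (timedelta(days=30), timedelta(weeks=1)),    # weekly for the past month
    (timedelta(days=365), timedelta(days=30)),   # monthly for the past year
]


def snapshots_to_keep(snapshot_times, now):
    """Return the set of snapshot times the schedule would retain."""
    keep = set()
    seen_buckets = set()
    # Walk newest first so the newest snapshot in each bucket wins.
    for t in sorted(snapshot_times, reverse=True):
        age = now - t
        for tier_index, (max_age, width) in enumerate(TIERS):
            if age <= max_age:
                bucket = (tier_index, age // width)
                if bucket not in seen_buckets:
                    seen_buckets.add(bucket)
                    keep.add(t)
                break  # a snapshot belongs to the first tier it fits in
    return keep
```

For example, eighteen snapshots taken ten minutes apart over the last three hours would be thinned to one per hour-wide bucket.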

Which leads to the questions below. I am currently stumped on how to proceed. Despite the time I've spent with the code, I am pretty much ignorant of the actual mechanism duplicity uses to create its incremental backups.


  1. How does the process work?  Are newly detected files somehow
     added to the underlying "full" backup?  Do the incremental
     snapshots exist in independence, each containing only the data
     about how files have changed relative to the full backup?  Or do
     the incremental backups exist as part of a chain, where previous
     incremental snapshots are required to restore data from a later
     time point?
  2. It seems that duplicity provides support for deleting snapshots
     before a particular time point and is smart enough to avoid
     purging the data required by the newer incremental backups.  How
     is this done?  Even after spending some time with the
     Collections.py class, I wasn't quite able to wrap my head around
     how the filtering worked.  Are there any ideas about how this same
     sort of filtering could be applied to approximate the effect
     described above?
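
To make the constraint behind question 2 concrete, here is a minimal sketch under one stated assumption: that the chain model raised in question 1 holds, so each incremental needs every earlier set back to the most recent full backup. The function `required_for` and its `(time, kind)` tuples are hypothetical and are not duplicity's own data structures or API.

```python
def required_for(backups, wanted):
    """Return the backup times that must be kept so `wanted` stays restorable.

    backups: list of (time, kind) tuples sorted oldest first, where kind is
             "full" or "inc" (a hypothetical representation).
    wanted:  set of times chosen for retention, e.g. by a thinning schedule.
    Assumes each incremental depends on all earlier sets in its chain.
    """
    required = set()
    chain = []  # times in the current chain, oldest first
    for time, kind in backups:
        if kind == "full":
            chain = []  # a full backup starts a new chain
        chain.append(time)
        if time in wanted:
            # Restoring this point needs everything before it in the chain.
            required.update(chain)
    return required
```

Under that assumption, anything outside the returned set could be deleted, which in practice means only whole chains or the tail of a chain.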

I'll leave off there, since I've gone on longer than I intended.

So far, I've been tremendously impressed with duplicity both as a user, and now as a budding developer. It is truly a fantastic backup program (framework?). Because of its rich feature set, I've been able to create a backup program that nicely meets my needs in a very short amount of time. I only hope that I will be able to contribute some small amount back. (Hopefully starting with the "smart management" feature.) Based on the work I've done with Time-Drive, I have a couple of other ideas, but it might be best if I take things one step at a time ;)


Cheers,

Rob Oakes

Follow ups

Re: Introductions All Round
From: Michael Terry, 2009-08-24