duplicity-team team mailing list archive

Introductions All Round

 

Dear Duplicity Developers,

By way of a quick introduction, my name is Rob Oakes. I've been lurking on the mailing list for a while, and I thought it might be a good idea to come out of hiding and say hi. I am excited to be involved with the duplicity development efforts and look forward to helping in any way I can.

As part of this introduction, allow me to explain my interest in Duplicity's development. About a month ago, I started work on a new Duplicity-based GUI called Time-Drive. For those who are curious, the project overview can be found at:

http://www.oak-tree.us/blog/index.php/science-and-technology/time-drive

*Current Progress*

At the time that I started writing it, the goals for Time Drive were fairly simple:

  1. I wanted to have an easy-to-use Linux backup program that would
     allow off-site backup through FTP, SSH, SMB and other protocols.
  2. Backups needed to be automated, regular, and client-side encrypted.
  3. I wanted to approximate the "Time Machine" experience of Mac OS X
     as closely as possible.

I had already looked at a variety of other backup programs for Linux (including Flyback and Back In Time; unfortunately, because I didn't thoroughly research the alternatives, Deja Dup did not make it onto my radar) and hadn't found anything that met all of my criteria. I found this rather aggravating, since Linux has access to some of the most powerful backup utilities in existence (duplicity, rsync, and rdiff-backup, to name just a few). And given the open source nature of the code, I didn't think it would be much work to extend one of the existing programs to better meet my goals.

So, with that thought in mind, I started the work of modifying Back In Time to work with a different backend. Because of previous exposure to duplicity (as an offsite backup method for my company's server), I opted to use it for the engine. It was written in Python (a programming language I am moderately comfortable with) and appeared to have a mature codebase with ongoing development, which rdiff-backup, my second choice, did not. After some work, though, it became fairly obvious that rather than modifying Back In Time to work with duplicity, it would be easier to start from scratch, with Back In Time as an "inspiration." And so Time Drive was born.

So far, I've been pleased with the overall progress. At present, Time Drive already fulfills most of my needs. It manages my backup settings and then feeds that information to duplicity, which does the actual heavy lifting. I've also finished work on an archive browser that allows me to search my archives for files and queue them for restoration. This includes the ability to browse between different "snapshots" of the archive, based on the timestamp of each incremental backup. To accomplish these goals, Time Drive leverages the existing command-line functionality of duplicity, but in some instances I found it necessary (or simply more convenient) to use the duplicity classes directly. You can find a more comprehensive overview of the state of development at:
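To make the "settings in, duplicity does the work" split concrete, here is a minimal, hypothetical sketch of how a front end could turn stored settings into duplicity command lines. It is not Time Drive's actual code: the helper names and parameters are illustrative, and the option names used (`--exclude`, `--encrypt-key`, `--full-if-older-than`, `--file-to-restore`, `--time`, and the `restore` / `list-current-files` actions) should be checked against `duplicity --help` for the installed version.

```python
from typing import List, Optional, Sequence


def backup_command(source: str, target_url: str,
                   excludes: Sequence[str] = (),
                   encrypt_key: Optional[str] = None,
                   full_if_older_than: Optional[str] = None) -> List[str]:
    """Build the argument list for a backup run (full or incremental)."""
    args = ["duplicity"]
    if full_if_older_than:
        # e.g. "1M": start a fresh full backup when the last full is older than that
        args += ["--full-if-older-than", full_if_older_than]
    if encrypt_key:
        # GnuPG key ID used for client-side encryption
        args += ["--encrypt-key", encrypt_key]
    for pattern in excludes:
        args += ["--exclude", pattern]
    args += [source, target_url]
    return args


def list_files_command(target_url: str, when: str) -> List[str]:
    """List the files in the archive as it existed at `when` (for a browser view)."""
    return ["duplicity", "list-current-files", "--time", when, target_url]


def restore_command(target_url: str, relative_path: str, when: str,
                    destination: str) -> List[str]:
    """Restore one queued file or directory (path relative to the backed-up
    source directory) as it existed at `when`."""
    return ["duplicity", "restore", "--file-to-restore", relative_path,
            "--time", when, target_url, destination]
```

A front end would then hand each list to `subprocess.run(args, check=True)`. Building argument lists rather than shell strings avoids quoting problems with paths that contain spaces.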

http://www.oak-tree.us/blog/index.php/2009/08/07/time-drive2

with an update on development at:

http://www.oak-tree.us/blog/index.php/2009/08/14/time-drive3

The actual source code can be found on Launchpad:

https://launchpad.net/time-drive

I understand that this project is extremely young, but I would nevertheless love to hear the community's feedback and thoughts, particularly regarding the archive browser, since my next major project is to port it over as a Nautilus extension for Deja-Dup. (But that is a topic for another e-mail.)

*Future Work*

Here, I'd like to solicit your feedback on the code I've already written and ask a few questions about the underlying architecture of Duplicity. The next major feature that I would like to add to Time Drive is the "smart management" of snapshots, heavily "inspired" by how Time Machine on Mac OS X and Back In Time archive their old snapshot data. For those who might not be aware, both programs keep their data on a "logarithmic" type scale: they keep all available data for the recent past, with progressively fewer snapshots the further back in time you go. (It works out to be one snapshot per hour for the past day, one snapshot per day for the past week, one per week for the past month, and one per month for the past year.) I think it would be wonderful to implement this same sort of management strategy for Duplicity. But in doing so, I do not want to wipe snapshots required by more recent incremental backups.
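The schedule described above can be sketched as a bucketing pass over snapshot times. This is a minimal, hypothetical sketch (the names `TIERS` and `snapshots_to_keep` are illustrative): it approximates a month as 30 days and a year as 365 days, keeps the newest snapshot in each bucket, drops anything older than a year, and deliberately ignores which snapshots later incrementals depend on, which is exactly the concern in the last sentence above.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# (how far back a tier reaches, keep one snapshot per this interval).
# Assumption: a "month" is 30 days and a "year" is 365 days.
TIERS: List[Tuple[timedelta, timedelta]] = [
    (timedelta(days=1), timedelta(hours=1)),    # hourly for the past day
    (timedelta(days=7), timedelta(days=1)),     # daily for the past week
    (timedelta(days=30), timedelta(weeks=1)),   # weekly for the past month
    (timedelta(days=365), timedelta(days=30)),  # monthly for the past year
]


def snapshots_to_keep(snapshots: Iterable[datetime], now: datetime,
                      tiers: List[Tuple[timedelta, timedelta]] = TIERS) -> List[datetime]:
    """Return the snapshot times a logarithmic schedule would keep.

    Each snapshot belongs to the first tier whose reach covers its age; within a
    tier, ages are split into interval-sized buckets and only the newest snapshot
    in each bucket survives. Snapshots older than the last tier are not kept.
    """
    newest: Dict[Tuple[int, int], datetime] = {}
    for snap in snapshots:
        age = max(now - snap, timedelta(0))  # treat clock skew as "just now"
        for tier_index, (reach, interval) in enumerate(tiers):
            if age < reach:
                key = (tier_index, age // interval)
                if key not in newest or snap > newest[key]:
                    newest[key] = snap
                break
    return sorted(newest.values())
```

For example, six half-hourly snapshots from the last three hours collapse to three, one per hour.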

Which leads to the questions below. I am currently stumped on how to proceed: despite the time I've spent with the code, I am pretty much ignorant of the actual mechanism duplicity uses to create its incremental backups.

  1. How does the process work?  Are newly detected files somehow
     added to the underlying "full" backup?  Do the incremental
     snapshots exist independently, each containing only the data
     about how files have changed relative to the full backup?  Or do
     the incremental backups exist as part of a chain, where previous
     incremental snapshots are required to restore data from a later
     time point?
  2. It seems that duplicity provides support for deleting snapshots
     before a particular time point and is smart enough to avoid
     purging the data required by the newer incremental backups.  How
     is this done?  Even after spending some time with the
     Collections.py class, I wasn't quite able to wrap my head around
     how the filtering worked.  Are there any ideas about how this same
     sort of filtering could be applied to approximate the effect
     described above?
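Question 2 can at least be reasoned about with a toy model. The sketch assumes the third possibility from question 1 (each incremental depends on the one before it, back to a full backup); it is a hypothetical illustration with made-up names (`BackupSet`, `group_into_chains`, `sets_needed_to_restore`, `prunable_chains`), not a description of what Collections.py actually does.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class BackupSet:
    time: datetime
    full: bool  # True for a full backup, False for an incremental


def group_into_chains(sets: Iterable[BackupSet]) -> List[List[BackupSet]]:
    """Split backup sets into chains, each starting with a full backup.

    Assumes every incremental depends on the set just before it, back to the
    most recent full. Incrementals with no earlier full backup are ignored.
    """
    chains: List[List[BackupSet]] = []
    for item in sorted(sets, key=lambda b: b.time):
        if item.full:
            chains.append([item])
        elif chains:
            chains[-1].append(item)
    return chains


def sets_needed_to_restore(sets: Iterable[BackupSet], when: datetime) -> List[BackupSet]:
    """Return the full backup plus every incremental needed to restore state at `when`."""
    needed: List[BackupSet] = []
    for chain in group_into_chains(sets):
        if chain[0].time <= when:
            # the latest chain that started on or before `when` wins
            needed = [b for b in chain if b.time <= when]
    return needed


def prunable_chains(sets: Iterable[BackupSet], cutoff: datetime) -> List[List[BackupSet]]:
    """Return whole chains that could be deleted for a cutoff time.

    A chain qualifies only if even its newest set is older than `cutoff`; a chain
    that reaches past `cutoff` is kept intact because its later incrementals need
    the earlier ones. The most recent chain is never offered for deletion.
    """
    chains = group_into_chains(sets)
    return [chain for chain in chains[:-1] if chain[-1].time < cutoff]
```

Under that assumption, a logarithmic schedule could only remove whole chains, never a single incremental from the middle of one, so how fine-grained the older tiers can be would depend on how often a new full backup is started.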

I'll leave off there, since I've gone on longer than I intended.

So far, I've been tremendously impressed with duplicity, both as a user and now as a budding developer. It is truly a fantastic backup program (framework?). Because of its rich feature set, I've been able to create a backup program that meets my needs nicely in a very short amount of time. I only hope that I will be able to contribute some small amount back (hopefully starting with the "smart management" feature). Based on the work I've done with Time-Drive, I have a couple of other ideas, but it might be best if I take things one step at a time ;)

Cheers,

Rob Oakes