duplicity-team team mailing list archive
Message #00126
Introductions All Round
Dear Duplicity Developers,
By way of a quick introduction, my name is Rob Oakes. I've been lurking
on the mailing list for a while and I thought that it might be a good
idea to come out of hiding and say hi. I am excited to be involved with
the duplicity development efforts and look forward to helping in any way
I can.
As part of this introduction, allow me to explain my interest in
Duplicity's development. About a month ago, I started work on a new
Duplicity-based GUI, called Time-Drive. For those who are curious, the
project overview can be found at:
http://www.oak-tree.us/blog/index.php/science-and-technology/time-drive
*Current Progress*
At the time that I started writing it, the goals for Time Drive were
fairly simple:
1. I wanted to have an easy-to-use Linux backup program that would
allow off-site backup through FTP, SSH, SMB and other protocols.
2. Backups needed to be automated, regular, and encrypted on the client side.
3. I wanted to approximate the "Time Machine" experience of Mac OS X
as closely as possible.
I had already looked at a variety of other backup programs for Linux
(including Flyback and Back In Time; unfortunately, because I didn't
thoroughly research the alternatives, Deja Dup did not make it on my
radar) and hadn't found something that met all of my criteria. I found
this to be rather aggravating since Linux has access to some of the most
powerful backup utilities in existence (duplicity, rsync, and
rdiff-backup, to name just a few). And given the open source nature
of the code, I didn't think that it would be much work to extend one of
the existing programs so that it would better meet my stated goals.
So, with that thought in mind, I started the work of modifying Back In
Time to work with a different backend. Because of previous exposure to
duplicity (as an offsite backup method for my company's server), I opted
to use it for the engine. It was written in Python (a programming
language I am moderately comfortable with), and appeared to have a
mature codebase with ongoing development efforts, which rdiff-backup, my
second choice, did not. After some work, though, it became fairly
obvious that rather than modifying Back In Time to work with duplicity,
it would be easier to start from scratch with Back In Time as an
"inspiration." And Time Drive was born.
So far, I've been pleased with the overall progress. At present, Time
Drive already fulfills most of my needs. It manages my backup settings
and then feeds that information to duplicity for the actual heavy
lifting. I've also finished work on an archive browser that allows me
to search my archives for files and queue them for restoration. This
includes the ability to browse between different "snapshots" of the
archive, based on the time stamp of the incremental backup. To
accomplish these goals, Time Drive leverages the existing command
line functionality of duplicity. But in some instances I found it
necessary/more convenient/better to leverage the duplicity classes
directly. You can find a more comprehensive overview of the state of
development at:
http://www.oak-tree.us/blog/index.php/2009/08/07/time-drive2
with an update on development at:
http://www.oak-tree.us/blog/index.php/2009/08/14/time-drive3
The actual source code can be found on Launchpad:
https://launchpad.net/time-drive
I understand that this project is extremely young, but nevertheless,
I would love to hear the community's feedback and thoughts,
particularly in regard to the archive browser, since my next major
project is to port it over as an extension for Nautilus, for Deja-Dup.
(But that is a topic for another e-mail.)
*Future Work*
Here, I'd like to solicit your feedback on the code I've already written
and ask a few questions about the underlying architecture of
Duplicity. The next major feature that I would like to add to Time
Drive is the "smart management" of snapshots, largely inspired by
how Time Machine on Mac OS X and Back In Time thin out their old snapshot
data. For those who might not be aware, both programs keep their data
on a roughly "logarithmic" scale: every snapshot from the recent past is
kept, with progressively fewer snapshots the further back
in time you go. (It works out to be one snapshot per hour for the past
day, one snapshot per day for the past week, one per week for the past
month, and one per month for the past year.) I think it would be
wonderful to implement this same sort of management strategy for
Duplicity. But in doing so, I do not want to wipe snapshots required by
more recent incremental backups.
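
To make the schedule above concrete, here is a minimal sketch in Python of
how the snapshots worth keeping might be selected. It is only an
illustration: the function name, the 30-day "month" and 365-day "year"
cutoffs, and the choice to keep the newest snapshot in each bucket are all
assumptions, and it does not touch duplicity or consider which snapshots
depend on which.

```python
from datetime import datetime, timedelta


def select_snapshots_to_keep(snapshots, now):
    """Return the subset of snapshot times to keep under the thinning schedule.

    Hypothetical helper, not part of duplicity. `snapshots` is an iterable
    of datetime objects and `now` is the reference time.
    """
    # Each tier is (maximum age, function mapping a time to its bucket).
    # At most one snapshot is kept per bucket within a tier.
    tiers = [
        (timedelta(days=1), lambda t: (t.year, t.month, t.day, t.hour)),  # hourly
        (timedelta(days=7), lambda t: t.date()),                          # daily
        (timedelta(days=30), lambda t: tuple(t.isocalendar())[:2]),       # weekly
        (timedelta(days=365), lambda t: (t.year, t.month)),               # monthly
    ]
    keep = set()
    seen_buckets = set()
    # Walk newest first so the newest snapshot in each bucket is the one kept.
    for snap in sorted(snapshots, reverse=True):
        age = now - snap
        for tier_index, (max_age, bucket_of) in enumerate(tiers):
            if age <= max_age:
                bucket = (tier_index, bucket_of(snap))
                if bucket not in seen_buckets:
                    seen_buckets.add(bucket)
                    keep.add(snap)
                break  # a snapshot belongs to the first tier it fits
        # Snapshots older than the last tier fall through and are not kept.
    return keep
```

Anything not in the returned set would be a candidate for removal, subject
to the dependency concern raised above.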
Which leads to the questions below. I am currently stumped on how to
proceed. Despite the time I've spent with the code, I am pretty much
ignorant of the actual mechanism duplicity uses to create its
incremental backups.
1. How does the process work? Are newly detected files somehow
added to the underlying "full" backup? Do the incremental
snapshots exist independently, each containing only the data
about how files have changed relative to the full backup? Or do
the incremental backups exist as part of a chain, where previous
incremental snapshots are required to restore data from a later
time point?
2. It seems that duplicity provides support for deleting snapshots
before a particular time point and is smart enough to avoid
purging the data required by the newer incremental backups. How
is this done? Even after spending some time with the
Collections.py module, I wasn't quite able to wrap my head around
how the filtering worked. Are there any ideas about how this same
sort of filtering could be applied to approximate the effect
described above?
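
To illustrate the concern behind both questions, here is a minimal Python
sketch of the kind of filtering I have in mind. It assumes the "chain"
model from question 1, where each incremental snapshot needs every earlier
snapshot in its chain, back to the full backup, in order to be restored.
The function name and the data layout are hypothetical and are not taken
from duplicity's code.

```python
def protect_dependencies(chains, wanted):
    """Expand `wanted` so that no kept snapshot loses what it depends on.

    Hypothetical helper, not duplicity's API. `chains` is a list of lists
    of snapshot identifiers, each list ordered with the full backup first
    and its incrementals after it. `wanted` is the set of identifiers the
    thinning schedule would like to keep.
    """
    must_keep = set()
    for chain in chains:
        # Position of the latest wanted snapshot in this chain, or -1 if
        # nothing in the chain is wanted.
        last_wanted = max(
            (i for i, snap in enumerate(chain) if snap in wanted),
            default=-1,
        )
        # Under the chain assumption, everything up to that point is needed.
        must_keep.update(chain[: last_wanted + 1])
    return must_keep
```

Under this assumption, only the tail of a chain (the incrementals after
the last wanted snapshot) or an entire chain could be deleted safely.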
I'll leave off there, since I've gone on longer than I intended.
So far, I've been tremendously impressed with duplicity both as a user,
and now as a budding developer. It is truly a fantastic backup program
(framework?). Because of its rich feature set, I've been able to create
a backup program that nicely meets my needs in a very short amount of
time. I only hope that I will be able to contribute some small amount
back. (Hopefully starting with the "smart management" feature.) Based
on the work I've done with Time-Drive, I have a couple of other ideas,
but it might be best if I take things one step at a time ;)
Cheers,
Rob Oakes