duplicity-team team mailing list archive

Re: All Merged

(I think you meant to CC the list? Adding CC to response.)

> As to the archive-dir, it was a nice optimization, but it's been a real
> support nightmare.  I don't ever want to do it that way again.  We need
> a directory for persistence when a backup fails, and for any persistent
> data, such as keeping configuration of named backups, etc.

Yes; something like:

  ~/.duplicity/<backup_name>/cache - cache files, removable at any time
  ~/.duplicity/<backup_name>/config - backup profile configs, etc
  ~/.duplicity/<backup_name>/checkpoints - checkpoint info

Or similar.
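The layout above could be sketched as a small helper; note the base directory name and the set of subdirectories are the proposal from this thread, not duplicity's actual defaults:

```python
import os

def profile_dir(backup_name, kind):
    """Build a per-backup persistence path following the proposed layout.

    'kind' is one of 'cache' (removable at any time), 'config'
    (backup profile configs), or 'checkpoints' (checkpoint info).
    """
    assert kind in ("cache", "config", "checkpoints")
    return os.path.join(os.path.expanduser("~"), ".duplicity", backup_name, kind)
```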

> You should just be able to use the backend.py ParsedURL and take it from
>  pu.netloc.

I ended up augmenting the backend module with an is_backend_url()
alongside get_backend(), to avoid instantiating backends just for the
test. It's slightly ugly due to the duplicate ParsedUrl creation.
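A minimal sketch of the idea, assuming a scheme registry like the one duplicity's backend module keeps internally (the scheme list here is illustrative, not exhaustive):

```python
from urllib.parse import urlparse

# Hypothetical set of URL schemes that have a registered backend;
# the real backend module maintains an equivalent mapping.
_KNOWN_SCHEMES = {"file", "ftp", "scp", "sftp", "s3", "rsync", "webdav"}

def is_backend_url(url_string):
    """Return True if the string looks like a URL some backend handles,
    without instantiating any backend object."""
    return urlparse(url_string).scheme in _KNOWN_SCHEMES
```

A plain local path has no scheme, so it is classified as not-a-backend-URL without ever constructing a backend.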

> > The reason I ask is that I realized that most people, even though in
> > reality it's not such a great idea, do backups on live file systems,
> > particularly (for obvious reasons) on platforms where file system
> > snapshots are not trivially attainable. If the results of accidentally
> > doing a restart on a live file system are much worse than a regular
> > pause, it can be considered a bit dangerous to enable
> > checkpoint/restart at all except when explicitly enabled by the user.
> 
> Hmmm, will have to think about that one.  I tend to think of backups
> only when the filesystem is quiescent, but that's just me.

I do too, but in practice I know for a fact most people simply don't do
that. Backing up a live file system is standard practice,
unfortunately. I've had to work pretty hard to convince people it's a
bad idea, even among "techies".

(Case in point, witness the recent "oops we didn't realize a live
tarball of a mysql data directory wasn't safe" disaster(s) of public
site(s)...)

> Part of the problem is that active file systems, especially in Linux,
> don't always tell you that the file underneath has changed, so if you
> backup on an active system, you take your chances.

What rdiff-backup does, if I remember correctly, is to compare mtime
before/after the file was read and rollback if the file was
changed. So, barring strange mtime stuff, you always get either a
consistent copy of that particular file (of course NOT of a directory
tree) or no file at all (in that increment).

But it's definitely a designed-for case, and one that gets logged.
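The mtime-compare approach could look roughly like this (a sketch of the technique as described above, not rdiff-backup's actual code):

```python
import os

def read_file_consistently(path):
    """Read a file, then verify its mtime did not change during the read.

    Returns the contents, or None if the file was modified mid-read;
    the caller would then roll back and omit the file from this
    increment rather than store an inconsistent copy.
    """
    before = os.stat(path).st_mtime_ns
    with open(path, "rb") as f:
        data = f.read()
    after = os.stat(path).st_mtime_ns
    if before != after:
        return None  # modified while reading: drop it from this increment
    return data
```

As noted, this only gives per-file consistency; it says nothing about the consistency of a directory tree as a whole.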

>  I cannot imagine
> restart causing any problems for the filesystem, but if you're running
> on cron and not checking the logs, then a failure on Monday may lead to
> a restart on Tuesday.

I mostly meant in the sense of what the end results are internally in
duplicity; for example, if resumption after modification caused
duplicity to emit a broken/corrupt backup because the already-written
tar data had fallen out of sync with the would-be position in the tar
stream (again, sorry, I haven't looked at exactly how restart works).

>  Assuming that completes, you're mostly protected,
> but part of your backup will be from Monday and part from Tuesday.

That part is fine and expected; you can't really demand anything else
if you do this on live file systems...

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@xxxxxxxxxxxx>'
Key retrieval: Send an E-Mail to getpgpkey@xxxxxxxxx
E-Mail: peter.schuller@xxxxxxxxxxxx Web: http://www.scode.org
