
duplicity-team team mailing list archive

[Bug 387102] Re: Asynchronous upload not working properly

 

** Branch linked: lp:duplicity

-- 
Asynchronous upload not working properly
https://bugs.launchpad.net/bugs/387102
You received this bug notification because you are a member of
duplicity-team, which is subscribed to duplicity.

Status in duplicity - Bandwidth Efficient Encrypted Backup: Fix Committed

Bug description:
Duplicity version 0.5.09
Python version 2.6.2
OS: Ubuntu Linux 9.04 (Jaunty)
Filesystem being backed up: ext3
Repeatable: Yes, on my machine (see bottom of bug report for notes)
Log output: Didn't capture any when I did the backup that caused this bug report; will run a new backup and attach -v9 logs later on.


Bug description:

The --asynchronous-upload option isn't working the way I think it should on my system, where the limiting factor is bandwidth. (I.e., it takes a lot longer to upload a 50 megabyte file through my ADSL connection than it does to prepare the next file to be uploaded).


Intended behavior (I assume):

* Volume 1 is prepared.
* Volume 1 finishes preparing, starts uploading.
* While volume 1 is uploaded, volume 2 is prepared.
* Volume 2 finishes preparing; volume 1 is still uploading, so Duplicity waits for the upload to complete.
* Volume 1 finishes uploading; volume 2 immediately starts uploading.
* While volume 2 is uploaded, volume 3 is prepared.

And so on, repeating the last three steps until all volumes are complete. This means the upload bandwidth is in use almost
constantly: by the time one volume has finished uploading, the next one is already prepared and waiting in /tmp.
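
For what it's worth, here is a minimal sketch in Python of that pipeline (my own illustration, not duplicity's actual code): one upload in flight at a time, with the next volume prepared in parallel. The prepare_volume and upload_volume functions and their sleep() timings are just placeholders for the real work.

    import threading
    import time

    def prepare_volume(n):
        time.sleep(1)                    # stand-in: preparing a volume is fast
        print("volume %d prepared" % n)
        return n

    def upload_volume(n):
        time.sleep(3)                    # stand-in: uploading is the slow part
        print("volume %d uploaded" % n)

    def backup(num_volumes):
        upload_thread = None
        for n in range(1, num_volumes + 1):
            vol = prepare_volume(n)              # prepare volume n
            if upload_thread is not None:
                upload_thread.join()             # wait only for the *previous* upload
            upload_thread = threading.Thread(target=upload_volume, args=(vol,))
            upload_thread.start()                # start uploading n, then go prepare n+1
        if upload_thread is not None:
            upload_thread.join()                 # wait for the final upload

    if __name__ == "__main__":
        backup(4)

With those timings the uploads run back to back: while volume n is uploading, volume n+1 is already being prepared.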


Actual behavior:

I just launched a duplicity backup using the following command (with a real username and server name, of course):

duplicity /home/username/projects scp://username@my.remote.server/backups/projects/ --asynchronous-upload --verbosity 4 --volsize 50

I then opened a WinSCP view to watch the backup upload (since I don't know which verbosity level would give me an "uploaded ### out of ### bytes" display), while also running "ls -l" on /tmp/duplicity-xyzzyx-tempdir/ so I could watch the volume files being created. The pattern I observed was:

* Volume 1 is prepared.
* Volume 1 finishes preparing, starts uploading.
* While volume 1 is uploaded, volume 2 is prepared.
* Volume 2 finishes preparing; volume 1 is still uploading, so Duplicity waits for the upload to complete.
* Volume 1 finishes uploading; volume 2 immediately starts uploading.
* While volume 2 is uploaded, NOTHING is prepared.
* Volume 2 finishes uploading; now volume 3 is prepared. (Now NOTHING is being uploaded while volume 3 is prepared).
* Volume 3 finishes preparing, starts uploading.
* While volume 3 is uploaded, volume 4 is prepared.
* Volume 4 finishes preparing; volume 3 is still uploading, so Duplicity waits for the upload to complete.
* Volume 3 finishes uploading; volume 4 immediately starts uploading.
* While volume 4 is uploaded, NOTHING is prepared.
* Volume 4 finishes uploading; now volume 5 is prepared. (Now NOTHING is being uploaded while volume 5 is prepared).

And so on, until all volumes are complete.
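
For illustration only, the sketch below reproduces exactly this alternating pattern. It is a guess at the general shape of the problem, not duplicity's actual asyncscheduler code, and the SingleSlotScheduler class and the stand-in functions are made up: a scheduler whose single slot is still busy when the next volume is handed over, and which then runs that upload synchronously (waiting for it to finish) instead of merely waiting for the slot to free up, would behave exactly as described above.

    import threading
    import time

    def prepare_volume(n):
        time.sleep(1)                    # stand-in: preparing a volume is fast
        print("volume %d prepared" % n)
        return n

    def upload_volume(n):
        time.sleep(3)                    # stand-in: uploading is the slow part
        print("volume %d uploaded" % n)

    class SingleSlotScheduler:
        def __init__(self):
            self.thread = None

        def schedule(self, fn, *args):
            if self.thread is None or not self.thread.is_alive():
                # Slot free: hand the upload off and return immediately.
                self.thread = threading.Thread(target=fn, args=args)
                self.thread.start()
            else:
                # Slot busy: wait for the in-flight upload, then run the new
                # upload synchronously -- so nothing is prepared while it runs.
                self.thread.join()
                fn(*args)
                self.thread = None

    if __name__ == "__main__":
        sched = SingleSlotScheduler()
        for n in range(1, 6):
            sched.schedule(upload_volume, prepare_volume(n))
        if sched.thread is not None:
            sched.thread.join()

With those timings this prints the very ordering described above: volume 2 is prepared while volume 1 uploads, but volume 3 is only prepared after volume 2 has finished uploading, and so on, in batches of 2.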


Notes:

This is not a major bug, but it does mean the upload bandwidth isn't being used as efficiently as it should be, which is exactly what --asynchronous-upload was meant to address. As it currently stands, all --asynchronous-upload does is make uploads happen in "batches" of 2 volumes rather than 1; the upload connection still sits idle while the first volume of the next batch is prepared.
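
To put rough, purely illustrative numbers on it: suppose preparing a volume takes 1 minute and uploading it takes 5. With the intended behavior, a 10-volume backup takes about 1 + 10*5 = 51 minutes, because every upload after the first starts the moment the previous one finishes. With the behavior above, each pair of volumes costs about 1 + 2*5 = 11 minutes (the first prepare of each pair overlaps with nothing), so the same backup takes about 55 minutes, and the wasted time grows with the number of volumes.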

I'm sure this is repeatable, though the problem won't show up if your upload bandwidth is faster than the preparation of the next volume. If you run this test on an internal gigabit network, where preparing the next volume is the bottleneck rather than the bandwidth, you probably won't notice any difference. But run a test against an external server on a not-very-fast connection, or throttle your bandwidth so that upload speed becomes the bottleneck, and you should observe the same pattern I did: volumes uploaded in "batches" of 2, with a gap in between while the next batch is prepared.
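
For testing, one way to throttle the upload (an untested suggestion; it assumes the trickle userspace bandwidth shaper is installed and that its LD_PRELOAD shaping also applies to the scp process duplicity spawns) would be something like:

trickle -s -u 100 duplicity /home/username/projects scp://username@my.remote.server/backups/projects/ --asynchronous-upload --verbosity 4 --volsize 50

which should cap the upload at roughly 100 KB/s.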