← Back to team overview

duplicity-team team mailing list archive

Re: [Question #248111]: Duplicity Adds the same file again after full backup

 

Question #248111 on Duplicity changed:
https://answers.launchpad.net/duplicity/+question/248111

    Status: Open => Answered

edso proposed the following answer:
On 07.05.2014 08:46, Rodrigo Alvarez wrote:
> Question #248111 on Duplicity changed:
> https://answers.launchpad.net/duplicity/+question/248111
> 
>     Status: Answered => Open
> 
> Rodrigo Alvarez is still having a problem:
> Hi Edso,
> 
> Thanks for your reply.  Indeed the behavior you describe is what I
> expect, so I took your advice and ran with -v9 in an attempt to
> reproduce the anomaly with greater visibility.   Overall, it seems that
> although on the 1st pass duplicity states that it added a set of files
> (~53000), during the second pass it compares them against "None", as if
> it had not really added them.   Details of what I did and what I
> observed follow.
> 
> I clear the backend of all duplicity volumes and signature files
> 
> 1st Pass
> I start a new backup of my ~500GB photo library.  Duplicity recognizes there are no previous backups and starts adding all files. Duplicity runs for about ~14 hrs and finishes with the following statistics: 
> --------------[ Backup Statistics ]--------------
> StartTime 1399264740.37 (Sun May  4 21:39:00 2014)
> EndTime 1399315243.63 (Mon May  5 11:40:43 2014)
> ElapsedTime 50503.26 (14 hours 1 minute 43.26 seconds)
> SourceFiles 146166
> SourceFileSize 534825699131 (498 GB)
> NewFiles 146166
> NewFileSize 534825699131 (498 GB)
> DeletedFiles 0
> ChangedFiles 0
> ChangedFileSize 0 (0 bytes)
> ChangedDeltaSize 0 (0 bytes)
> DeltaEntries 146166
> RawDeltaSize 534812567355 (498 GB)
> TotalDestinationSizeChange 517724175942 (482 GB)
> Errors 0
> -------------------------------------------------
> 
> 2nd Pass
> I rerun duplicity, now expecting no incremental data to be added because the source files have not changed, yet in the end duplicity returns the following statistics indicating a new ~183 GB. I see this as a set of incremental volumes on the backend. 
> 
> --------------[ Backup Statistics ]--------------
> StartTime 1399362663.27 (Tue May  6 00:51:03 2014)
> EndTime 1399378166.41 (Tue May  6 05:09:26 2014)
> ElapsedTime 15503.14 (4 hours 18 minutes 23.14 seconds)
> SourceFiles 146166
> SourceFileSize 534825699131 (498 GB)
> NewFiles 53631
> NewFileSize 196565375491 (183 GB)
> DeletedFiles 0
> ChangedFiles 4
> ChangedFileSize 20463616 (19.5 MB)
> ChangedDeltaSize 0 (0 bytes)
> DeltaEntries 53635
> RawDeltaSize 196559616928 (183 GB)
> TotalDestinationSizeChange 189125153141 (176 GB)
> Errors 0
> 
> 3rd pass
> I rerun duplicity one 3rd time, and this time I see the behavior I expected on the second pass, which is not to add any incremental. 
> --------------[ Backup Statistics ]--------------
> StartTime 1399381492.80 (Tue May  6 06:04:52 2014)
> EndTime 1399381602.38 (Tue May  6 06:06:42 2014)
> ElapsedTime 109.58 (1 minute 49.58 seconds)
> SourceFiles 146166
> SourceFileSize 534825699131 (498 GB)
> NewFiles 0
> NewFileSize 0 (0 bytes)
> DeletedFiles 0
> ChangedFiles 4
> ChangedFileSize 20463616 (19.5 MB)
> ChangedDeltaSize 0 (0 bytes)
> DeltaEntries 4
> RawDeltaSize 72022 (70.3 KB)
> TotalDestinationSizeChange 4978 (4.86 KB)
> Errors 0
> 
> Probe 2nd pass logs to see what was added
> Now I analyze the log files (written with -v9) to see which files were added on  the 2nd pass. 
> $ pass1log=duplicity_home_to_primary_bakup_2014_05_04-213859.log
> $ pass2log=duplicity_home_to_primary_bakup_2014_05_06-005101.log
> $ pass3log=duplicity_home_to_primary_bakup_2014_05_06-060450.log
> $ cat  ${pass2log} | grep "^A " |  head
> A Masters/Rolls/2010/10/_MG_0246.CR2
> A Masters/Rolls/2010/10/_MG_0247.CR2
> A Masters/Rolls/2010/10/_MG_0248.CR2
> A Masters/Rolls/2010/10/_MG_0249.CR2
> … and another some 53631 more files…
> 
> Probe 1st pass logs to see if re-added files where there
> I probe the 1st pass log, focusing on the first file that was added on the second pass (even though it was there on the 1st pass).  On the first pass I see that this file was Selected, Compared to None, Got Delta,  Added and that the volume was written. 
> $ fileOfInterest="Masters/Rolls/2010/10/_MG_0246.CR2"
> $ grep -A 400 "${fileOfInterest}" ${pass1log}
> Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
> Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and None
> Getting delta of (('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2 reg) and None
> A Masters/Rolls/2010/10/_MG_0246.CR2
> …. A few lines lower I see the volume was sent to be written.
> AsyncScheduler: scheduling task for asynchronous execution
> AsyncScheduler: active workers = 1
> Processed volume 1254
> Writing /media/primary_backup/raid_bkup/Photos/duplicity-full.20140505T043900Z.vol1254.difftar.gpg
> Registering (mktemp) temporary file /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-wHBzYl-1256
> … and a few lines later I see the write succeeded. 
> Deleting /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-cK6Ccl-1255
> Forgetting temporary file /home/Backups/.duplicity_tmp/duplicity-ATyk1O-tempdir/mktemp-cK6Ccl-1255
> AsyncScheduler: task execution done (success: True)
> AsyncScheduler: active workers = 0
> 
> Probe 2nd pass logs to see why files where re-added
> I probe the 2nd pass log the same way and I see that the same file, was Selected, Compared to NONE <- This doesn't make sense! Added, etc.    It seems that for some reason, duplicity thinks this file was NOT added on the 1st pass. 
> $ grep -A 5 "${fileOfInterest}" ${pass2log}
> Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
> Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and None
> Getting delta of (('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2 reg) and None
> A Masters/Rolls/2010/10/_MG_0246.CR2
> 
> Probe 3rd pass log to see why files were not re-added (as expected). 
> Finally, I probe the 3rd pass log for the same file and I see that the file was Selected and Compared to its previous version. 
> grep -A 5 "${fileOfInterest}" ${pass3log}
> Selecting /home/Photos/Masters/Rolls/2010/10/_MG_0246.CR2
> Comparing ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2') and ('Masters', 'Rolls', '2010', '10', '_MG_0246.CR2')
> 
> It appears that for some reason duplicity did not add this file (or the
> other +53000) thousand files to its record on the first pass (although
> it did add data for this.)    Can you think of any reason why this would
> be the case?   I'm running duplicity as root, could this be permission
> issues?
> 
> As you will note, it is somewhat unsatisfying that the 2nd backup
> consumes an unnecessary ~183GB of space.   I've uploaded the full logs
> for the three runs to
> http://sira.duckdns.org/temp//duplicity_logs_photos_bakup_2014_05.tar.gz
> in case you want to probe them deeper.
> 
> Thank you for your consideration!
> 

aren't you polite :)

ok.. here're my thoughts. first check your backend backup file sizes.
the second go should not be sized 183GB but around 19.5MB. the second go
status states

NewFileSize 196565375491 (183 GB)
...
ChangedFiles 4
ChangedFileSize 20463616 (19.5 MB)

which essentially means that there were 4 changed files that together
have the size of 183GB of that were 19.5MB changed and only this change
is going to be backed up (google "rsync rolling checksum").

on the other hand the second log also has lots of entries like

Selecting path/to/file.name
Comparing ('path', 'to', 'file.name') and None
Getting delta of (('path', 'to', 'file.name') /path/to/file.name reg) and None
A path/to/file.name

adding files via A is totally normal and simply means the file was not deleted.
comparing against none however means it couldn't find the file in the backup.

at this point:

update to latest release 0.6.23 or wait for soonish 0.6.24 and see if
the error persists. no point trying to figure out a bug that might be
solved since two years ago.

..ede/duply

-- 
You received this question notification because you are a member of
duplicity-team, which is an answer contact for Duplicity.