[Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity
carlalex has proposed merging lp:~carlalex/duplicity/duplicity into lp:duplicity.
Commit message:
Boto3 backend for AWS.
Requested reviews:
duplicity-team (duplicity-team)
For more details, see:
https://code.launchpad.net/~carlalex/duplicity/duplicity/+merge/376206
Boto3 backend for AWS.
--
Your team duplicity-team is requested to review the proposed merge of lp:~carlalex/duplicity/duplicity into lp:duplicity.
=== modified file '.bzrignore'
--- .bzrignore 2019-11-24 17:00:02 +0000
+++ .bzrignore 2019-11-30 21:45:03 +0000
@@ -25,4 +25,5 @@
testing/gnupg/.gpg-v21-migrated
testing/gnupg/S.*
testing/gnupg/private-keys-v1.d
duplicity/backends/rclonebackend.py
+duplicity-venv
=== modified file 'bin/duplicity.1'
--- bin/duplicity.1 2019-05-05 12:16:14 +0000
+++ bin/duplicity.1 2019-11-30 21:45:03 +0000
@@ -706,7 +706,7 @@
Sets the update rate at which duplicity will output the upload progress
messages (requires
.BI --progress
-option). Default is to prompt the status each 3 seconds.
+option). Default is to print the status every 3 seconds.
.TP
.BI "--rename " "<original path> <new path>"
@@ -730,6 +730,36 @@
duplicity --rsync-options="--partial-dir=.rsync-partial" /home/me rsync://uid@xxxxxxxxxx/some_dir
.TP
+.BI "--s3-use-boto3"
+When backing up to Amazon S3, use the new boto3 based backend. The boto3
+backend is a rewrite of the older Amazon S3 backend, which was built on the
+now deprecated and unsupported boto library. The new backend fixes known
+limitations that crept into the older backend as Amazon S3 evolved while
+the deprecated boto library did not keep up.
+
+The boto3 backend should behave largely the same as the older S3 backend,
+but the handling of some of the S3 options differs. See the documentation
+of the individual S3 options for the differences related to each.
+
+The boto3 backend does not support bucket creation.
+This is a deliberate choice that simplifies the code and sidesteps
+problems related to region selection. Additionally, it is generally
+not good practice to give your backup role bucket creation rights;
+in most cases the role used for backups should be
+limited to specific buckets.
+
+The boto3 backend only supports the newer subdomain (virtual host style)
+bucket addressing. Amazon is deprecating the older path style access,
+so migration is recommended.
+Use the older S3 backend for compatibility with backups stored in
+buckets using older naming conventions.
+
+The boto3 backend does not currently support initiating restores
+from the Glacier storage class. When restoring a backup from
+Glacier or Glacier Deep Archive, the backup files must first be
+restored out of band.
+
+.TP
.BI "--s3-european-buckets"
When using the Amazon S3 backend, create buckets in Europe instead of
the default (requires
@@ -738,6 +768,9 @@
.B EUROPEAN S3 BUCKETS
section.
+This option does not apply when using the newer boto3 backend, which
+does not create buckets (see above).
+
.TP
.BI "--s3-unencrypted-connection"
Don't use SSL for connections to S3.
@@ -753,6 +786,8 @@
increment files. Unless that is disabled, an observer will not be able to see
the file names or contents.
+This option is not available when using the newer boto3 backend.
+
.TP
.BI "--s3-use-new-style"
When operating on Amazon S3 buckets, use new-style subdomain bucket
@@ -760,6 +795,9 @@
is not backwards compatible if your bucket name contains upper-case
characters or other characters that are not valid in a hostname.
+This option has no effect when using the newer boto3 backend, which
+will always use new style subdomain bucket naming.
+
.TP
.BI "--s3-use-rrs"
Store volumes using Reduced Redundancy Storage when uploading to Amazon S3.
@@ -796,6 +834,20 @@
all other data is stored in S3 Glacier.
.TP
+.BI "--s3-use-deep-archive"
+Store volumes using the Glacier Deep Archive storage class when uploading to Amazon S3. This storage class
+has a lower cost of storage but a higher per-request cost along with delays
+of up to 12 hours from the time of the retrieval request. Storage cost is
+calculated against a 180-day storage minimum. According to Amazon this storage is
+ideal for data archiving and long-term backup, offering 99.999999999% durability.
+To restore a backup you will have to manually migrate all data stored on AWS
+Glacier Deep Archive back to Standard S3 and wait for AWS to complete the migration.
+.B Notice:
+Duplicity will store the manifest.gpg files from full and incremental backups on
+AWS S3 standard storage to allow quick retrieval for later incremental backups;
+all other data is stored in S3 Glacier Deep Archive.
+
+.TP
.BI "--s3-use-multiprocessing"
Allow multipart volumne uploads to S3 through multiprocessing. This option
requires Python 2.6 and can be used to make uploads to S3 more efficient.
@@ -803,6 +855,9 @@
uploaded in parallel. Useful if you want to saturate your bandwidth
or if large files are failing during upload.
+This has no effect when using the newer boto3 backend, which handles
+multipart uploads and concurrency automatically when it judges them more efficient.
+
.TP
.BI "--s3-use-server-side-encryption"
Allow use of server side encryption in S3
@@ -814,6 +869,8 @@
to maximize the use of your bandwidth. For example, a chunk size of 10MB
with a volsize of 30MB will result in 3 chunks per volume upload.
+This has no effect when using the newer boto3 backend.
+
.TP
.BI "--s3-multipart-max-procs"
Specify the maximum number of processes to spawn when performing a multipart
@@ -822,6 +879,8 @@
required to ensure you don't overload your system while maximizing the use of
your bandwidth.
+This has no effect when using the newer boto3 backend.
+
.TP
.BI "--s3-multipart-max-timeout"
You can control the maximum time (in seconds) a multipart upload can spend on
@@ -829,6 +888,8 @@
hanging on multipart uploads or if you'd like to control the time variance
when uploading to S3 to ensure you kill connections to slow S3 endpoints.
+This has no effect when using the newer boto3 backend.
+
.TP
.BI "--azure-blob-tier"
Standard storage tier used for backup files (Hot|Cool|Archive).
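As a note on the out-of-band Glacier restore mentioned above: the boto3 backend does not thaw archived volumes itself, so they must be restored before duplicity can read them. A rough, hypothetical sketch of such a restore is shown below; it is not part of the proposed backend, and the bucket name, key prefix, retention days, and retrieval tier are placeholders.

# Hypothetical out-of-band restore of Glacier/Deep Archive volumes ahead of
# a duplicity restore. Bucket, prefix, Days, and Tier are placeholders.
import boto3

s3 = boto3.client('s3')
bucket = 'my-backup-bucket'   # placeholder
prefix = 'duplicity/'         # placeholder key prefix

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        if obj.get('StorageClass') in ('GLACIER', 'DEEP_ARCHIVE'):
            # Stage a temporary copy back into regular S3; retrieval can take
            # hours (12 or more for Deep Archive with the Standard tier).
            s3.restore_object(
                Bucket=bucket,
                Key=obj['Key'],
                RestoreRequest={
                    'Days': 7,
                    'GlacierJobParameters': {'Tier': 'Standard'},
                },
            )
            print('Requested restore of %s' % obj['Key'])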
=== added file 'duplicity/backends/_boto3backend.py'
--- duplicity/backends/_boto3backend.py 1970-01-01 00:00:00 +0000
+++ duplicity/backends/_boto3backend.py 2019-11-30 21:45:03 +0000
@@ -0,0 +1,200 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright 2002 Ben Escoto <ben@xxxxxxxxxxx>
+# Copyright 2007 Kenneth Loafman <kenneth@xxxxxxxxxxx>
+# Copyright 2019 Carl A. Adams <carlalex@xxxxxxxxxxxxx>
+#
+# This file is part of duplicity.
+#
+# Duplicity is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2 of the License, or (at your
+# option) any later version.
+#
+# Duplicity is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with duplicity; if not, write to the Free Software Foundation,
+# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import duplicity.backend
+from duplicity import globals
+from duplicity import log
+from duplicity.errors import FatalBackendException, BackendException
+from duplicity import util
+from duplicity import progress
+
+
+# Note: current gaps with the old boto backend include:
+# - no support for a hostname/port in S3 URL yet.
+# - Glacier restore to S3 not implemented. Should this
+# be done here, or is that out of scope? It can take days,
+# so waiting does not seem ideal. "thaw" isn't currently
+# a generic concept that the core asks of back-ends. Perhaps
+# that is worth exploring. The older boto backend appeared
+# to attempt this restore in the code, but the man page
+# indicated that restores should be done out of band.
+# If/when implemented, we should add the following new features:
+# - when restoring from glacier or deep archive, specify TTL.
+# - allow user to specify how fast to restore (impacts cost).
+
+class BotoBackend(duplicity.backend.Backend):
+    u"""
+    Backend for Amazon's Simple Storage Service (aka Amazon S3), through
+    the use of the boto3 module. (See
+    https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
+    for information on boto3.)
+
+    Pursuant to Amazon's announced deprecation of path style S3 access,
+    this backend only supports virtual host style bucket URIs.
+    See the man page for full details.
+
+    To make use of this backend, you must provide AWS credentials.
+    This may be done in several ways: through the environment variables
+    AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, by the
+    ~/.aws/credentials file, by the ~/.aws/config file,
+    or by using the boto2 style ~/.boto or /etc/boto.cfg files.
+    """
+
+    def __init__(self, parsed_url):
+        duplicity.backend.Backend.__init__(self, parsed_url)
+
+        # This folds the null prefix and all null parts, which means that:
+        # //MyBucket/ and //MyBucket are equivalent.
+        # //MyBucket//My///My/Prefix/ and //MyBucket/My/Prefix are equivalent.
+        url_path_parts = [x for x in parsed_url.path.split(u'/') if x != u'']
+        if url_path_parts:
+            self.bucket_name = url_path_parts.pop(0)
+        else:
+            raise BackendException(u'S3 requires a bucket name.')
+
+        if url_path_parts:
+            self.key_prefix = u'%s/' % u'/'.join(url_path_parts)
+        else:
+            self.key_prefix = u''
+
+        self.parsed_url = parsed_url
+        self.straight_url = duplicity.backend.strip_auth_from_url(parsed_url)
+        self.s3 = None
+        self.bucket = None
+        self.tracker = UploadProgressTracker()
+        self.reset_connection()
+
+    def reset_connection(self):
+        import boto3
+        import botocore
+        from botocore.exceptions import ClientError
+
+        self.bucket = None
+        self.s3 = boto3.resource('s3')
+
+        try:
+            self.s3.meta.client.head_bucket(Bucket=self.bucket_name)
+        except botocore.exceptions.ClientError as bce:
+            error_code = bce.response['Error']['Code']
+            if error_code == '404':
+                raise FatalBackendException(u'S3 bucket "%s" does not exist' % self.bucket_name,
+                                            code=log.ErrorCode.backend_not_found)
+            else:
+                raise
+
+        self.bucket = self.s3.Bucket(self.bucket_name)  # only set if bucket is thought to exist.
+
+    def _put(self, local_source_path, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+
+        if globals.s3_use_rrs:
+            storage_class = u'REDUCED_REDUNDANCY'
+        elif globals.s3_use_ia:
+            storage_class = u'STANDARD_IA'
+        elif globals.s3_use_onezone_ia:
+            storage_class = u'ONEZONE_IA'
+        elif globals.s3_use_glacier and u"manifest" not in remote_filename:
+            storage_class = u'GLACIER'
+        elif globals.s3_use_deep_archive and u"manifest" not in remote_filename:
+            storage_class = u'DEEP_ARCHIVE'
+        else:
+            storage_class = u'STANDARD'
+        extra_args = {u'StorageClass': storage_class}
+
+        if globals.s3_use_sse:
+            extra_args[u'ServerSideEncryption'] = u'AES256'
+        elif globals.s3_use_sse_kms:
+            if globals.s3_kms_key_id is None:
+                raise FatalBackendException(u"S3 SSE-KMS was requested, but no key id was provided "
+                                            u"(--s3-kms-key-id is required)",
+                                            code=log.ErrorCode.s3_kms_no_id)
+            extra_args[u'ServerSideEncryption'] = u'aws:kms'
+            extra_args[u'SSEKMSKeyId'] = globals.s3_kms_key_id
+            if globals.s3_kms_grant:
+                extra_args[u'GrantFullControl'] = globals.s3_kms_grant
+
+        # Should the tracker be scoped to the put or the backend?
+        # The put seems right to me, but the results look a little more correct
+        # scoped to the backend. This brings up questions about knowing when
+        # it's proper for it to be reset.
+        # tracker = UploadProgressTracker()  # Scope the tracker to the put()
+        tracker = self.tracker
+
+        log.Info(u"Uploading %s/%s to %s Storage" % (self.straight_url, remote_filename, storage_class))
+        self.s3.Object(self.bucket.name, key).upload_file(local_source_path.uc_name,
+                                                          Callback=tracker.progress_cb,
+                                                          ExtraArgs=extra_args)
+
+    def _get(self, remote_filename, local_path):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).download_file(local_path.uc_name)
+
+    def _list(self):
+        filename_list = []
+        for obj in self.bucket.objects.filter(Prefix=self.key_prefix):
+            try:
+                filename = obj.key.replace(self.key_prefix, u'', 1)
+                filename_list.append(filename)
+                log.Debug(u"Listed %s/%s" % (self.straight_url, filename))
+            except AttributeError:
+                pass
+        return filename_list
+
+    def _delete(self, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).delete()
+
+    def _query(self, remote_filename):
+        import botocore
+        from botocore.exceptions import ClientError
+
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        content_length = -1
+        try:
+            s3_obj = self.s3.Object(self.bucket.name, key)
+            s3_obj.load()
+            content_length = s3_obj.content_length
+        except botocore.exceptions.ClientError as bce:
+            if bce.response['Error']['Code'] == '404':
+                pass
+            else:
+                raise
+        return {u'size': content_length}
+
+
+class UploadProgressTracker(object):
+    def __init__(self):
+        self.total_bytes = 0
+
+    def progress_cb(self, fresh_byte_count):
+        self.total_bytes += fresh_byte_count
+        progress.report_transfer(self.total_bytes, 0)  # second arg appears to be unused
+        # It would seem to me that summing progress should be the callers job,
+        # and backends should just toss bytes written numbers over the fence.
+        # But, the progress bar doesn't work in a reasonable way when we do
+        # that. (This would also eliminate the need for this class to hold
+        # the scoped rolling total.)
+        # progress.report_transfer(fresh_byte_count, 0)
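The docstring above lists the ways boto3 discovers AWS credentials (environment variables, ~/.aws/credentials, ~/.aws/config, or the legacy boto config files). A small, hypothetical pre-flight check along the following lines can confirm that credentials resolve and that the target bucket is reachable before pointing duplicity at it; the bucket name is a placeholder.

# Hypothetical pre-flight check before running duplicity with --s3-use-boto3.
# It mirrors the head_bucket probe used in reset_connection() above; the
# bucket name is a placeholder.
import boto3
import botocore.exceptions

session = boto3.Session()
if session.get_credentials() is None:
    raise SystemExit('boto3 found no AWS credentials (env vars, ~/.aws/*, or boto config)')

s3 = session.resource('s3')
bucket_name = 'my-backup-bucket'  # placeholder
try:
    s3.meta.client.head_bucket(Bucket=bucket_name)
    print('Credentials OK, bucket "%s" is reachable' % bucket_name)
except botocore.exceptions.ClientError as err:
    print('head_bucket failed: %s' % err)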
=== modified file 'duplicity/backends/botobackend.py'
--- duplicity/backends/botobackend.py 2018-07-23 14:55:39 +0000
+++ duplicity/backends/botobackend.py 2019-11-30 21:45:03 +0000
@@ -23,10 +23,14 @@
import duplicity.backend
from duplicity import globals
-if globals.s3_use_multiprocessing:
-    from ._boto_multi import BotoBackend
+if globals.s3_use_boto3:
+    from ._boto3backend import BotoBackend
else:
-    from ._boto_single import BotoBackend
+    if globals.s3_use_multiprocessing:
+        from ._boto_multi import BotoBackend
+    else:
+        from ._boto_single import BotoBackend
+    # TODO: if globals.s3_use_boto3
duplicity.backend.register_backend(u"gs", BotoBackend)
duplicity.backend.register_backend(u"s3", BotoBackend)
=== modified file 'duplicity/commandline.py'
--- duplicity/commandline.py 2019-11-24 17:00:02 +0000
+++ duplicity/commandline.py 2019-11-30 21:45:03 +0000
@@ -506,7 +506,10 @@
# support european for now).
parser.add_option(u"--s3-european-buckets", action=u"store_true")
- # Whether to use S3 Reduced Redudancy Storage
+ # Use the boto3 implementation for s3
+ parser.add_option(u"--s3-use-boto3", action=u"store_true")
+
+ # Whether to use S3 Reduced Redundancy Storage
parser.add_option(u"--s3-use-rrs", action=u"store_true")
# Whether to use S3 Infrequent Access Storage
@@ -515,6 +518,9 @@
# Whether to use S3 Glacier Storage
parser.add_option(u"--s3-use-glacier", action=u"store_true")
+ # Whether to use S3 Glacier Deep Archive Storage
+ parser.add_option(u"--s3-use-deep-archive", action=u"store_true")
+
# Whether to use S3 One Zone Infrequent Access Storage
parser.add_option(u"--s3-use-onezone-ia", action=u"store_true")
=== modified file 'duplicity/globals.py'
--- duplicity/globals.py 2019-05-17 16:41:49 +0000
+++ duplicity/globals.py 2019-11-30 21:45:03 +0000
@@ -200,12 +200,20 @@
# Whether to use S3 Glacier Storage
s3_use_glacier = False
+# Whether to use S3 Glacier Deep Archive Storage
+s3_use_deep_archive = False
+
# Whether to use S3 One Zone Infrequent Access Storage
s3_use_onezone_ia = False
# True if we should use boto multiprocessing version
s3_use_multiprocessing = False
+# True if we should use the new boto3 backend. This backend does not
+# support some legacy features, so the old backend is retained for
+# compatibility with old backups.
+s3_use_boto3 = False
+
# Chunk size used for S3 multipart uploads.The number of parallel uploads to
# S3 be given by chunk size / volume size. Use this to maximize the use of
# your bandwidth. Defaults to 25MB
=== modified file 'requirements.txt'
--- requirements.txt 2019-11-16 17:15:49 +0000
+++ requirements.txt 2019-11-30 21:45:03 +0000
@@ -26,6 +26,7 @@
# azure
# b2sdk
# boto
+# boto3
# dropbox==6.9.0
# gdata
# jottalib
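Since boto3 is only listed here as an optional (commented out) requirement, it is worth confirming the module is importable before enabling --s3-use-boto3. A minimal, hypothetical check:

# Minimal check that the optional boto3 dependency is installed.
try:
    import boto3
    print(u"boto3 %s is available" % boto3.__version__)
except ImportError:
    print(u"boto3 is not installed; install it before using --s3-use-boto3")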
Follow ups
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: Carl A. Adams, 2019-12-09)
- [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: noreply, 2019-12-05)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: edso, 2019-12-04)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: Kenneth Loafman, 2019-12-04)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: Carl A. Adams, 2019-12-04)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: edso, 2019-12-03)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: Kenneth Loafman, 2019-12-02)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: carlalex, 2019-12-01)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: carlalex, 2019-12-01)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: carlalex, 2019-12-01)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: edso, 2019-12-01)
- Re: [Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity (From: edso, 2019-12-01)