[Merge] lp:~carlalex/duplicity/duplicity into lp:duplicity

 

carlalex has proposed merging lp:~carlalex/duplicity/duplicity into lp:duplicity.

Commit message:
Boto3 backend for AWS.

Requested reviews:
  duplicity-team (duplicity-team)

For more details, see:
https://code.launchpad.net/~carlalex/duplicity/duplicity/+merge/376206

Boto3 backend for AWS.
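
A minimal example invocation (the bucket "my-bucket" and prefix "some_dir" are
placeholders; the bucket must already exist, since this backend does not create
buckets; credentials may come from the environment as shown, or from
~/.aws/credentials). Note that the bucket name is given in the path portion of
the URL, so something like the following should work:

    export AWS_ACCESS_KEY_ID=<your key id>
    export AWS_SECRET_ACCESS_KEY=<your secret key>
    duplicity --s3-use-boto3 /home/me s3:///my-bucket/some_dir

Restores from the GLACIER and DEEP_ARCHIVE storage classes still have to be
triggered out of band before running duplicity, as described in the man page
changes below.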

-- 
Your team duplicity-team is requested to review the proposed merge of lp:~carlalex/duplicity/duplicity into lp:duplicity.
=== modified file '.bzrignore'
--- .bzrignore	2019-11-24 17:00:02 +0000
+++ .bzrignore	2019-11-30 21:45:03 +0000
@@ -25,4 +25,8 @@
 testing/gnupg/.gpg-v21-migrated
 testing/gnupg/S.*
 testing/gnupg/private-keys-v1.d
+<<<<<<< TREE
 duplicity/backends/rclonebackend.py
+=======
+duplicity-venv
+>>>>>>> MERGE-SOURCE

=== modified file 'bin/duplicity.1'
--- bin/duplicity.1	2019-05-05 12:16:14 +0000
+++ bin/duplicity.1	2019-11-30 21:45:03 +0000
@@ -706,7 +706,7 @@
 Sets the update rate at which duplicity will output the upload progress
 messages (requires
 .BI --progress
-option). Default is to prompt the status each 3 seconds.
+option). Default is to print the status each 3 seconds.
 
 .TP
 .BI "--rename " "<original path> <new path>"
@@ -730,6 +730,36 @@
 duplicity --rsync-options="--partial-dir=.rsync-partial" /home/me rsync://uid@xxxxxxxxxx/some_dir
 
 .TP
+.BI "--s3-use-boto3"
+When backing up to Amazon S3, use the new boto3 based backend. The boto3
+backend is a rewrite of the older Amazon S3 backend, which was based on the
+now deprecated and unsupported boto library.  This new backend
+fixes known limitations in the older backend, which have crept in as
+Amazon S3 has evolved while the deprecated boto library has not kept up.
+
+The boto3 backend should behave largely the same as the older S3 backend,
+but it handles some of the S3 options differently.  See the documentation
+for the individual S3 options below for the differences that apply to
+each.
+
+The boto3 backend does not support bucket creation.
+This is a deliberate choice which simplifies the code and sidesteps
+problems related to region selection.  It is also generally not
+good practice to give your backup role bucket creation rights.
+In most cases the role used for backups should be
+limited to specific buckets.
+
+The boto3 backend only supports the newer subdomain (virtual host) style
+bucket addressing.  Amazon is deprecating the older path style access,
+so migration is recommended.  Use the older s3 backend for compatibility
+with backups stored in buckets using older naming conventions.
+
+The boto3 backend does not currently support initiating restores
+from the Glacier storage classes.  When restoring a backup from
+Glacier or Glacier Deep Archive, the backup files must first be
+restored out of band.
+
+.TP
 .BI "--s3-european-buckets"
 When using the Amazon S3 backend, create buckets in Europe instead of
 the default (requires
@@ -738,6 +768,9 @@
 .B EUROPEAN S3 BUCKETS
 section.
 
+This option does not apply when using the newer boto3 backend, which
+does not create buckets (see above).
+
 .TP
 .BI "--s3-unencrypted-connection"
 Don't use SSL for connections to S3.
@@ -753,6 +786,8 @@
 increment files.  Unless that is disabled, an observer will not be able to see
 the file names or contents.
 
+This option is not available when using the newer boto3 backend.
+
 .TP
 .BI "--s3-use-new-style"
 When operating on Amazon S3 buckets, use new-style subdomain bucket
@@ -760,6 +795,9 @@
 is not backwards compatible if your bucket name contains upper-case
 characters or other characters that are not valid in a hostname.
 
+This option has no effect when using the newer boto3 backend, which
+will always use new style subdomain bucket naming.
+
 .TP
 .BI "--s3-use-rrs"
 Store volumes using Reduced Redundancy Storage when uploading to Amazon S3.
@@ -796,6 +834,20 @@
 all other data is stored in S3 Glacier.
 
 .TP
+.BI "--s3-use-deep-archive"
+Store volumes using S3 Glacier Deep Archive when uploading to Amazon S3. This storage class
+has a lower cost of storage but a higher per-request cost along with delays
+of up to 12 hours from the time of retrieval request. Storage costs are
+calculated against a 180-day storage minimum. According to Amazon this storage is
+ideal for data archiving and long-term backup, offering 99.999999999% durability.
+To restore a backup you will have to manually migrate all data stored on AWS
+Glacier Deep Archive back to Standard S3 and wait for AWS to complete the migration.
+.B Notice:
+Duplicity will store the manifest.gpg files from full and incremental backups on
+AWS S3 standard storage to allow quick retrieval for later incremental backups;
+all other data is stored in S3 Glacier Deep Archive.
+
+.TP
 .BI "--s3-use-multiprocessing"
 Allow multipart volumne uploads to S3 through multiprocessing. This option
 requires Python 2.6 and can be used to make uploads to S3 more efficient.
@@ -803,6 +855,9 @@
 uploaded in parallel. Useful if you want to saturate your bandwidth
 or if large files are failing during upload.
 
+This has no effect when using the newer boto3 backend.  Boto3 always
+attempts to use multiprocessing when it believes it will be more efficient.
+
 .TP
 .BI "--s3-use-server-side-encryption"
 Allow use of server side encryption in S3
@@ -814,6 +869,8 @@
 to maximize the use of your bandwidth. For example, a chunk size of 10MB
 with a volsize of 30MB will result in 3 chunks per volume upload.
 
+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--s3-multipart-max-procs"
 Specify the maximum number of processes to spawn when performing a multipart
@@ -822,6 +879,8 @@
 required to ensure you don't overload your system while maximizing the use of
 your bandwidth.
 
+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--s3-multipart-max-timeout"
 You can control the maximum time (in seconds) a multipart upload can spend on
@@ -829,6 +888,8 @@
 hanging on multipart uploads or if you'd like to control the time variance
 when uploading to S3 to ensure you kill connections to slow S3 endpoints.
 
+This has no effect when using the newer boto3 backend.
+
 .TP
 .BI "--azure-blob-tier"
 Standard storage tier used for backup files (Hot|Cool|Archive).

=== added file 'duplicity/backends/_boto3backend.py'
--- duplicity/backends/_boto3backend.py	1970-01-01 00:00:00 +0000
+++ duplicity/backends/_boto3backend.py	2019-11-30 21:45:03 +0000
@@ -0,0 +1,200 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright 2002 Ben Escoto <ben@xxxxxxxxxxx>
+# Copyright 2007 Kenneth Loafman <kenneth@xxxxxxxxxxx>
+# Copyright 2019 Carl A. Adams <carlalex@xxxxxxxxxxxxx>
+#
+# This file is part of duplicity.
+#
+# Duplicity is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; either version 2 of the License, or (at your
+# option) any later version.
+#
+# Duplicity is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with duplicity; if not, write to the Free Software Foundation,
+# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+import duplicity.backend
+from duplicity import globals
+from duplicity import log
+from duplicity.errors import FatalBackendException, BackendException
+from duplicity import util
+from duplicity import progress
+
+
+# Note: current gaps with the old boto backend include:
+#       - no support for a hostname/port in S3 URL yet.
+#       - Glacier restore to S3 not implemented. Should this
+#         be done here, or is that out of scope? It can take days,
+#         so waiting does not seem ideal. "thaw" isn't currently
+#         a generic concept that the core asks of back-ends. Perhaps
+#         that is worth exploring.  The older boto backend appeared
+#         to attempt this restore in the code, but the man page
+#         indicated that restores should be done out of band.
+#         If/when implemented, we should add the following new features:
+#              - when restoring from glacier or deep archive, specify TTL.
+#              - allow user to specify how fast to restore (impacts cost).
+
+class BotoBackend(duplicity.backend.Backend):
+    u"""
+    Backend for Amazon's Simple Storage Service (aka Amazon S3), through
+    the use of the boto3 module. (See
+    https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
+    for information on boto3.)
+
+    Pursuant to Amazon's announced deprecation of path style S3 access,
+    this backend only supports virtual host style bucket URIs.
+    See the man page for full details.
+
+    To make use of this backend, you must provide AWS credentials.
+    This may be done in several ways: through the environment variables
+    AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, by the
+    ~/.aws/credentials file, by the ~/.aws/config file,
+    or by using the boto2 style ~/.boto or /etc/boto.cfg files.
+    """
+
+    def __init__(self, parsed_url):
+        duplicity.backend.Backend.__init__(self, parsed_url)
+
+        # This folds the null prefix and all null parts, which means that:
+        #  //MyBucket/ and //MyBucket are equivalent.
+        #  //MyBucket//My///My/Prefix/ and //MyBucket/My/Prefix are equivalent.
+        url_path_parts = [x for x in parsed_url.path.split(u'/') if x != u'']
+        if url_path_parts:
+            self.bucket_name = url_path_parts.pop(0)
+        else:
+            raise BackendException(u'S3 requires a bucket name.')
+
+        if url_path_parts:
+            self.key_prefix = u'%s/' % u'/'.join(url_path_parts)
+        else:
+            self.key_prefix = u''
+
+        self.parsed_url = parsed_url
+        self.straight_url = duplicity.backend.strip_auth_from_url(parsed_url)
+        self.s3 = None
+        self.bucket = None
+        self.tracker = UploadProgressTracker()
+        self.reset_connection()
+
+    def reset_connection(self):
+        import boto3
+        import botocore
+        from botocore.exceptions import ClientError
+
+        self.bucket = None
+        self.s3 = boto3.resource('s3')
+
+        try:
+            self.s3.meta.client.head_bucket(Bucket=self.bucket_name)
+        except botocore.exceptions.ClientError as bce:
+            error_code = bce.response['Error']['Code']
+            if error_code == '404':
+                raise FatalBackendException(u'S3 bucket "%s" does not exist' % self.bucket_name,
+                                            code=log.ErrorCode.backend_not_found)
+            else:
+                raise
+
+        self.bucket = self.s3.Bucket(self.bucket_name)  # only set if bucket is thought to exist.
+
+    def _put(self, local_source_path, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+
+        if globals.s3_use_rrs:
+            storage_class = u'REDUCED_REDUNDANCY'
+        elif globals.s3_use_ia:
+            storage_class = u'STANDARD_IA'
+        elif globals.s3_use_onezone_ia:
+            storage_class = u'ONEZONE_IA'
+        elif globals.s3_use_glacier and u"manifest" not in remote_filename:
+            storage_class = u'GLACIER'
+        elif globals.s3_use_deep_archive and u"manifest" not in remote_filename:
+            storage_class = u'DEEP_ARCHIVE'
+        else:
+            storage_class = u'STANDARD'
+        extra_args = {u'StorageClass': storage_class}
+
+        if globals.s3_use_sse:
+            extra_args[u'ServerSideEncryption'] = u'AES256'
+        elif globals.s3_use_sse_kms:
+            if globals.s3_kms_key_id is None:
+                raise FatalBackendException(u"S3 USE SSE KMS was requested, but key id not provided "
+                                            u"require (--s3-kms-key-id)",
+                                            code=log.ErrorCode.s3_kms_no_id)
+            extra_args[u'ServerSideEncryption'] = u'aws:kms'
+            extra_args[u'SSEKMSKeyId'] = globals.s3_kms_key_id
+            if globals.s3_kms_grant:
+                extra_args[u'GrantFullControl'] = globals.s3_kms_grant
+
+        # Should the tracker be scoped to the put or the backend?
+        # The put seems right to me, but the results look a little more correct
+        # scoped to the backend.  This brings up questions about knowing when
+        # it's proper for it to be reset.
+        # tracker = UploadProgressTracker() # Scope the tracker to the put()
+        tracker = self.tracker
+
+        log.Info(u"Uploading %s/%s to %s Storage" % (self.straight_url, remote_filename, storage_class))
+        self.s3.Object(self.bucket.name, key).upload_file(local_source_path.uc_name,
+                                                          Callback=tracker.progress_cb,
+                                                          ExtraArgs=extra_args)
+
+    def _get(self, remote_filename, local_path):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).download_file(local_path.uc_name)
+
+    def _list(self):
+        filename_list = []
+        for obj in self.bucket.objects.filter(Prefix=self.key_prefix):
+            try:
+                filename = obj.key.replace(self.key_prefix, u'', 1)
+                filename_list.append(filename)
+                log.Debug(u"Listed %s/%s" % (self.straight_url, filename))
+            except AttributeError:
+                pass
+        return filename_list
+
+    def _delete(self, remote_filename):
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        self.s3.Object(self.bucket.name, key).delete()
+
+    def _query(self, remote_filename):
+        import botocore
+        from botocore.exceptions import ClientError
+
+        remote_filename = util.fsdecode(remote_filename)
+        key = self.key_prefix + remote_filename
+        content_length = -1
+        try:
+            s3_obj = self.s3.Object(self.bucket.name, key)
+            s3_obj.load()
+            content_length = s3_obj.content_length
+        except botocore.exceptions.ClientError as bce:
+            if bce.response['Error']['Code'] == '404':
+                pass
+            else:
+                raise
+        return {u'size': content_length}
+
+
+class UploadProgressTracker(object):
+    def __init__(self):
+        self.total_bytes = 0
+
+    def progress_cb(self, fresh_byte_count):
+        self.total_bytes += fresh_byte_count
+        progress.report_transfer(self.total_bytes, 0)  # second arg appears to be unused
+        # It would seem to me that summing progress should be the caller's job,
+        # and backends should just toss bytes written numbers over the fence.
+        # But, the progress bar doesn't work in a reasonable way when we do
+        # that. (This would also eliminate the need for this class to hold
+        # the scoped rolling total.)
+        # progress.report_transfer(fresh_byte_count, 0)

=== modified file 'duplicity/backends/botobackend.py'
--- duplicity/backends/botobackend.py	2018-07-23 14:55:39 +0000
+++ duplicity/backends/botobackend.py	2019-11-30 21:45:03 +0000
@@ -23,10 +23,14 @@
 import duplicity.backend
 from duplicity import globals
 
-if globals.s3_use_multiprocessing:
-    from ._boto_multi import BotoBackend
+if globals.s3_use_boto3:
+    from ._boto3backend import BotoBackend
 else:
-    from ._boto_single import BotoBackend
+    if globals.s3_use_multiprocessing:
+        from ._boto_multi import BotoBackend
+    else:
+        from ._boto_single import BotoBackend
+        # TODO: if globals.s3_use_boto3
 
 duplicity.backend.register_backend(u"gs", BotoBackend)
 duplicity.backend.register_backend(u"s3", BotoBackend)

=== modified file 'duplicity/commandline.py'
--- duplicity/commandline.py	2019-11-24 17:00:02 +0000
+++ duplicity/commandline.py	2019-11-30 21:45:03 +0000
@@ -506,7 +506,10 @@
     # support european for now).
     parser.add_option(u"--s3-european-buckets", action=u"store_true")
 
-    # Whether to use S3 Reduced Redudancy Storage
+    # Use the boto3 implementation for s3
+    parser.add_option(u"--s3-use-boto3", action=u"store_true")
+
+    # Whether to use S3 Reduced Redundancy Storage
     parser.add_option(u"--s3-use-rrs", action=u"store_true")
 
     # Whether to use S3 Infrequent Access Storage
@@ -515,6 +518,9 @@
     # Whether to use S3 Glacier Storage
     parser.add_option(u"--s3-use-glacier", action=u"store_true")
 
+    # Whether to use S3 Glacier Deep Archive Storage
+    parser.add_option(u"--s3-use-deep-archive", action=u"store_true")
+
     # Whether to use S3 One Zone Infrequent Access Storage
     parser.add_option(u"--s3-use-onezone-ia", action=u"store_true")
 

=== modified file 'duplicity/globals.py'
--- duplicity/globals.py	2019-05-17 16:41:49 +0000
+++ duplicity/globals.py	2019-11-30 21:45:03 +0000
@@ -200,12 +200,20 @@
 # Whether to use S3 Glacier Storage
 s3_use_glacier = False
 
+# Whether to use S3 Glacier Deep Archive Storage
+s3_use_deep_archive = False
+
 # Whether to use S3 One Zone Infrequent Access Storage
 s3_use_onezone_ia = False
 
 # True if we should use boto multiprocessing version
 s3_use_multiprocessing = False
 
+# True if we should use the new Boto3 backend. This backend does not
+# support some legacy features, so the old backend is retained for
+# compatibility with old backups.
+s3_use_boto3 = False
+
 # Chunk size used for S3 multipart uploads.The number of parallel uploads to
 # S3 be given by chunk size / volume size. Use this to maximize the use of
 # your bandwidth. Defaults to 25MB

=== modified file 'requirements.txt'
--- requirements.txt	2019-11-16 17:15:49 +0000
+++ requirements.txt	2019-11-30 21:45:03 +0000
@@ -26,6 +26,7 @@
 # azure
 # b2sdk
 # boto
+# boto3
 # dropbox==6.9.0
 # gdata
 # jottalib

