[Merge] ~cjwatson/launchpad:archive-file-history into launchpad:master

Colin Watson has proposed merging ~cjwatson/launchpad:archive-file-history into launchpad:master.

Commit message:
Turn ArchiveFile into a history table

This adds date_created and date_superseded columns, and adjusts the
publisher to match.

LP: #1765933

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #1765933 in Launchpad itself: "Allow building livefses against a view of the archive at a fixed point in time"
  https://bugs.launchpad.net/launchpad/+bug/1765933

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/373971

The main complexity here is in the changed publisher logic, especially for reprieving (that is, when file contents that were scheduled for deletion become live again, which is particularly common for empty files).  We previously handled this by simply clearing scheduled_deletion_date on the old ArchiveFile row, but that strategy no longer works when we're trying to maintain history: instead, we need to create new rows in such cases.  As a result of these logic changes, we no longer need the only_condemned=True option in ArchiveFileSet.getByArchive.
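
To make the reprieving change concrete, unscheduleDeletion is now an insert rather than an update: it creates fresh ArchiveFile rows for the reprieved contents and leaves the condemned rows in place as history.  Excerpting the model change from the diff below:

    create(
        (ArchiveFile.archive, ArchiveFile.container, ArchiveFile.path,
         ArchiveFile.library_file, ArchiveFile.date_created,
         ArchiveFile.date_superseded, ArchiveFile.scheduled_deletion_date),
        [(archive_file.archive, archive_file.container, archive_file.path,
          archive_file.library_file, _now(), None, None)
         for archive_file in archive_files])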

I think the publisher tests are somewhat clearer now that they explicitly check creation dates, which makes the chain of events more obvious.

ArchiveFile.date_superseded is arguably redundant with ArchiveFile.scheduled_deletion_date, but I think keeping both is a good idea for clarity, and in case we ever change the stay of execution in the future.
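
For reference, scheduleDeletion sets both columns together, so for condemned rows scheduled_deletion_date is always date_superseded plus the stay of execution (again excerpting the model change below):

    rows.set(
        date_superseded=_now(),
        scheduled_deletion_date=_now() + stay_of_execution)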

We'll need to backfill the new columns.  I'll probably make a separate branch with a garbo job for this: my plan is to set date_created to some arbitrary value (probably just the epoch, so that it's clear that it's arbitrary), and to set date_superseded to scheduled_deletion_date minus the stay of execution for rows that have a scheduled_deletion_date.
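
As a rough sketch of what that backfill could look like (this is not part of this branch, the garbo batching/plumbing is omitted, and the 'epoch' literal and WHERE clauses are my assumptions about how the job might be written):

    store = IMasterStore(ArchiveFile)
    # Arbitrary creation date for pre-existing rows; the epoch makes it
    # obvious that the value is backfilled rather than real.
    store.execute("""
        UPDATE ArchiveFile SET date_created = 'epoch'
        WHERE date_created IS NULL
        """)
    # Recover date_superseded by subtracting the stay of execution from
    # the scheduled deletion date.
    store.execute("""
        UPDATE ArchiveFile
        SET date_superseded = scheduled_deletion_date - interval '%d days'
        WHERE date_superseded IS NULL
          AND scheduled_deletion_date IS NOT NULL
        """ % BY_HASH_STAY_OF_EXECUTION)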

This is essentially the same as https://code.launchpad.net/~cjwatson/launchpad/archive-file-history/+merge/343752, converted to git and rebased on master.  https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/373970 has the corresponding DB patch.
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:archive-file-history into launchpad:master.
diff --git a/lib/lp/archivepublisher/publishing.py b/lib/lp/archivepublisher/publishing.py
index 523c51c..0153e80 100644
--- a/lib/lp/archivepublisher/publishing.py
+++ b/lib/lp/archivepublisher/publishing.py
@@ -1069,8 +1069,7 @@ class Publisher(object):
             assert path.startswith("dists/")
             return path[len("dists/"):]
 
-        # Gather information on entries in the current Release file, and
-        # make sure nothing there is condemned.
+        # Gather information on entries in the current Release file.
         current_files = {}
         for current_entry in (
                 release_data["SHA256"] + extra_data.get("SHA256", [])):
@@ -1079,33 +1078,54 @@ class Publisher(object):
             real_path = os.path.join(suite_dir, real_name)
             current_files[path] = (
                 int(current_entry["size"]), current_entry["sha256"], real_path)
+
+        # Gather information on entries currently in the database.  Ensure
+        # that we know about all the relevant by-hash directory trees before
+        # doing any removals so that we can prune them properly later, and
+        # work out which condemned files should be reprieved due to the
+        # paths in question having their previous content again.
+        reprieved_files = defaultdict(list)
         uncondemned_files = set()
         for db_file in archive_file_set.getByArchive(
-                self.archive, container=container, only_condemned=True,
-                eager_load=True):
-            stripped_path = strip_dists(db_file.path)
-            if stripped_path in current_files:
-                current_sha256 = current_files[stripped_path][1]
-                if db_file.library_file.content.sha256 == current_sha256:
-                    uncondemned_files.add(db_file)
-        if uncondemned_files:
-            for container, path, sha256 in archive_file_set.unscheduleDeletion(
-                    uncondemned_files):
+                self.archive, container=container, eager_load=True):
+            by_hashes.registerChild(os.path.dirname(strip_dists(db_file.path)))
+            file_key = (db_file.path, db_file.library_file.content.sha256)
+            if db_file.scheduled_deletion_date is None:
+                uncondemned_files.add(file_key)
+            else:
+                stripped_path = strip_dists(db_file.path)
+                if stripped_path in current_files:
+                    current_sha256 = current_files[stripped_path][1]
+                    if db_file.library_file.content.sha256 == current_sha256:
+                        reprieved_files[file_key].append(db_file)
+
+        # We may already have uncondemned entries with the same path and
+        # content as condemned entries that we were about to reprieve; if
+        # so, there's no need to reprieve them.
+        for file_key in uncondemned_files:
+            reprieved_files.pop(file_key, None)
+
+        # Make sure nothing in the current Release file is condemned.
+        if reprieved_files:
+            reprieved_files_flat = set(
+                chain.from_iterable(reprieved_files.values()))
+            archive_file_set.unscheduleDeletion(reprieved_files_flat)
+            for db_file in reprieved_files_flat:
                 self.log.debug(
                     "by-hash: Unscheduled %s for %s in %s for deletion" % (
-                        sha256, path, container))
+                        db_file.library_file.content.sha256, db_file.path,
+                        db_file.container))
 
         # Remove any condemned files from the database whose stay of
         # execution has elapsed.  We ensure that we know about all the
         # relevant by-hash directory trees before doing any removals so that
         # we can prune them properly later.
-        for db_file in archive_file_set.getByArchive(
-                self.archive, container=container):
-            by_hashes.registerChild(os.path.dirname(strip_dists(db_file.path)))
         for container, path, sha256 in archive_file_set.reap(
                 self.archive, container=container):
-            self.log.debug(
-                "by-hash: Deleted %s for %s in %s" % (sha256, path, container))
+            if (path, sha256) not in uncondemned_files:
+                self.log.debug(
+                    "by-hash: Deleted %s for %s in %s" %
+                    (sha256, path, container))
 
         # Ensure that all files recorded in the database are in by-hash.
         db_files = archive_file_set.getByArchive(
@@ -1126,12 +1146,13 @@ class Publisher(object):
                 if db_file.library_file.content.sha256 != current_sha256:
                     condemned_files.add(db_file)
         if condemned_files:
-            for container, path, sha256 in archive_file_set.scheduleDeletion(
-                    condemned_files,
-                    timedelta(days=BY_HASH_STAY_OF_EXECUTION)):
+            archive_file_set.scheduleDeletion(
+                condemned_files, timedelta(days=BY_HASH_STAY_OF_EXECUTION))
+            for db_file in condemned_files:
                 self.log.debug(
                     "by-hash: Scheduled %s for %s in %s for deletion" % (
-                        sha256, path, container))
+                        db_file.library_file.content.sha256, db_file.path,
+                        db_file.container))
 
         # Ensure that all the current index files are in by-hash and have
         # corresponding database entries.
diff --git a/lib/lp/archivepublisher/tests/test_publisher.py b/lib/lp/archivepublisher/tests/test_publisher.py
index ec60e1f..8d6be2b 100644
--- a/lib/lp/archivepublisher/tests/test_publisher.py
+++ b/lib/lp/archivepublisher/tests/test_publisher.py
@@ -21,8 +21,6 @@ from fnmatch import fnmatch
 from functools import partial
 import gzip
 import hashlib
-from itertools import product
-from operator import attrgetter
 import os
 import shutil
 import stat
@@ -52,7 +50,6 @@ from testtools.matchers import (
     LessThan,
     Matcher,
     MatchesDict,
-    MatchesListwise,
     MatchesSetwise,
     MatchesStructure,
     Not,
@@ -2650,12 +2647,12 @@ class TestUpdateByHash(TestPublisherBase):
             publisher.D_writeReleaseFiles(False)
 
     @classmethod
-    def _makeScheduledDeletionDateMatcher(cls, condemned_at):
-        if condemned_at is None:
+    def _makeScheduledDeletionDateMatcher(cls, superseded_at):
+        if superseded_at is None:
             return Is(None)
         else:
             return Equals(
-                condemned_at + timedelta(days=BY_HASH_STAY_OF_EXECUTION))
+                superseded_at + timedelta(days=BY_HASH_STAY_OF_EXECUTION))
 
     def assertHasSuiteFiles(self, patterns, *properties):
         def is_interesting(path):
@@ -2669,11 +2666,13 @@ class TestUpdateByHash(TestPublisherBase):
                 self.ubuntutest.main_archive)
             if is_interesting(archive_file.path)]
         matchers = []
-        for path, condemned_at in properties:
+        for path, created_at, superseded_at in properties:
             matchers.append(MatchesStructure(
                 path=Equals('dists/breezy-autotest/%s' % path),
+                date_created=Equals(created_at),
+                date_superseded=Equals(superseded_at),
                 scheduled_deletion_date=self._makeScheduledDeletionDateMatcher(
-                    condemned_at)))
+                    superseded_at)))
         self.assertThat(files, MatchesSetwise(*matchers))
 
     def test_disabled(self):
@@ -2823,7 +2822,8 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Contents-i386', None), ('Release', None))
+            ('Contents-i386', self.times[0], None),
+            ('Release', self.times[0], None))
         releases = [get_release_contents()]
         self.assertThat(
             suite_path('by-hash'),
@@ -2837,8 +2837,10 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Contents-i386', None), ('Contents-hppa', None),
-            ('Release', self.times[1]), ('Release', None))
+            ('Contents-i386', self.times[0], None),
+            ('Contents-hppa', self.times[1], None),
+            ('Release', self.times[0], self.times[1]),
+            ('Release', self.times[1], None))
         releases.append(get_release_contents())
         self.assertThat(
             suite_path('by-hash'),
@@ -2851,9 +2853,11 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Contents-i386', self.times[2]), ('Contents-hppa', None),
-            ('Release', self.times[1]), ('Release', self.times[2]),
-            ('Release', None))
+            ('Contents-i386', self.times[0], self.times[2]),
+            ('Contents-hppa', self.times[1], None),
+            ('Release', self.times[0], self.times[1]),
+            ('Release', self.times[1], self.times[2]),
+            ('Release', self.times[2], None))
         releases.append(get_release_contents())
         self.assertThat(
             suite_path('by-hash'),
@@ -2865,9 +2869,12 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Contents-i386', self.times[2]), ('Contents-hppa', None),
-            ('Release', self.times[1]), ('Release', self.times[2]),
-            ('Release', self.times[3]), ('Release', None))
+            ('Contents-i386', self.times[0], self.times[2]),
+            ('Contents-hppa', self.times[1], None),
+            ('Release', self.times[0], self.times[1]),
+            ('Release', self.times[1], self.times[2]),
+            ('Release', self.times[2], self.times[3]),
+            ('Release', self.times[3], None))
         releases.append(get_release_contents())
         self.assertThat(
             suite_path('by-hash'),
@@ -2886,9 +2893,10 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Contents-hppa', self.times[4]),
-            ('Release', self.times[3]), ('Release', self.times[4]),
-            ('Release', None))
+            ('Contents-hppa', self.times[1], self.times[4]),
+            ('Release', self.times[2], self.times[3]),
+            ('Release', self.times[3], self.times[4]),
+            ('Release', self.times[4], None))
         releases.append(get_release_contents())
         self.assertThat(
             suite_path('by-hash'),
@@ -2905,7 +2913,8 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ('Contents-*', 'Release'),
-            ('Release', self.times[5]), ('Release', None))
+            ('Release', self.times[4], self.times[5]),
+            ('Release', self.times[5], None))
         releases.append(get_release_contents())
         self.assertThat(suite_path('by-hash'), ByHashHasContents(releases[4:]))
 
@@ -2932,9 +2941,13 @@ class TestUpdateByHash(TestPublisherBase):
         for name in ('Release', 'Sources'):
             with open(suite_path('main', 'source', name), 'rb') as f:
                 main_contents.add(f.read())
+        self.assertHasSuiteFiles(
+            ('main/source/Sources',),
+            ('main/source/Sources', self.times[0], None))
 
         # Add a source package so that Sources is non-empty.
         pub_source = self.getPubSource(filecontent='Source: foo\n')
+        self.advanceTime(delta=timedelta(hours=1))
         self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
         transaction.commit()
         with open(suite_path('main', 'source', 'Sources'), 'rb') as f:
@@ -2943,28 +2956,42 @@ class TestUpdateByHash(TestPublisherBase):
         self.assertThat(
             suite_path('main', 'source', 'by-hash'),
             ByHashHasContents(main_contents))
-
-        # Make the empty Sources file ready to prune.
-        self.advanceTime(
-            delta=timedelta(days=BY_HASH_STAY_OF_EXECUTION, hours=1))
+        self.assertHasSuiteFiles(
+            ('main/source/Sources',),
+            ('main/source/Sources', self.times[0], self.times[1]),
+            ('main/source/Sources', self.times[1], None))
 
         # Delete the source package so that Sources is empty again.  The
-        # empty file is reprieved and the non-empty one is condemned.
+        # empty file is reprieved (by creating a new ArchiveFile referring
+        # to it) and the non-empty one is condemned.
         pub_source.requestDeletion(self.ubuntutest.owner)
+        self.advanceTime(delta=timedelta(hours=1))
         self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
         transaction.commit()
         self.assertThat(
             suite_path('main', 'source', 'by-hash'),
             ByHashHasContents(main_contents))
-        archive_files = getUtility(IArchiveFileSet).getByArchive(
-            self.ubuntutest.main_archive,
-            path='dists/breezy-autotest/main/source/Sources')
+        self.assertHasSuiteFiles(
+            ('main/source/Sources',),
+            ('main/source/Sources', self.times[0], self.times[1]),
+            ('main/source/Sources', self.times[1], self.times[2]),
+            ('main/source/Sources', self.times[2], None))
+
+        # Make the first empty Sources file ready to prune.  This doesn't
+        # change the set of files on disk, because there's still a newer
+        # reference to the empty file.
+        self.advanceTime(
+            absolute=self.times[1] + timedelta(
+                days=BY_HASH_STAY_OF_EXECUTION, minutes=30))
+        self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
+        transaction.commit()
         self.assertThat(
-            sorted(archive_files, key=attrgetter('id')),
-            MatchesListwise([
-                MatchesStructure(scheduled_deletion_date=Is(None)),
-                MatchesStructure(scheduled_deletion_date=Not(Is(None))),
-                ]))
+            suite_path('main', 'source', 'by-hash'),
+            ByHashHasContents(main_contents))
+        self.assertHasSuiteFiles(
+            ('main/source/Sources',),
+            ('main/source/Sources', self.times[1], self.times[2]),
+            ('main/source/Sources', self.times[2], None))
 
     def setUpPruneableSuite(self):
         self.setUpMockTime()
@@ -2993,14 +3020,18 @@ class TestUpdateByHash(TestPublisherBase):
         # We have two condemned sets of index files and one uncondemned set.
         # main/source/Release contains a small enough amount of information
         # that it doesn't change.
-        expected_suite_files = (
-            list(product(
-                ('main/source/Sources.gz', 'main/source/Sources.bz2',
-                 'Release'),
-                (self.times[1], self.times[2], None))) +
-            [('main/source/Release', None)])
         self.assertHasSuiteFiles(
-            ('main/source/*', 'Release'), *expected_suite_files)
+            ('main/source/*', 'Release'),
+            ('main/source/Sources.gz', self.times[0], self.times[1]),
+            ('main/source/Sources.gz', self.times[1], self.times[2]),
+            ('main/source/Sources.gz', self.times[2], None),
+            ('main/source/Sources.bz2', self.times[0], self.times[1]),
+            ('main/source/Sources.bz2', self.times[1], self.times[2]),
+            ('main/source/Sources.bz2', self.times[2], None),
+            ('main/source/Release', self.times[0], None),
+            ('Release', self.times[0], self.times[1]),
+            ('Release', self.times[1], self.times[2]),
+            ('Release', self.times[2], None))
         self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
         self.assertThat(
             suite_path('main', 'source', 'by-hash'),
@@ -3033,14 +3064,15 @@ class TestUpdateByHash(TestPublisherBase):
         self.assertEqual(set(), publisher.dirty_pockets)
         # The condemned index files are removed, and no new Release file is
         # generated.
-        expected_suite_files = (
-            list(product(
-                ('main/source/Sources.gz', 'main/source/Sources.bz2'),
-                (self.times[2], None))) +
-            [('main/source/Release', None),
-             ('Release', self.times[2]), ('Release', None)])
         self.assertHasSuiteFiles(
-            ('main/source/*', 'Release'), *expected_suite_files)
+            ('main/source/*', 'Release'),
+            ('main/source/Sources.gz', self.times[1], self.times[2]),
+            ('main/source/Sources.gz', self.times[2], None),
+            ('main/source/Sources.bz2', self.times[1], self.times[2]),
+            ('main/source/Sources.bz2', self.times[2], None),
+            ('main/source/Release', self.times[0], None),
+            ('Release', self.times[1], self.times[2]),
+            ('Release', self.times[2], None))
         self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
         self.assertThat(
             suite_path('main', 'source', 'by-hash'),
@@ -3067,14 +3099,15 @@ class TestUpdateByHash(TestPublisherBase):
         self.assertEqual(release_mtime, os.stat(release_path).st_mtime)
         # The condemned index files are removed, and no new Release file is
         # generated.
-        expected_suite_files = (
-            list(product(
-                ('main/source/Sources.gz', 'main/source/Sources.bz2'),
-                (self.times[2], None))) +
-            [('main/source/Release', None),
-             ('Release', self.times[2]), ('Release', None)])
         self.assertHasSuiteFiles(
-            ('main/source/*', 'Release'), *expected_suite_files)
+            ('main/source/*', 'Release'),
+            ('main/source/Sources.gz', self.times[1], self.times[2]),
+            ('main/source/Sources.gz', self.times[2], None),
+            ('main/source/Sources.bz2', self.times[1], self.times[2]),
+            ('main/source/Sources.bz2', self.times[2], None),
+            ('main/source/Release', self.times[0], None),
+            ('Release', self.times[1], self.times[2]),
+            ('Release', self.times[2], None))
         self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
         self.assertThat(
             suite_path('main', 'source', 'by-hash'),
diff --git a/lib/lp/soyuz/interfaces/archivefile.py b/lib/lp/soyuz/interfaces/archivefile.py
index 005a599..e81419f 100644
--- a/lib/lp/soyuz/interfaces/archivefile.py
+++ b/lib/lp/soyuz/interfaces/archivefile.py
@@ -1,4 +1,4 @@
-# Copyright 2016 Canonical Ltd.  This software is licensed under the
+# Copyright 2016-2018 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 """Interface for a file in an archive."""
@@ -49,6 +49,16 @@ class IArchiveFile(Interface):
         title=_("The index file in the librarian."),
         schema=ILibraryFileAlias, required=True, readonly=True)
 
+    date_created = Datetime(
+        title=_("The date when this file was created."),
+        # XXX cjwatson 2018-04-17: Should be required=True, but we need to
+        # backfill existing rows first.
+        required=False, readonly=False)
+
+    date_superseded = Datetime(
+        title=_("The date when this file was scheduled for future deletion."),
+        required=False, readonly=False)
+
     scheduled_deletion_date = Datetime(
         title=_("The date when this file should stop being published."),
         required=False, readonly=False)
@@ -79,15 +89,12 @@ class IArchiveFileSet(Interface):
         :param content_type: The MIME type of the file.
         """
 
-    def getByArchive(archive, container=None, path=None, only_condemned=False,
-                     eager_load=False):
+    def getByArchive(archive, container=None, path=None, eager_load=False):
         """Get files in an archive.
 
         :param archive: Return files in this `IArchive`.
         :param container: Return only files with this container.
         :param path: Return only files with this path.
-        :param only_condemned: If True, return only files with a
-            scheduled_deletion_date set.
         :param eager_load: If True, preload related `LibraryFileAlias` and
             `LibraryFileContent` rows.
         :return: An iterable of matched files.
@@ -99,8 +106,6 @@ class IArchiveFileSet(Interface):
         :param archive_files: The `IArchiveFile`s to schedule for deletion.
         :param stay_of_execution: A `timedelta`; schedule files for deletion
             this amount of time in the future.
-        :return: An iterable of (container, path, sha256) for files that
-            were scheduled for deletion.
         """
 
     def unscheduleDeletion(archive_files):
@@ -110,8 +115,6 @@ class IArchiveFileSet(Interface):
         identical to a version that was previously condemned.
 
         :param archive_files: The `IArchiveFile`s to unschedule for deletion.
-        :return: An iterable of (container, path, sha256) for files that
-            were unscheduled for deletion.
         """
 
     def getContainersToReap(archive, container_prefix=None):
diff --git a/lib/lp/soyuz/model/archivefile.py b/lib/lp/soyuz/model/archivefile.py
index 50d66ba..530fe66 100644
--- a/lib/lp/soyuz/model/archivefile.py
+++ b/lib/lp/soyuz/model/archivefile.py
@@ -14,7 +14,6 @@ __all__ = [
 import os.path
 
 import pytz
-from storm.databases.postgres import Returning
 from storm.locals import (
     And,
     DateTime,
@@ -26,7 +25,10 @@ from storm.locals import (
 from zope.component import getUtility
 from zope.interface import implementer
 
-from lp.services.database.bulk import load_related
+from lp.services.database.bulk import (
+    create,
+    load_related,
+    )
 from lp.services.database.constants import UTC_NOW
 from lp.services.database.decoratedresultset import DecoratedResultSet
 from lp.services.database.interfaces import (
@@ -34,7 +36,6 @@ from lp.services.database.interfaces import (
     IStore,
     )
 from lp.services.database.sqlbase import convert_storm_clause_to_string
-from lp.services.database.stormexpr import BulkUpdate
 from lp.services.librarian.interfaces import ILibraryFileAliasSet
 from lp.services.librarian.model import (
     LibraryFileAlias,
@@ -46,6 +47,15 @@ from lp.soyuz.interfaces.archivefile import (
     )
 
 
+def _now():
+    """Get the current transaction timestamp.
+
+    Tests can override this with a Storm expression or a `datetime` to
+    simulate time changes.
+    """
+    return UTC_NOW
+
+
 @implementer(IArchiveFile)
 class ArchiveFile(Storm):
     """See `IArchiveFile`."""
@@ -64,6 +74,14 @@ class ArchiveFile(Storm):
     library_file_id = Int(name='library_file', allow_none=False)
     library_file = Reference(library_file_id, 'LibraryFileAlias.id')
 
+    date_created = DateTime(
+        # XXX cjwatson 2018-04-17: Should be allow_none=False, but we need
+        # to backfill existing rows first.
+        name='date_created', tzinfo=pytz.UTC, allow_none=True)
+
+    date_superseded = DateTime(
+        name='date_superseded', tzinfo=pytz.UTC, allow_none=True)
+
     scheduled_deletion_date = DateTime(
         name='scheduled_deletion_date', tzinfo=pytz.UTC, allow_none=True)
 
@@ -74,18 +92,11 @@ class ArchiveFile(Storm):
         self.container = container
         self.path = path
         self.library_file = library_file
+        self.date_created = _now()
+        self.date_superseded = None
         self.scheduled_deletion_date = None
 
 
-def _now():
-    """Get the current transaction timestamp.
-
-    Tests can override this with a Storm expression or a `datetime` to
-    simulate time changes.
-    """
-    return UTC_NOW
-
-
 @implementer(IArchiveFileSet)
 class ArchiveFileSet:
     """See `IArchiveFileSet`."""
@@ -106,8 +117,7 @@ class ArchiveFileSet:
         return cls.new(archive, container, path, library_file)
 
     @staticmethod
-    def getByArchive(archive, container=None, path=None, only_condemned=False,
-                     eager_load=False):
+    def getByArchive(archive, container=None, path=None, eager_load=False):
         """See `IArchiveFileSet`."""
         clauses = [ArchiveFile.archive == archive]
         # XXX cjwatson 2016-03-15: We'll need some more sophisticated way to
@@ -116,8 +126,6 @@ class ArchiveFileSet:
             clauses.append(ArchiveFile.container == container)
         if path is not None:
             clauses.append(ArchiveFile.path == path)
-        if only_condemned:
-            clauses.append(ArchiveFile.scheduled_deletion_date != None)
         archive_files = IStore(ArchiveFile).find(ArchiveFile, *clauses)
 
         def eager_load(rows):
@@ -132,41 +140,23 @@ class ArchiveFileSet:
     @staticmethod
     def scheduleDeletion(archive_files, stay_of_execution):
         """See `IArchiveFileSet`."""
-        clauses = [
-            ArchiveFile.id.is_in(
-                set(archive_file.id for archive_file in archive_files)),
-            ArchiveFile.library_file == LibraryFileAlias.id,
-            LibraryFileAlias.content == LibraryFileContent.id,
-            ]
-        new_date = _now() + stay_of_execution
-        return_columns = [
-            ArchiveFile.container, ArchiveFile.path, LibraryFileContent.sha256]
-        return list(IMasterStore(ArchiveFile).execute(Returning(
-            BulkUpdate(
-                {ArchiveFile.scheduled_deletion_date: new_date},
-                table=ArchiveFile,
-                values=[LibraryFileAlias, LibraryFileContent],
-                where=And(*clauses)),
-            columns=return_columns)))
+        rows = IMasterStore(ArchiveFile).find(
+            ArchiveFile, ArchiveFile.id.is_in(
+                set(archive_file.id for archive_file in archive_files)))
+        rows.set(
+            date_superseded=_now(),
+            scheduled_deletion_date=_now() + stay_of_execution)
 
     @staticmethod
     def unscheduleDeletion(archive_files):
         """See `IArchiveFileSet`."""
-        clauses = [
-            ArchiveFile.id.is_in(
-                set(archive_file.id for archive_file in archive_files)),
-            ArchiveFile.library_file == LibraryFileAlias.id,
-            LibraryFileAlias.content == LibraryFileContent.id,
-            ]
-        return_columns = [
-            ArchiveFile.container, ArchiveFile.path, LibraryFileContent.sha256]
-        return list(IMasterStore(ArchiveFile).execute(Returning(
-            BulkUpdate(
-                {ArchiveFile.scheduled_deletion_date: None},
-                table=ArchiveFile,
-                values=[LibraryFileAlias, LibraryFileContent],
-                where=And(*clauses)),
-            columns=return_columns)))
+        create(
+            (ArchiveFile.archive, ArchiveFile.container, ArchiveFile.path,
+             ArchiveFile.library_file, ArchiveFile.date_created,
+             ArchiveFile.date_superseded, ArchiveFile.scheduled_deletion_date),
+            [(archive_file.archive, archive_file.container, archive_file.path,
+              archive_file.library_file, _now(), None, None)
+             for archive_file in archive_files])
 
     @staticmethod
     def getContainersToReap(archive, container_prefix=None):
diff --git a/lib/lp/soyuz/tests/test_archivefile.py b/lib/lp/soyuz/tests/test_archivefile.py
index 039fec6..d1dea99 100644
--- a/lib/lp/soyuz/tests/test_archivefile.py
+++ b/lib/lp/soyuz/tests/test_archivefile.py
@@ -1,4 +1,4 @@
-# Copyright 2016 Canonical Ltd.  This software is licensed under the
+# Copyright 2016-2018 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 """ArchiveFile tests."""
@@ -11,6 +11,13 @@ from datetime import timedelta
 import os
 
 from storm.store import Store
+from testtools.matchers import (
+    AfterPreprocessing,
+    Equals,
+    Is,
+    MatchesSetwise,
+    MatchesStructure,
+    )
 import transaction
 from zope.component import getUtility
 from zope.security.proxy import removeSecurityProxy
@@ -25,6 +32,14 @@ from lp.testing import TestCaseWithFactory
 from lp.testing.layers import LaunchpadZopelessLayer
 
 
+def read_library_file(library_file):
+    library_file.open()
+    try:
+        return library_file.read()
+    finally:
+        library_file.close()
+
+
 class TestArchiveFile(TestCaseWithFactory):
 
     layer = LaunchpadZopelessLayer
@@ -34,11 +49,15 @@ class TestArchiveFile(TestCaseWithFactory):
         library_file = self.factory.makeLibraryFileAlias()
         archive_file = getUtility(IArchiveFileSet).new(
             archive, "foo", "dists/foo", library_file)
-        self.assertEqual(archive, archive_file.archive)
-        self.assertEqual("foo", archive_file.container)
-        self.assertEqual("dists/foo", archive_file.path)
-        self.assertEqual(library_file, archive_file.library_file)
-        self.assertIsNone(archive_file.scheduled_deletion_date)
+        self.assertThat(archive_file, MatchesStructure(
+            archive=Equals(archive),
+            container=Equals("foo"),
+            path=Equals("dists/foo"),
+            library_file=Equals(library_file),
+            date_created=Equals(
+                get_transaction_timestamp(Store.of(archive_file))),
+            date_superseded=Is(None),
+            scheduled_deletion_date=Is(None)))
 
     def test_newFromFile(self):
         root = self.makeTemporaryDirectory()
@@ -48,24 +67,24 @@ class TestArchiveFile(TestCaseWithFactory):
         with open(os.path.join(root, "dists/foo"), "rb") as f:
             archive_file = getUtility(IArchiveFileSet).newFromFile(
                 archive, "foo", "dists/foo", f, 4, "text/plain")
+        now = get_transaction_timestamp(Store.of(archive_file))
         transaction.commit()
-        self.assertEqual(archive, archive_file.archive)
-        self.assertEqual("foo", archive_file.container)
-        self.assertEqual("dists/foo", archive_file.path)
-        archive_file.library_file.open()
-        try:
-            self.assertEqual("abc\n", archive_file.library_file.read())
-        finally:
-            archive_file.library_file.close()
-        self.assertIsNone(archive_file.scheduled_deletion_date)
+        self.assertThat(archive_file, MatchesStructure(
+            archive=Equals(archive),
+            container=Equals("foo"),
+            path=Equals("dists/foo"),
+            library_file=AfterPreprocessing(
+                read_library_file, Equals("abc\n")),
+            date_created=Equals(now),
+            date_superseded=Is(None),
+            scheduled_deletion_date=Is(None)))
 
     def test_getByArchive(self):
         archives = [self.factory.makeArchive(), self.factory.makeArchive()]
         archive_files = []
-        now = get_transaction_timestamp(Store.of(archives[0]))
         for archive in archives:
             archive_files.append(self.factory.makeArchiveFile(
-                archive=archive, scheduled_deletion_date=now))
+                archive=archive))
             archive_files.append(self.factory.makeArchiveFile(
                 archive=archive, container="foo"))
         archive_file_set = getUtility(IArchiveFileSet)
@@ -83,9 +102,6 @@ class TestArchiveFile(TestCaseWithFactory):
         self.assertContentEqual(
             [], archive_file_set.getByArchive(archives[0], path="other"))
         self.assertContentEqual(
-            [archive_files[0]],
-            archive_file_set.getByArchive(archives[0], only_condemned=True))
-        self.assertContentEqual(
             archive_files[2:], archive_file_set.getByArchive(archives[1]))
         self.assertContentEqual(
             [archive_files[3]],
@@ -98,19 +114,11 @@ class TestArchiveFile(TestCaseWithFactory):
                 archives[1], path=archive_files[3].path))
         self.assertContentEqual(
             [], archive_file_set.getByArchive(archives[1], path="other"))
-        self.assertContentEqual(
-            [archive_files[2]],
-            archive_file_set.getByArchive(archives[1], only_condemned=True))
 
     def test_scheduleDeletion(self):
         archive_files = [self.factory.makeArchiveFile() for _ in range(3)]
-        expected_rows = [
-            (archive_file.container, archive_file.path,
-             archive_file.library_file.content.sha256)
-            for archive_file in archive_files[:2]]
-        rows = getUtility(IArchiveFileSet).scheduleDeletion(
+        getUtility(IArchiveFileSet).scheduleDeletion(
             archive_files[:2], timedelta(days=1))
-        self.assertContentEqual(expected_rows, rows)
         flush_database_caches()
         tomorrow = (
             get_transaction_timestamp(Store.of(archive_files[0])) +
@@ -124,17 +132,32 @@ class TestArchiveFile(TestCaseWithFactory):
         now = get_transaction_timestamp(Store.of(archive_files[0]))
         for archive_file in archive_files:
             removeSecurityProxy(archive_file).scheduled_deletion_date = now
-        expected_rows = [
-            (archive_file.container, archive_file.path,
-             archive_file.library_file.content.sha256)
-            for archive_file in archive_files[:2]]
-        rows = getUtility(IArchiveFileSet).unscheduleDeletion(
-            archive_files[:2])
-        self.assertContentEqual(expected_rows, rows)
+        archive_file_set = getUtility(IArchiveFileSet)
+        archive_file_set.unscheduleDeletion(archive_files[:2])
         flush_database_caches()
-        self.assertIsNone(archive_files[0].scheduled_deletion_date)
-        self.assertIsNone(archive_files[1].scheduled_deletion_date)
-        self.assertEqual(now, archive_files[2].scheduled_deletion_date)
+        self.assertThat(
+            archive_file_set.getByArchive(
+                archive_files[0].archive,
+                container=archive_files[0].container,
+                path=archive_files[0].path),
+            MatchesSetwise(
+                MatchesStructure(scheduled_deletion_date=Equals(now)),
+                MatchesStructure(scheduled_deletion_date=Is(None))))
+        self.assertThat(
+            archive_file_set.getByArchive(
+                archive_files[1].archive,
+                container=archive_files[1].container,
+                path=archive_files[1].path),
+            MatchesSetwise(
+                MatchesStructure(scheduled_deletion_date=Equals(now)),
+                MatchesStructure(scheduled_deletion_date=Is(None))))
+        self.assertThat(
+            archive_file_set.getByArchive(
+                archive_files[2].archive,
+                container=archive_files[2].container,
+                path=archive_files[2].path),
+            MatchesSetwise(
+                MatchesStructure(scheduled_deletion_date=Equals(now))))
 
     def test_getContainersToReap(self):
         archive = self.factory.makeArchive()
