launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/launchpad:archive-file-deeper-history into launchpad:master

Colin Watson has proposed merging ~cjwatson/launchpad:archive-file-deeper-history into launchpad:master.

Commit message:
Retain more ArchiveFile history

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/436359

Source and binary packages are removed from the published archive on disk a day after they are no longer referenced by current index files; package files in the librarian are typically expired a week after all their associated publishing history rows have been marked as removed from disk, although files in some archives (primary archives and certain named PPAs) never expire.

Archive index files, such as `Packages`, `Sources`, and `Release` files, currently work quite differently: old versions of these files are published in `by-hash` directories on disk so that clients can continue to make use of them for a grace period of a day, after which they are both removed from disk and removed from the database.
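
For illustration (a sketch based on the description above; the suite name and hashes are invented), each index directory carries a `by-hash` subdirectory keyed by hash algorithm, holding the current version alongside recently superseded ones:

    dists/breezy-autotest/main/source/Sources.gz
    dists/breezy-autotest/main/source/by-hash/SHA256/3f78c...  (current)
    dists/breezy-autotest/main/source/by-hash/SHA256/9ab01...  (superseded; kept for the grace period)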

The latter approach is problematic for the forthcoming snapshot service, because it means we can only serve snapshots of index files for as long as they are still published on disk: not much more useful than what we already have!

This commit implements a much more useful policy.  We now treat archive index files similarly to source and binary packages: they're still removed from disk after a day, to prevent `by-hash` directories from growing unreasonably large and bloating mirrors, but we now merely set the `ArchiveFile.date_removed` column rather than deleting the row entirely.  The `expire-archive-files` script now handles `ArchiveFile` rows as well as `SourcePackageReleaseFile` and `BinaryPackageFile` rows, which means that old snapshots of the primary archive and some other important archives will be retained in the database and the librarian.
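
Because superseded rows now survive with `date_removed` set instead of being deleted, the history a snapshot service needs stays queryable.  A minimal sketch, assuming the interfaces in this branch (the helper itself is hypothetical and not part of this MP):

    def files_at(archive_file_set, archive, container, timestamp):
        # Find the index files that were live at `timestamp`, using the
        # date_created/date_superseded columns this branch now retains.
        # Rows removed from disk are included, since only_published
        # defaults to False.
        return [
            archive_file
            for archive_file in archive_file_set.getByArchive(
                archive, container=container, eager_load=True
            )
            if archive_file.date_created <= timestamp
            and (
                archive_file.date_superseded is None
                or archive_file.date_superseded > timestamp
            )
        ]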

This will of course cause the librarian to grow more quickly, but some back-of-the-envelope calculations suggest that even if we retain old archive index files in perpetuity then this will be on the order of an extra terabyte per year, which seems quite manageable.
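
As a worked version of that estimate (the per-run figure is an assumption for illustration, not a measurement): if each publisher cycle supersedes on the order of 100 MB of index files and cycles run roughly hourly, that comes to about 100 MB × 24 × 365 ≈ 0.9 TB per year, matching the order of magnitude above.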

DB MP: https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/436358
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:archive-file-deeper-history into launchpad:master.
diff --git a/database/schema/security.cfg b/database/schema/security.cfg
index 9021834..861a41b 100644
--- a/database/schema/security.cfg
+++ b/database/schema/security.cfg
@@ -2032,6 +2032,7 @@ type=user
 [binaryfile-expire]
 groups=script
 public.archive                          = SELECT
+public.archivefile                      = SELECT
 public.binarypackagefile                = SELECT
 public.binarypackagepublishinghistory   = SELECT
 public.binarypackagerelease             = SELECT
diff --git a/lib/lp/archivepublisher/publishing.py b/lib/lp/archivepublisher/publishing.py
index 5778aa9..877ca78 100644
--- a/lib/lp/archivepublisher/publishing.py
+++ b/lib/lp/archivepublisher/publishing.py
@@ -1274,17 +1274,19 @@ class Publisher:
         by_hashes = ByHashes(self._config.distsroot, self.log)
         existing_live_files = {}
         existing_nonlive_files = {}
-        keep_files = set()
         reapable_files = set()
 
         def strip_dists(path):
             assert path.startswith("dists/")
             return path[len("dists/") :]
 
-        # Record all files from the database.
+        # Record all published files from the database.
         db_now = get_transaction_timestamp(IStore(ArchiveFile))
         for db_file in archive_file_set.getByArchive(
-            self.archive, container=container, eager_load=True
+            self.archive,
+            container=container,
+            only_published=True,
+            eager_load=True,
         ):
             file_key = (
                 strip_dists(db_file.path),
@@ -1375,16 +1377,18 @@ class Publisher:
         # Remove any files from disk that aren't recorded in the database.
         by_hashes.prune()
 
-        # And remove expired ArchiveFiles from the DB now that we've pruned
-        # them and their directories from disk.
-        delete_files = reapable_files - keep_files
-        if delete_files:
-            for container, path, sha256 in archive_file_set.delete(
-                delete_files
-            ):
+        # And mark expired ArchiveFiles as deleted in the DB now that we've
+        # pruned them and their directories from disk.
+        if reapable_files:
+            archive_file_set.markDeleted(reapable_files)
+            for db_file in reapable_files:
                 self.log.debug(
-                    "by-hash: Deleted %s for %s in %s"
-                    % (sha256, path, container)
+                    "by-hash: Marked %s for %s in %s as deleted"
+                    % (
+                        db_file.library_file.content.sha256,
+                        db_file.path,
+                        db_file.container,
+                    )
                 )
 
     def _writeReleaseFile(self, suite, release_data):
diff --git a/lib/lp/archivepublisher/tests/test_publisher.py b/lib/lp/archivepublisher/tests/test_publisher.py
index f3ac846..c9fe64a 100644
--- a/lib/lp/archivepublisher/tests/test_publisher.py
+++ b/lib/lp/archivepublisher/tests/test_publisher.py
@@ -17,6 +17,7 @@ from datetime import datetime, timedelta
 from fnmatch import fnmatch
 from functools import partial
 from textwrap import dedent
+from typing import Optional, Sequence, Tuple
 from unittest import mock
 
 import pytz
@@ -3066,7 +3067,26 @@ class TestUpdateByHash(TestPublisherBase):
                 superseded_at + timedelta(days=BY_HASH_STAY_OF_EXECUTION)
             )
 
-    def assertHasSuiteFiles(self, patterns, *properties):
+    def assertHasSuiteFiles(
+        self,
+        patterns: Sequence[str],
+        *properties: Tuple[
+            str, Optional[int], Optional[int], Optional[int]
+        ]
+    ):
+        """Assert that the database records certain archive files.
+
+        :param patterns: A sequence of `fnmatch` patterns for files under
+            `dists/breezy-autotest/` that we're interested in.
+        :param properties: A sequence of (`path`, `created_at`,
+            `superseded_at`, `removed_at`) tuples.  Each must match one of
+            the `ArchiveFile` rows matching `patterns`; `path` is relative
+            to `dists/breezy-autotest/`, while the `*_at` properties are
+            either None (indicating that the corresponding `date_*` column
+            should be None) or an index into `self.times` (indicating that
+            the corresponding `date_*` column should equal that time).
+        """
+
         def is_interesting(path):
             return any(
                 fnmatch(path, "dists/breezy-autotest/%s" % pattern)
@@ -3081,7 +3101,12 @@ class TestUpdateByHash(TestPublisherBase):
             if is_interesting(archive_file.path)
         ]
         matchers = []
-        for path, created_at, superseded_at in properties:
+        for path, created_at, superseded_at, removed_at in properties:
+            created_at = None if created_at is None else self.times[created_at]
+            superseded_at = (
+                None if superseded_at is None else self.times[superseded_at]
+            )
+            removed_at = None if removed_at is None else self.times[removed_at]
             scheduled_deletion_date_matcher = (
                 self._makeScheduledDeletionDateMatcher(superseded_at)
             )
@@ -3091,6 +3116,7 @@ class TestUpdateByHash(TestPublisherBase):
                     date_created=Equals(created_at),
                     date_superseded=Equals(superseded_at),
                     scheduled_deletion_date=scheduled_deletion_date_matcher,
+                    date_removed=Equals(removed_at),
                 )
             )
         self.assertThat(files, MatchesSetwise(*matchers))
@@ -3280,8 +3306,8 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Contents-i386", self.times[0], None),
-            ("Release", self.times[0], None),
+            ("Contents-i386", 0, None, None),
+            ("Release", 0, None, None),
         )
         releases = [get_release_contents()]
         self.assertThat(
@@ -3297,10 +3323,10 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Contents-i386", self.times[0], None),
-            ("Contents-hppa", self.times[1], None),
-            ("Release", self.times[0], self.times[1]),
-            ("Release", self.times[1], None),
+            ("Contents-i386", 0, None, None),
+            ("Contents-hppa", 1, None, None),
+            ("Release", 0, 1, None),
+            ("Release", 1, None, None),
         )
         releases.append(get_release_contents())
         self.assertThat(
@@ -3315,11 +3341,11 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Contents-i386", self.times[0], self.times[2]),
-            ("Contents-hppa", self.times[1], None),
-            ("Release", self.times[0], self.times[1]),
-            ("Release", self.times[1], self.times[2]),
-            ("Release", self.times[2], None),
+            ("Contents-i386", 0, 2, None),
+            ("Contents-hppa", 1, None, None),
+            ("Release", 0, 1, None),
+            ("Release", 1, 2, None),
+            ("Release", 2, None, None),
         )
         releases.append(get_release_contents())
         self.assertThat(
@@ -3333,12 +3359,12 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Contents-i386", self.times[0], self.times[2]),
-            ("Contents-hppa", self.times[1], None),
-            ("Release", self.times[0], self.times[1]),
-            ("Release", self.times[1], self.times[2]),
-            ("Release", self.times[2], self.times[3]),
-            ("Release", self.times[3], None),
+            ("Contents-i386", 0, 2, None),
+            ("Contents-hppa", 1, None, None),
+            ("Release", 0, 1, None),
+            ("Release", 1, 2, None),
+            ("Release", 2, 3, None),
+            ("Release", 3, None, None),
         )
         releases.append(get_release_contents())
         self.assertThat(
@@ -3365,10 +3391,13 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Contents-hppa", self.times[1], self.times[4]),
-            ("Release", self.times[2], self.times[3]),
-            ("Release", self.times[3], self.times[4]),
-            ("Release", self.times[4], None),
+            ("Contents-i386", 0, 2, 4),
+            ("Contents-hppa", 1, 4, None),
+            ("Release", 0, 1, 4),
+            ("Release", 1, 2, 4),
+            ("Release", 2, 3, None),
+            ("Release", 3, 4, None),
+            ("Release", 4, None, None),
         )
         releases.append(get_release_contents())
         self.assertThat(
@@ -3393,8 +3422,14 @@ class TestUpdateByHash(TestPublisherBase):
         flush_database_caches()
         self.assertHasSuiteFiles(
             ("Contents-*", "Release"),
-            ("Release", self.times[4], self.times[5]),
-            ("Release", self.times[5], None),
+            ("Contents-i386", 0, 2, 4),
+            ("Contents-hppa", 1, 4, 5),
+            ("Release", 0, 1, 4),
+            ("Release", 1, 2, 4),
+            ("Release", 2, 3, 5),
+            ("Release", 3, 4, 5),
+            ("Release", 4, 5, None),
+            ("Release", 5, None, None),
         )
         releases.append(get_release_contents())
         self.assertThat(suite_path("by-hash"), ByHashHasContents(releases[4:]))
@@ -3430,7 +3465,7 @@ class TestUpdateByHash(TestPublisherBase):
                 main_contents.add(f.read())
         self.assertHasSuiteFiles(
             ("main/source/Sources",),
-            ("main/source/Sources", self.times[0], None),
+            ("main/source/Sources", 0, None, None),
         )
 
         # Add a source package so that Sources is non-empty.
@@ -3447,8 +3482,8 @@ class TestUpdateByHash(TestPublisherBase):
         )
         self.assertHasSuiteFiles(
             ("main/source/Sources",),
-            ("main/source/Sources", self.times[0], self.times[1]),
-            ("main/source/Sources", self.times[1], None),
+            ("main/source/Sources", 0, 1, None),
+            ("main/source/Sources", 1, None, None),
         )
 
         # Delete the source package so that Sources is empty again.  The
@@ -3464,9 +3499,9 @@ class TestUpdateByHash(TestPublisherBase):
         )
         self.assertHasSuiteFiles(
             ("main/source/Sources",),
-            ("main/source/Sources", self.times[0], self.times[1]),
-            ("main/source/Sources", self.times[1], self.times[2]),
-            ("main/source/Sources", self.times[2], None),
+            ("main/source/Sources", 0, 1, None),
+            ("main/source/Sources", 1, 2, None),
+            ("main/source/Sources", 2, None, None),
         )
 
         # Make the first empty Sources file ready to prune.  This doesn't
@@ -3484,8 +3519,9 @@ class TestUpdateByHash(TestPublisherBase):
         )
         self.assertHasSuiteFiles(
             ("main/source/Sources",),
-            ("main/source/Sources", self.times[1], self.times[2]),
-            ("main/source/Sources", self.times[2], None),
+            ("main/source/Sources", 0, 1, 3),
+            ("main/source/Sources", 1, 2, None),
+            ("main/source/Sources", 2, None, None),
         )
 
     def setUpPruneableSuite(self):
@@ -3515,7 +3551,10 @@ class TestUpdateByHash(TestPublisherBase):
             for name in ("Release", "Sources.gz", "Sources.bz2"):
                 with open(suite_path("main", "source", name), "rb") as f:
                     main_contents.append(f.read())
-            self.advanceTime(delta=timedelta(hours=6))
+            # Advance time between each publisher run.  We don't advance
+            # time after the last one, since we'll do that below.
+            if sourcename != "baz":
+                self.advanceTime(delta=timedelta(hours=6))
         transaction.commit()
 
         # We have two condemned sets of index files and one uncondemned set.
@@ -3523,16 +3562,16 @@ class TestUpdateByHash(TestPublisherBase):
         # that it doesn't change.
         self.assertHasSuiteFiles(
             ("main/source/*", "Release"),
-            ("main/source/Sources.gz", self.times[0], self.times[1]),
-            ("main/source/Sources.gz", self.times[1], self.times[2]),
-            ("main/source/Sources.gz", self.times[2], None),
-            ("main/source/Sources.bz2", self.times[0], self.times[1]),
-            ("main/source/Sources.bz2", self.times[1], self.times[2]),
-            ("main/source/Sources.bz2", self.times[2], None),
-            ("main/source/Release", self.times[0], None),
-            ("Release", self.times[0], self.times[1]),
-            ("Release", self.times[1], self.times[2]),
-            ("Release", self.times[2], None),
+            ("main/source/Sources.gz", 0, 1, None),
+            ("main/source/Sources.gz", 1, 2, None),
+            ("main/source/Sources.gz", 2, None, None),
+            ("main/source/Sources.bz2", 0, 1, None),
+            ("main/source/Sources.bz2", 1, 2, None),
+            ("main/source/Sources.bz2", 2, None, None),
+            ("main/source/Release", 0, None, None),
+            ("Release", 0, 1, None),
+            ("Release", 1, 2, None),
+            ("Release", 2, None, None),
         )
         self.assertThat(suite_path("by-hash"), ByHashHasContents(top_contents))
         self.assertThat(
@@ -3574,13 +3613,16 @@ class TestUpdateByHash(TestPublisherBase):
         # generated.
         self.assertHasSuiteFiles(
             ("main/source/*", "Release"),
-            ("main/source/Sources.gz", self.times[1], self.times[2]),
-            ("main/source/Sources.gz", self.times[2], None),
-            ("main/source/Sources.bz2", self.times[1], self.times[2]),
-            ("main/source/Sources.bz2", self.times[2], None),
-            ("main/source/Release", self.times[0], None),
-            ("Release", self.times[1], self.times[2]),
-            ("Release", self.times[2], None),
+            ("main/source/Sources.gz", 0, 1, 3),
+            ("main/source/Sources.gz", 1, 2, None),
+            ("main/source/Sources.gz", 2, None, None),
+            ("main/source/Sources.bz2", 0, 1, 3),
+            ("main/source/Sources.bz2", 1, 2, None),
+            ("main/source/Sources.bz2", 2, None, None),
+            ("main/source/Release", 0, None, None),
+            ("Release", 0, 1, 3),
+            ("Release", 1, 2, None),
+            ("Release", 2, None, None),
         )
         self.assertThat(suite_path("by-hash"), ByHashHasContents(top_contents))
         self.assertThat(
@@ -3615,13 +3657,16 @@ class TestUpdateByHash(TestPublisherBase):
         # generated.
         self.assertHasSuiteFiles(
             ("main/source/*", "Release"),
-            ("main/source/Sources.gz", self.times[1], self.times[2]),
-            ("main/source/Sources.gz", self.times[2], None),
-            ("main/source/Sources.bz2", self.times[1], self.times[2]),
-            ("main/source/Sources.bz2", self.times[2], None),
-            ("main/source/Release", self.times[0], None),
-            ("Release", self.times[1], self.times[2]),
-            ("Release", self.times[2], None),
+            ("main/source/Sources.gz", 0, 1, 3),
+            ("main/source/Sources.gz", 1, 2, None),
+            ("main/source/Sources.gz", 2, None, None),
+            ("main/source/Sources.bz2", 0, 1, 3),
+            ("main/source/Sources.bz2", 1, 2, None),
+            ("main/source/Sources.bz2", 2, None, None),
+            ("main/source/Release", 0, None, None),
+            ("Release", 0, 1, 3),
+            ("Release", 1, 2, None),
+            ("Release", 2, None, None),
         )
         self.assertThat(suite_path("by-hash"), ByHashHasContents(top_contents))
         self.assertThat(
diff --git a/lib/lp/soyuz/interfaces/archivefile.py b/lib/lp/soyuz/interfaces/archivefile.py
index 512b07b..6c530b3 100644
--- a/lib/lp/soyuz/interfaces/archivefile.py
+++ b/lib/lp/soyuz/interfaces/archivefile.py
@@ -72,6 +72,15 @@ class IArchiveFile(Interface):
         readonly=False,
     )
 
+    date_removed = Datetime(
+        title=_(
+            "The date when this file was entirely removed from the published "
+            "archive."
+        ),
+        required=False,
+        readonly=False,
+    )
+
 
 class IArchiveFileSet(Interface):
     """Bulk operations on files in an archive."""
@@ -104,6 +113,7 @@ class IArchiveFileSet(Interface):
         path=None,
         sha256=None,
         condemned=None,
+        only_published=False,
         eager_load=False,
     ):
         """Get files in an archive.
@@ -119,6 +129,8 @@ class IArchiveFileSet(Interface):
             scheduled_deletion_date set; if False, return only files without
             a scheduled_deletion_date set; if None (the default), return
             both.
+        :param only_published: If True, return only files without a
+            `date_removed` set.
         :param eager_load: If True, preload related `LibraryFileAlias` and
             `LibraryFileContent` rows.
         :return: An iterable of matched files.
@@ -141,10 +153,12 @@ class IArchiveFileSet(Interface):
         :return: An iterable of matched container names.
         """
 
-    def delete(archive_files):
-        """Delete these archive files.
+    def markDeleted(archive_files):
+        """Mark these archive files as deleted.
+
+        This does not actually delete the rows from the database;
+        `lp.soyuz.scripts.expire_archive_files` will expire their
+        corresponding `LibraryFileAlias` rows as needed.
 
         :param archive_files: The `IArchiveFile`s to delete.
-        :return: An iterable of (container, path, sha256) for files that
-            were deleted.
         """
diff --git a/lib/lp/soyuz/model/archivefile.py b/lib/lp/soyuz/model/archivefile.py
index 43e27b9..08a6bcf 100644
--- a/lib/lp/soyuz/model/archivefile.py
+++ b/lib/lp/soyuz/model/archivefile.py
@@ -12,7 +12,7 @@ import os.path
 import re
 
 import pytz
-from storm.locals import And, DateTime, Int, Reference, Storm, Unicode
+from storm.locals import DateTime, Int, Reference, Storm, Unicode
 from zope.component import getUtility
 from zope.interface import implementer
 
@@ -20,7 +20,6 @@ from lp.services.database.bulk import load_related
 from lp.services.database.constants import UTC_NOW
 from lp.services.database.decoratedresultset import DecoratedResultSet
 from lp.services.database.interfaces import IPrimaryStore, IStore
-from lp.services.database.sqlbase import convert_storm_clause_to_string
 from lp.services.database.stormexpr import RegexpMatch
 from lp.services.librarian.interfaces import ILibraryFileAliasSet
 from lp.services.librarian.model import LibraryFileAlias, LibraryFileContent
@@ -70,6 +69,10 @@ class ArchiveFile(Storm):
         name="scheduled_deletion_date", tzinfo=pytz.UTC, allow_none=True
     )
 
+    date_removed = DateTime(
+        name="date_removed", tzinfo=pytz.UTC, allow_none=True
+    )
+
     def __init__(self, archive, container, path, library_file):
         """Construct an `ArchiveFile`."""
         super().__init__()
@@ -80,6 +83,7 @@ class ArchiveFile(Storm):
         self.date_created = _now()
         self.date_superseded = None
         self.scheduled_deletion_date = None
+        self.date_removed = None
 
 
 @implementer(IArchiveFileSet)
@@ -115,6 +119,7 @@ class ArchiveFileSet:
         path_parent=None,
         sha256=None,
         condemned=None,
+        only_published=False,
         eager_load=False,
     ):
         """See `IArchiveFileSet`."""
@@ -144,6 +149,8 @@ class ArchiveFileSet:
                 clauses.append(ArchiveFile.scheduled_deletion_date != None)
             else:
                 clauses.append(ArchiveFile.scheduled_deletion_date == None)
+        if only_published:
+            clauses.append(ArchiveFile.date_removed == None)
         archive_files = IStore(ArchiveFile).find(ArchiveFile, *clauses)
 
         def eager_load(rows):
@@ -184,30 +191,13 @@ class ArchiveFileSet:
         )
 
     @staticmethod
-    def delete(archive_files):
+    def markDeleted(archive_files):
         """See `IArchiveFileSet`."""
-        # XXX cjwatson 2016-03-30 bug=322972: Requires manual SQL due to
-        # lack of support for DELETE FROM ... USING ... in Storm.
-        clauses = [
+        rows = IPrimaryStore(ArchiveFile).find(
+            ArchiveFile,
             ArchiveFile.id.is_in(
                 {archive_file.id for archive_file in archive_files}
             ),
-            ArchiveFile.library_file_id == LibraryFileAlias.id,
-            LibraryFileAlias.contentID == LibraryFileContent.id,
-        ]
-        where = convert_storm_clause_to_string(And(*clauses))
-        return list(
-            IPrimaryStore(ArchiveFile).execute(
-                """
-            DELETE FROM ArchiveFile
-            USING LibraryFileAlias, LibraryFileContent
-            WHERE """
-                + where
-                + """
-            RETURNING
-                ArchiveFile.container,
-                ArchiveFile.path,
-                LibraryFileContent.sha256
-            """
-            )
+            ArchiveFile.date_removed == None,
         )
+        rows.set(date_removed=_now())
diff --git a/lib/lp/soyuz/scripts/expire_archive_files.py b/lib/lp/soyuz/scripts/expire_archive_files.py
index ef82fef..1168dae 100755
--- a/lib/lp/soyuz/scripts/expire_archive_files.py
+++ b/lib/lp/soyuz/scripts/expire_archive_files.py
@@ -13,6 +13,7 @@ from lp.services.librarian.model import LibraryFileAlias
 from lp.services.scripts.base import LaunchpadCronScript
 from lp.soyuz.enums import ArchivePurpose
 from lp.soyuz.model.archive import Archive
+from lp.soyuz.model.archivefile import ArchiveFile
 from lp.soyuz.model.files import BinaryPackageFile, SourcePackageReleaseFile
 from lp.soyuz.model.publishing import (
     BinaryPackagePublishingHistory,
@@ -94,19 +95,63 @@ class ArchiveExpirer(LaunchpadCronScript):
             ),
         )
 
-    def _determineExpirables(self, num_days, binary):
+    def _determineExpirables(self, num_days, table):
+        """Return expirable LibraryFileAlias IDs based on a given table."""
         stay_of_execution = Cast("%d days" % num_days, "interval")
         archive_types = (ArchivePurpose.PPA, ArchivePurpose.PARTNER)
+        eligible_clauses = []
+        denied_clauses = []
 
         LFA = LibraryFileAlias
-        if binary:
-            xPF = BinaryPackageFile
-            xPPH = BinaryPackagePublishingHistory
-            xPR_join = xPF.binarypackagerelease == xPPH.binarypackagereleaseID
+        if table == BinaryPackageFile:
+            BPF = BinaryPackageFile
+            BPPH = BinaryPackagePublishingHistory
+            lfa_column = BPF.libraryfileID
+            date_removed_column = BPPH.dateremoved
+            eligible_clauses.extend(
+                [
+                    BPF.libraryfile == LFA.id,
+                    BPF.binarypackagerelease == BPPH.binarypackagereleaseID,
+                    BPPH.archive == Archive.id,
+                ]
+            )
+            denied_clauses.extend(
+                [
+                    BPF.binarypackagerelease == BPPH.binarypackagereleaseID,
+                    BPPH.archive == Archive.id,
+                ]
+            )
+        elif table == SourcePackageReleaseFile:
+            SPRF = SourcePackageReleaseFile
+            SPPH = SourcePackagePublishingHistory
+            lfa_column = SPRF.libraryfileID
+            date_removed_column = SPPH.dateremoved
+            eligible_clauses.extend(
+                [
+                    SPRF.libraryfile == LFA.id,
+                    SPRF.sourcepackagerelease == SPPH.sourcepackagereleaseID,
+                    SPPH.archive == Archive.id,
+                ]
+            )
+            denied_clauses.extend(
+                [
+                    SPRF.sourcepackagerelease == SPPH.sourcepackagereleaseID,
+                    SPPH.archive == Archive.id,
+                ]
+            )
+        elif table == ArchiveFile:
+            lfa_column = ArchiveFile.library_file_id
+            date_removed_column = ArchiveFile.date_removed
+            eligible_clauses.extend(
+                [
+                    ArchiveFile.library_file == LFA.id,
+                    ArchiveFile.date_removed < UTC_NOW - stay_of_execution,
+                    ArchiveFile.archive == Archive.id,
+                ]
+            )
+            denied_clauses.append(ArchiveFile.archive == Archive.id)
         else:
-            xPF = SourcePackageReleaseFile
-            xPPH = SourcePackagePublishingHistory
-            xPR_join = xPF.sourcepackagerelease == xPPH.sourcepackagereleaseID
+            raise AssertionError("Unknown table: %r" % table)
         full_archive_name = Concatenate(
             Person.name, Concatenate("/", Archive.name)
         )
@@ -117,19 +162,16 @@
         eligible = Select(
             LFA.id,
             where=And(
-                xPF.libraryfile == LFA.id,
-                xPR_join,
-                xPPH.dateremoved < UTC_NOW - stay_of_execution,
-                xPPH.archive == Archive.id,
+                *eligible_clauses,
+                date_removed_column < UTC_NOW - stay_of_execution,
                 Archive.purpose.is_in(archive_types),
                 LFA.expires == None,
             ),
         )
         denied = Select(
-            xPF.libraryfile_id,
+            lfa_column,
             where=And(
-                xPR_join,
-                xPPH.archive == Archive.id,
+                *denied_clauses,
                 Archive.owner == Person.id,
                 Or(
                     And(
@@ -144,21 +186,13 @@
                         Not(full_archive_name.is_in(self.always_expire)),
                     ),
                     Not(Archive.purpose.is_in(archive_types)),
-                    xPPH.dateremoved > UTC_NOW - stay_of_execution,
-                    xPPH.dateremoved == None,
+                    date_removed_column > UTC_NOW - stay_of_execution,
+                    date_removed_column == None,
                 ),
             ),
         )
         return list(self.store.execute(Except(eligible, denied)))
 
-    def determineSourceExpirables(self, num_days):
-        """Return expirable libraryfilealias IDs."""
-        return self._determineExpirables(num_days=num_days, binary=False)
-
-    def determineBinaryExpirables(self, num_days):
-        """Return expirable libraryfilealias IDs."""
-        return self._determineExpirables(num_days=num_days, binary=True)
-
     def main(self):
         self.logger.info("Starting the PPA binary expiration")
         num_days = self.options.num_days
@@ -166,8 +200,13 @@
 
         self.store = IStore(Archive)
 
-        lfa_ids = self.determineSourceExpirables(num_days)
-        lfa_ids.extend(self.determineBinaryExpirables(num_days))
+        lfa_ids = []
+        for table in (
+            BinaryPackageFile,
+            SourcePackageReleaseFile,
+            ArchiveFile,
+        ):
+            lfa_ids.extend(self._determineExpirables(num_days, table))
         batch_count = 0
         batch_limit = 500
         for id in lfa_ids:
diff --git a/lib/lp/soyuz/scripts/tests/test_expire_archive_files.py b/lib/lp/soyuz/scripts/tests/test_expire_archive_files.py
index 12403fd..63c85b4 100644
--- a/lib/lp/soyuz/scripts/tests/test_expire_archive_files.py
+++ b/lib/lp/soyuz/scripts/tests/test_expire_archive_files.py
@@ -54,7 +54,7 @@ class ArchiveExpiryTestBase(TestCaseWithFactory):
         script.main()
 
     def _setUpExpirablePublications(self, archive=None):
-        """Helper to set up two publications that are both expirable."""
+        """Helper to set up publications and indexes that are all expirable."""
         if archive is None:
             archive = self.archive
         pkg5 = self.stp.getPubSource(
@@ -78,7 +78,21 @@ class ArchiveExpiryTestBase(TestCaseWithFactory):
             self.archive2.distribution.currentseries, pub.pocket, self.archive2
         )
         other_binary.dateremoved = self.over_threshold_date
-        return pkg5, pub
+        af = self.factory.makeArchiveFile(
+            archive=archive,
+            container="release:",
+            path="dists/%s/Release" % pkg5.distroseries.getSuite(pkg5.pocket),
+            date_removed=self.over_threshold_date,
+        )
+        self.factory.makeArchiveFile(
+            archive=self.archive2,
+            container="release:",
+            path="dists/%s/Release"
+            % self.archive2.distribution.currentseries.getSuite(pub.pocket),
+            library_file=af.library_file,
+            date_removed=self.over_threshold_date,
+        )
+        return pkg5, pub, af
 
     def assertBinaryExpired(self, publication):
         self.assertNotEqual(
@@ -108,6 +122,18 @@ class ArchiveExpiryTestBase(TestCaseWithFactory):
             "lfa.expires should be None, but it's not.",
         )
 
+    def assertIndexExpired(self, archive_file):
+        self.assertIsNotNone(
+            archive_file.library_file.expires,
+            "lfa.expires should be set, but it's not.",
+        )
+
+    def assertIndexNotExpired(self, archive_file):
+        self.assertIsNone(
+            archive_file.library_file.expires,
+            "lfa.expires should be None, but it's not.",
+        )
+
 
 class ArchiveExpiryCommonTests:
     """Common source/binary expiration test cases.
@@ -126,10 +152,16 @@ class ArchiveExpiryCommonTests:
         [pub] = self.stp.getPubBinaries(
             pub_source=pkg1, dateremoved=None, archive=self.archive
         )
+        af = self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg1.distroseries.getSuite(pkg1.pocket),
+        )
 
         self.runScript()
         self.assertSourceNotExpired(pkg1)
         self.assertBinaryNotExpired(pub)
+        self.assertIndexNotExpired(af)
 
     def testNoExpirationWithDateUnderThreshold(self):
         """Test no expiring if dateremoved too recent."""
@@ -144,10 +176,17 @@ class ArchiveExpiryCommonTests:
             dateremoved=self.under_threshold_date,
             archive=self.archive,
         )
+        af = self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg2.distroseries.getSuite(pkg2.pocket),
+            date_removed=self.under_threshold_date,
+        )
 
         self.runScript()
         self.assertSourceNotExpired(pkg2)
         self.assertBinaryNotExpired(pub)
+        self.assertIndexNotExpired(af)
 
     def testExpirationWithDateOverThreshold(self):
         """Test expiring works if dateremoved old enough."""
@@ -162,10 +201,17 @@ class ArchiveExpiryCommonTests:
             dateremoved=self.over_threshold_date,
             archive=self.archive,
         )
+        af = self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg3.distroseries.getSuite(pkg3.pocket),
+            date_removed=self.over_threshold_date,
+        )
 
         self.runScript()
         self.assertSourceExpired(pkg3)
         self.assertBinaryExpired(pub)
+        self.assertIndexExpired(af)
 
     def testNoExpirationWithDateOverThresholdAndOtherValidPublication(self):
         """Test no expiry if dateremoved old enough but other publication."""
@@ -190,10 +236,23 @@ class ArchiveExpiryCommonTests:
             self.archive2.distribution.currentseries, pub.pocket, self.archive2
         )
         other_binary.dateremoved = None
+        af = self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg4.distroseries.getSuite(pkg4.pocket),
+            date_removed=self.over_threshold_date,
+        )
+        self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg4.distroseries.getSuite(pkg4.pocket),
+            library_file=af.library_file,
+        )
 
         self.runScript()
         self.assertSourceNotExpired(pkg4)
         self.assertBinaryNotExpired(pub)
+        self.assertIndexNotExpired(af)
 
     def testNoExpirationWithDateOverThresholdAndOtherPubUnderThreshold(self):
         """Test no expiring.
@@ -222,10 +281,24 @@ class ArchiveExpiryCommonTests:
             self.archive2.distribution.currentseries, pub.pocket, self.archive2
         )
         other_binary.dateremoved = self.under_threshold_date
+        af = self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg5.distroseries.getSuite(pkg5.pocket),
+            date_removed=self.over_threshold_date,
+        )
+        self.factory.makeArchiveFile(
+            archive=self.archive,
+            container="release:",
+            path="dists/%s/Release" % pkg5.distroseries.getSuite(pkg5.pocket),
+            library_file=af.library_file,
+            date_removed=self.under_threshold_date,
+        )
 
         self.runScript()
         self.assertSourceNotExpired(pkg5)
         self.assertBinaryNotExpired(pub)
+        self.assertIndexNotExpired(af)
 
     def testNoExpirationWithDateOverThresholdAndOtherPubOverThreshold(self):
         """Test expiring works.
@@ -233,14 +306,15 @@ class ArchiveExpiryCommonTests:
         Test expiring works if dateremoved old enough and other publication
         is over date threshold.
         """
-        source, binary = self._setUpExpirablePublications()
+        source, binary, index = self._setUpExpirablePublications()
         self.runScript()
         self.assertSourceExpired(source)
         self.assertBinaryExpired(binary)
+        self.assertIndexExpired(index)
 
     def testDryRun(self):
         """Test that when dryrun is specified, nothing is expired."""
-        source, binary = self._setUpExpirablePublications()
+        source, binary, index = self._setUpExpirablePublications()
         # We have to commit here otherwise when the script aborts it
         # will remove the test publications we just created.
         self.layer.txn.commit()
@@ -249,16 +323,20 @@ class ArchiveExpiryCommonTests:
         script.main()
         self.assertSourceNotExpired(source)
         self.assertBinaryNotExpired(binary)
+        self.assertIndexNotExpired(index)
 
     def testDoesNotAffectPrimary(self):
         """Test that expiry does not happen for non-PPA publications."""
         primary_archive = getUtility(IDistributionSet)[
             "ubuntutest"
         ].main_archive
-        source, binary = self._setUpExpirablePublications(primary_archive)
+        source, binary, index = self._setUpExpirablePublications(
+            primary_archive
+        )
         self.runScript()
         self.assertSourceNotExpired(source)
         self.assertBinaryNotExpired(binary)
+        self.assertIndexNotExpired(index)
 
 
 class TestPPAExpiry(ArchiveExpiryTestBase, ArchiveExpiryCommonTests):
@@ -280,7 +358,9 @@ class TestPPAExpiry(ArchiveExpiryTestBase, ArchiveExpiryCommonTests):
 
     def testNeverExpireWorks(self):
         """Test that never-expiring PPA owners are not expired."""
-        source, binary = self._setUpExpirablePublications(archive=self.archive)
+        source, binary, index = self._setUpExpirablePublications(
+            archive=self.archive
+        )
         script = self.getScript()
         script.never_expire = [
             self.archive.owner.name,
@@ -289,10 +369,13 @@ class TestPPAExpiry(ArchiveExpiryTestBase, ArchiveExpiryCommonTests):
         script.main()
         self.assertSourceNotExpired(source)
         self.assertBinaryNotExpired(binary)
+        self.assertIndexNotExpired(index)
 
     def testNeverExpireArchivesWorks(self):
         """Test that never-expiring individual PPAs are not expired."""
-        source, binary = self._setUpExpirablePublications(archive=self.archive)
+        source, binary, index = self._setUpExpirablePublications(
+            archive=self.archive
+        )
         script = self.getScript()
         script.never_expire = [
             "%s/%s" % (self.archive.owner.name, self.archive.name)
@@ -301,25 +384,28 @@ class TestPPAExpiry(ArchiveExpiryTestBase, ArchiveExpiryCommonTests):
         script.main()
         self.assertSourceNotExpired(source)
         self.assertBinaryNotExpired(binary)
+        self.assertIndexNotExpired(index)
 
     def testAlwaysExpireWorks(self):
         """Test that always-expiring private PPAs are expired anyway."""
         p3a = self.factory.makeArchive(private=True)
-        source, binary = self._setUpExpirablePublications(archive=p3a)
+        source, binary, index = self._setUpExpirablePublications(archive=p3a)
         script = self.getScript()
         script.always_expire = ["%s/%s" % (p3a.owner.name, p3a.name)]
         switch_dbuser(self.dbuser)
         script.main()
         self.assertSourceExpired(source)
         self.assertBinaryExpired(binary)
+        self.assertIndexExpired(index)
 
     def testPrivatePPAsNotExpired(self):
         """Test that private PPAs are not expired."""
         self.archive.private = True
-        source, binary = self._setUpExpirablePublications()
+        source, binary, index = self._setUpExpirablePublications()
         self.runScript()
         self.assertSourceNotExpired(source)
         self.assertBinaryNotExpired(binary)
+        self.assertIndexNotExpired(index)
 
 
 class TestPartnerExpiry(ArchiveExpiryTestBase, ArchiveExpiryCommonTests):
diff --git a/lib/lp/soyuz/tests/test_archivefile.py b/lib/lp/soyuz/tests/test_archivefile.py
index 8a7ea20..c4e2a57 100644
--- a/lib/lp/soyuz/tests/test_archivefile.py
+++ b/lib/lp/soyuz/tests/test_archivefile.py
@@ -251,22 +251,19 @@ class TestArchiveFile(TestCaseWithFactory):
             ),
         )
 
-    def test_delete(self):
+    def test_markDeleted(self):
         archive = self.factory.makeArchive()
         archive_files = [
             self.factory.makeArchiveFile(archive=archive) for _ in range(4)
         ]
-        expected_rows = [
-            (
-                archive_file.container,
-                archive_file.path,
-                archive_file.library_file.content.sha256,
-            )
-            for archive_file in archive_files[:2]
-        ]
         archive_file_set = getUtility(IArchiveFileSet)
-        rows = archive_file_set.delete(archive_files[:2])
-        self.assertContentEqual(expected_rows, rows)
+        archive_file_set.markDeleted(archive_files[:2])
+        flush_database_caches()
+        self.assertIsNotNone(archive_files[0].date_removed)
+        self.assertIsNotNone(archive_files[1].date_removed)
+        self.assertIsNone(archive_files[2].date_removed)
+        self.assertIsNone(archive_files[3].date_removed)
         self.assertContentEqual(
-            archive_files[2:], archive_file_set.getByArchive(archive)
+            archive_files[2:],
+            archive_file_set.getByArchive(archive, only_published=True),
         )
diff --git a/lib/lp/testing/factory.py b/lib/lp/testing/factory.py
index 9a672aa..a7cadaa 100644
--- a/lib/lp/testing/factory.py
+++ b/lib/lp/testing/factory.py
@@ -3796,6 +3796,7 @@ class LaunchpadObjectFactory(ObjectFactory):
         path=None,
         library_file=None,
         scheduled_deletion_date=None,
+        date_removed=None,
     ):
         if archive is None:
             archive = self.makeArchive()
@@ -3815,6 +3816,8 @@ class LaunchpadObjectFactory(ObjectFactory):
             removeSecurityProxy(
                 archive_file
             ).scheduled_deletion_date = scheduled_deletion_date
+        if date_removed is not None:
+            removeSecurityProxy(archive_file).date_removed = date_removed
         return archive_file
 
     def makeBuilder(