← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/launchpad:removed-archive-files-not-reapable into launchpad:master

 

Colin Watson has proposed merging ~cjwatson/launchpad:removed-archive-files-not-reapable into launchpad:master.

Commit message:
Don't consider removed ArchiveFiles as eligible for reaping

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/440415

Commit 559bbefbb5d7327debd5343c1e812787f5d0caad introduced a gradual performance regression for PPA publishing.  In order to support serving old snapshots, `ArchiveFile` rows now have their `date_removed` column set rather than being deleted from the database.  However, I failed to account for this in the code that decides which PPAs to publish due to pending deletions and which suites in each PPA should have `by-hash` files pruned from them.  As a result, the PPA publisher has been getting gradually slower as we accumulate more removed `ArchiveFile` rows.

Excluding removed rows makes the query for pending-publication PPAs substantially faster since it has an order of magnitude fewer `ArchiveFile` rows to sort, and it should mean that we spend less effort publishing the same PPAs over and over again.
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:removed-archive-files-not-reapable into launchpad:master.
diff --git a/lib/lp/archivepublisher/tests/test_publisher.py b/lib/lp/archivepublisher/tests/test_publisher.py
index c7bbea7..ab3fd82 100644
--- a/lib/lp/archivepublisher/tests/test_publisher.py
+++ b/lib/lp/archivepublisher/tests/test_publisher.py
@@ -1499,6 +1499,8 @@ class TestPublisher(TestPublisherBase):
             archive_file
         ).scheduled_deletion_date = now - timedelta(hours=12)
         self.assertIn(archive, ubuntu.getPendingPublicationPPAs())
+        getUtility(IArchiveFileSet).markDeleted([archive_file])
+        self.assertNotIn(archive, ubuntu.getPendingPublicationPPAs())
 
     def testDirtySuitesArchive(self):
         # getPendingPublicationPPAs returns archives that have dirty_suites
diff --git a/lib/lp/registry/model/distribution.py b/lib/lp/registry/model/distribution.py
index 36f3543..e02fc3d 100644
--- a/lib/lp/registry/model/distribution.py
+++ b/lib/lp/registry/model/distribution.py
@@ -1969,6 +1969,7 @@ class Distribution(
                 Archive.distribution == self,
                 ArchiveFile.archive == Archive.id,
                 ArchiveFile.scheduled_deletion_date < UTC_NOW,
+                ArchiveFile.date_removed == None,
             )
             .order_by(Archive.id)
             .config(distinct=True)
diff --git a/lib/lp/soyuz/model/archivefile.py b/lib/lp/soyuz/model/archivefile.py
index 89576a4..f9bbe36 100644
--- a/lib/lp/soyuz/model/archivefile.py
+++ b/lib/lp/soyuz/model/archivefile.py
@@ -215,6 +215,7 @@ class ArchiveFileSet:
         clauses = [
             ArchiveFile.archive == archive,
             ArchiveFile.scheduled_deletion_date < _now(),
+            ArchiveFile.date_removed == None,
         ]
         if container_prefix is not None:
             clauses.append(ArchiveFile.container.startswith(container_prefix))
diff --git a/lib/lp/soyuz/tests/test_archivefile.py b/lib/lp/soyuz/tests/test_archivefile.py
index 13a89f3..85d006f 100644
--- a/lib/lp/soyuz/tests/test_archivefile.py
+++ b/lib/lp/soyuz/tests/test_archivefile.py
@@ -351,6 +351,10 @@ class TestArchiveFile(TestCaseWithFactory):
                 archive, container_prefix="release:"
             ),
         )
+        archive_file_set.markDeleted([archive_files[3]])
+        self.assertContentEqual(
+            ["release:foo"], archive_file_set.getContainersToReap(archive)
+        )
 
     def test_markDeleted(self):
         archive = self.factory.makeArchive()