← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-884649-branch-3 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-884649-branch-3 into lp:launchpad with lp:~jtv/launchpad/bug-884649-branch-2 as a prerequisite.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #884649 in Launchpad itself: "Ubuntu publisher is taking more than an hour to complete"
  https://bugs.launchpad.net/launchpad/+bug/884649

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-884649-branch-3/+merge/81136

= Summary =

Domination is slow.  A major reason is the two-pass domination algorithm needed for binary publications.


== Proposed fix ==

The first binary-domination pass touches only architecture-specific publications.  This pass is relatively cheap, per package dominated.

The second pass touches only architecture-independent publications.  This pass is expensive per package dominated.  It probably dominates the same number of packages as the first pass, but for most, does nothing.

So: keep track during the first pass of which packages have architecture-independent publications, and limit the second pass to just those.


== Pre-implementation notes ==

This was Julian's idea.


== Implementation details ==

I made the second pass iterate over the intersection of “packages with multiple live publications” and “packages that were found during the first pass to have architecture-independent live publications.”  This is because a package might conceivably have architecture-independent live publications in one architecture, but no live publications at all in another.

That's not really supposed to happen, which is to say it can be helpful to be prepared for the case but it's not worth optimizing for.


== Tests ==

All the high-level desired outcomes and integration are tested in scenario tests; those remain unchanged because this is a functionally neutral optimization.

I did add one helper function, which is short but easy to mess up and so it gets its own series of tests.

{{{
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}


== Demo and Q/A ==

Run the dominator.  It should be tons faster, but still dominate even architecture-independent binary publications.


= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/archivepublisher/tests/test_dominator.py
-- 
https://code.launchpad.net/~jtv/launchpad/bug-884649-branch-3/+merge/81136
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-884649-branch-3 into lp:launchpad.
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py	2011-11-03 12:00:12 +0000
+++ lib/lp/archivepublisher/domination.py	2011-11-03 12:00:14 +0000
@@ -296,6 +296,11 @@
     return get_binary_versions([latest] + arch_specific_pubs + reprieved_pubs)
 
 
+def contains_arch_indep(bpphs):
+    """Are any of the publications among `bpphs` architecture-independent?"""
+    return any(not bpph.architecture_specific for bpph in bpphs)
+
+
 class Dominator:
     """Manage the process of marking packages as superseded.
 
@@ -568,6 +573,18 @@
         """
         generalization = GeneralizedPublication(is_source=False)
 
+        # Domination happens in two passes.  The first tries to
+        # supersede architecture-dependent publications; the second
+        # tries to supersede architecture-independent ones.  An
+        # architecture-independent pub is kept alive as long as any
+        # architecture-dependent pubs from the same source package build
+        # are still live for any architecture, because they may depend
+        # on the architecture-independent package.
+        # Thus we limit the second pass to those packages that have
+        # published, architecture-independent publications; anything
+        # else will have completed domination in the first pass.
+        packages_w_arch_indep = set()
+
         for distroarchseries in distroseries.architectures:
             self.logger.info(
                 "Performing domination across %s/%s (%s)",
@@ -583,20 +600,25 @@
                 assert len(pubs) > 0, "Dominating zero binaries!"
                 live_versions = find_live_binary_versions_pass_1(pubs)
                 self.dominatePackage(pubs, live_versions, generalization)
-
-        # We need to make a second pass to cover the cases where:
-        #  * arch-specific binaries were not all dominated before arch-all
-        #    ones that depend on them
-        #  * An arch-all turned into an arch-specific, or vice-versa
-        #  * A package is completely schizophrenic and changes all of
-        #    its binaries between arch-all and arch-any (apparently
-        #    occurs sometimes!)
+                if contains_arch_indep(pubs):
+                    packages_w_arch_indep.add(name)
+
+        packages_w_arch_indep = frozenset(packages_w_arch_indep)
+
+        # The second pass attempts to supersede arch-all publications of
+        # older versions, from source package releases that no longer
+        # have any active arch-specific publications that might depend
+        # on the arch-indep ones.
+        # (In maintaining this code, bear in mind that some or all of a
+        # source package's binary packages may switch between
+        # arch-specific and arch-indep between releases.)
         for distroarchseries in distroseries.architectures:
             self.logger.info("Finding binaries...(2nd pass)")
             bins = self.findBinariesForDomination(distroarchseries, pocket)
             sorted_packages = self._sortPackages(bins, generalization)
             self.logger.info("Dominating binaries...(2nd pass)")
-            for name, pubs in sorted_packages.iteritems():
+            for name in packages_w_arch_indep.intersection(sorted_packages):
+                pubs = sorted_packages[name]
                 self.logger.debug("Dominating %s" % name)
                 assert len(pubs) > 0, "Dominating zero binaries in 2nd pass!"
                 live_versions = find_live_binary_versions_pass_2(pubs)

=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
--- lib/lp/archivepublisher/tests/test_dominator.py	2011-11-03 12:00:12 +0000
+++ lib/lp/archivepublisher/tests/test_dominator.py	2011-11-03 12:00:14 +0000
@@ -15,6 +15,7 @@
 from canonical.database.sqlbase import flush_database_updates
 from canonical.testing.layers import ZopelessDatabaseLayer
 from lp.archivepublisher.domination import (
+    contains_arch_indep,
     Dominator,
     find_live_binary_versions_pass_1,
     find_live_binary_versions_pass_2,
@@ -1190,3 +1191,28 @@
         make_publications_arch_specific([dependent], True)
         self.assertEqual(
             ['1.2', '1.1'], find_live_binary_versions_pass_2(bpphs))
+
+
+class TestDominationHelpers(TestCaseWithFactory):
+    """Test lightweight helpers for the `Dominator`."""
+
+    layer = ZopelessDatabaseLayer
+
+    def test_contains_arch_indep_says_True_for_arch_indep(self):
+        bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
+        make_publications_arch_specific(bpphs, False)
+        self.assertTrue(contains_arch_indep(bpphs))
+
+    def test_contains_arch_indep_says_False_for_arch_specific(self):
+        bpphs = [self.factory.makeBinaryPackagePublishingHistory()]
+        make_publications_arch_specific(bpphs, True)
+        self.assertFalse(contains_arch_indep(bpphs))
+
+    def test_contains_arch_indep_says_True_for_combination(self):
+        bpphs = make_bpphs_for_versions(self.factory, ['1.1', '1.0'])
+        make_publications_arch_specific(bpphs[:1], True)
+        make_publications_arch_specific(bpphs[1:], False)
+        self.assertTrue(contains_arch_indep(bpphs))
+
+    def test_contains_arch_indep_says_False_for_empty_list(self):
+        self.assertFalse(contains_arch_indep([]))


Follow ups