← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/post-857155 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/post-857155 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #857155 in Launchpad itself: "Gina domination is slow"
  https://bugs.launchpad.net/launchpad/+bug/857155

For more details, see:
https://code.launchpad.net/~jtv/launchpad/post-857155/+merge/76733

= Summary =

After the main optimization for Gina domination, here's another smaller one that may also help.


== Proposed fix ==

Two changes:

 * Batch-fetch SPRs when dominating SPPHs (similar for BPR's/BPPRs).
 * Rename dominateRemovedSourceVersions to dominateSourceVersions.

The name change is because a publication could be dominated not just because its version has been superseded, but also because the publication for a version has been replaced with a newer one for the same version.


== Pre-implementation notes ==

Came up as an extra in discussion and review for bug 857155.


== Implementation details ==

I had to extend the interface for BPPH to be a bit more like SPPH (which is the general thrust anyway) to accommodate load_related.  I hope it's mostly self-explanatory; not much point in abstract explanations for a pair of very practical helpers.  The two traits classes hide differences between SPPH and BPPH.


== Tests ==

{{{
./bin/test -vvc lp.soyuz.scripts.tests.test_gina
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}


== Demo and Q/A ==

Will be combined with Q/A for the main 857155 branch.


= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
  lib/lp/soyuz/scripts/gina/dominate.py
  lib/lp/soyuz/model/publishing.py
  lib/lp/soyuz/interfaces/publishing.py
  lib/lp/archivepublisher/tests/test_dominator.py
-- 
https://code.launchpad.net/~jtv/launchpad/post-857155/+merge/76733
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/post-857155 into lp:launchpad.
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py	2011-09-16 11:41:14 +0000
+++ lib/lp/archivepublisher/domination.py	2011-09-23 13:12:35 +0000
@@ -69,6 +69,7 @@
     )
 from canonical.launchpad.interfaces.lpstorm import IStore
 from lp.registry.model.sourcepackagename import SourcePackageName
+from lp.services.database.bulk import load_related
 from lp.soyuz.enums import (
     BinaryPackageFormat,
     PackagePublishingStatus,
@@ -109,6 +110,9 @@
     Used by `GeneralizedPublication` to hide the differences from
     `BinaryPackagePublishingHistory`.
     """
+    release_class = SourcePackageRelease
+    release_reference_name = 'sourcepackagereleaseID'
+
     @staticmethod
     def getPackageName(spph):
         """Return the name of this publication's source package."""
@@ -126,6 +130,9 @@
     Used by `GeneralizedPublication` to hide the differences from
     `SourcePackagePublishingHistory`.
     """
+    release_class = BinaryPackageRelease
+    release_reference_name = 'binarypackagereleaseID'
+
     @staticmethod
     def getPackageName(bpph):
         """Return the name of this publication's binary package."""
@@ -156,9 +163,15 @@
         return self.traits.getPackageName(pub)
 
     def getPackageVersion(self, pub):
-        """Obtain the version string for a publicaiton record."""
+        """Obtain the version string for a publication record."""
         return self.traits.getPackageRelease(pub).version
 
+    def load_releases(self, pubs):
+        """Load the releases associated with a series of publications."""
+        return load_related(
+            self.traits.release_class, pubs,
+            [self.traits.release_reference_name])
+
     def compare(self, pub1, pub2):
         """Compare publications by version.
 
@@ -217,6 +230,9 @@
         :param generalization: A `GeneralizedPublication` helper representing
             the kind of publications these are--source or binary.
         """
+        publications = list(publications)
+        generalization.load_releases(publications)
+
         # Go through publications from latest version to oldest.  This
         # makes it easy to figure out which release superseded which:
         # the dominant is always the oldest live release that is newer
@@ -224,6 +240,10 @@
         publications = sorted(
             publications, cmp=generalization.compare, reverse=True)
 
+        self.logger.debug(
+            "Package has %d live publication(s).  Live versions: %s",
+            len(publications), live_versions)
+
         current_dominant = None
         dominant_version = None
 
@@ -238,21 +258,26 @@
                 # superseded by a newer publication of the same version.
                 # Supersede it.
                 pub.supersede(current_dominant, logger=self.logger)
+                self.logger.debug2(
+                    "Superseding older publication for version %s.", version)
             elif version in live_versions:
                 # This publication stays active; if any publications
                 # that follow right after this are to be superseded,
                 # this is the release that they are superseded by.
                 current_dominant = pub
                 dominant_version = version
+                self.logger.debug2("Keeping version %s.", version)
             elif current_dominant is None:
                 # This publication is no longer live, but there is no
                 # newer version to supersede it either.  Therefore it
                 # must be deleted.
                 pub.requestDeletion(None)
+                self.logger.debug2("Deleting version %s.", version)
             else:
                 # This publication is superseded.  This is what we're
                 # here to do.
                 pub.supersede(current_dominant, logger=self.logger)
+                self.logger.debug2("Superseding version %s.", version)
 
     def _dominatePublications(self, pubs, generalization):
         """Perform dominations for the given publications.
@@ -512,13 +537,24 @@
         flush_database_updates()
 
     def findPublishedSourcePackageNames(self, distroseries, pocket):
-        """Find names of currently published source packages."""
+        """Find currently published source packages.
+
+        Returns an iterable of tuples: (name of source package, number of
+        publications in Published state).
+        """
+        # Avoid circular imports.
+        from lp.soyuz.model.publishing import SourcePackagePublishingHistory
+
+        looking_for = (
+            SourcePackageName.name,
+            Count(SourcePackagePublishingHistory.id),
+            )
         result = IStore(SourcePackageName).find(
-            SourcePackageName.name,
+            looking_for,
             join_spph_spr(),
             join_spr_spn(),
             self._composeActiveSourcePubsCondition(distroseries, pocket))
-        return result.config(distinct=True)
+        return result.group_by(SourcePackageName.name)
 
     def findPublishedSPPHs(self, distroseries, pocket, package_name):
         """Find currently published source publications for given package."""
@@ -538,8 +574,8 @@
             Desc(SourcePackageRelease.version),
             Desc(SourcePackagePublishingHistory.datecreated))
 
-    def dominateRemovedSourceVersions(self, distroseries, pocket,
-                                      package_name, live_versions):
+    def dominateSourceVersions(self, distroseries, pocket, package_name,
+                               live_versions):
         """Dominate source publications based on a set of "live" versions.
 
         Active publications for the "live" versions will remain active.  All

=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
--- lib/lp/archivepublisher/tests/test_dominator.py	2011-09-09 09:06:28 +0000
+++ lib/lp/archivepublisher/tests/test_dominator.py	2011-09-23 13:12:35 +0000
@@ -5,9 +5,10 @@
 
 __metaclass__ = type
 
-import apt_pkg
 import datetime
 from operator import attrgetter
+
+import apt_pkg
 from zope.security.proxy import removeSecurityProxy
 
 from canonical.database.sqlbase import flush_database_updates
@@ -279,6 +280,19 @@
             bpph.binarypackagerelease.version,
             GeneralizedPublication(is_source=False).getPackageVersion(bpph))
 
+    def test_load_releases_loads_sourcepackagerelease(self):
+        spph = self.factory.makeSourcePackagePublishingHistory()
+        self.assertContentEqual(
+            [spph.sourcepackagerelease],
+            GeneralizedPublication(is_source=True).load_releases([spph]))
+
+    def test_load_releases_loads_binarypackagerelease(self):
+        bpph = self.factory.makeBinaryPackagePublishingHistory(
+            binarypackagerelease=self.factory.makeBinaryPackageRelease())
+        self.assertContentEqual(
+            [bpph.binarypackagerelease],
+            GeneralizedPublication(is_source=False).load_releases([bpph]))
+
     def test_compare_sorts_versions(self):
         versions = [
             '1.1v2',
@@ -570,13 +584,13 @@
             self.assertEqual(
                 expected_supersededby, [pub.supersededby for pub in pubs])
 
-    def test_dominateRemovedSourceVersions_dominates_publications(self):
-        # dominateRemovedSourceVersions finds the publications for a
-        # package and calls dominatePackage on them.
+    def test_dominateSourceVersions_dominates_publications(self):
+        # dominateSourceVersions finds the publications for a package
+        # and calls dominatePackage on them.
         pubs = make_spphs_for_versions(self.factory, ['0.1', '0.2', '0.3'])
         package_name = pubs[0].sourcepackagerelease.sourcepackagename.name
 
-        self.makeDominator(pubs).dominateRemovedSourceVersions(
+        self.makeDominator(pubs).dominateSourceVersions(
             pubs[0].distroseries, pubs[0].pocket, package_name, ['0.2'])
         self.assertEqual([
                 PackagePublishingStatus.SUPERSEDED,
@@ -588,21 +602,21 @@
             [pubs[1].sourcepackagerelease, None, None],
             [pub.supersededby for pub in pubs])
 
-    def test_dominateRemovedSourceVersions_ignores_other_pockets(self):
-        # dominateRemovedSourceVersions ignores publications in other
-        # pockets than the one specified.
+    def test_dominateSourceVersions_ignores_other_pockets(self):
+        # dominateSourceVersions ignores publications in other pockets
+        # than the one specified.
         pubs = make_spphs_for_versions(self.factory, ['2.3', '2.4'])
         package_name = pubs[0].sourcepackagerelease.sourcepackagename.name
         removeSecurityProxy(pubs[0]).pocket = PackagePublishingPocket.UPDATES
         removeSecurityProxy(pubs[1]).pocket = PackagePublishingPocket.PROPOSED
-        self.makeDominator(pubs).dominateRemovedSourceVersions(
+        self.makeDominator(pubs).dominateSourceVersions(
             pubs[0].distroseries, pubs[0].pocket, package_name, ['2.3'])
         self.assertEqual(PackagePublishingStatus.PUBLISHED, pubs[1].status)
 
-    def test_dominateRemovedSourceVersions_ignores_other_packages(self):
+    def test_dominateSourceVersions_ignores_other_packages(self):
         pubs = make_spphs_for_versions(self.factory, ['1.0', '1.1'])
         other_package_name = self.factory.makeSourcePackageName().name
-        self.makeDominator(pubs).dominateRemovedSourceVersions(
+        self.makeDominator(pubs).dominateSourceVersions(
             pubs[0].distroseries, pubs[0].pocket, other_package_name, ['1.1'])
         self.assertEqual(PackagePublishingStatus.PUBLISHED, pubs[0].status)
 
@@ -611,7 +625,7 @@
             status=PackagePublishingStatus.PUBLISHED)
         dominator = self.makeDominator([spph])
         self.assertContentEqual(
-            [spph.sourcepackagerelease.sourcepackagename.name],
+            [(spph.sourcepackagerelease.sourcepackagename.name, 1)],
             dominator.findPublishedSourcePackageNames(
                 spph.distroseries, spph.pocket))
 
@@ -626,7 +640,7 @@
         published_spph = spphs[PackagePublishingStatus.PUBLISHED]
         dominator = self.makeDominator(spphs.values())
         self.assertContentEqual(
-            [published_spph.sourcepackagerelease.sourcepackagename.name],
+            [(published_spph.sourcepackagerelease.sourcepackagename.name, 1)],
             dominator.findPublishedSourcePackageNames(series, pocket))
 
     def test_findPublishedSourcePackageNames_ignores_other_archives(self):
@@ -660,21 +674,33 @@
             dominator.findPublishedSourcePackageNames(
                 spph.distroseries, PackagePublishingPocket.SECURITY))
 
-    def test_findPublishedSourcePackageNames_does_not_return_duplicates(self):
+    def test_findPublishedSourcePackageNames_counts_published_SPPHs(self):
         series = self.factory.makeDistroSeries()
         pocket = PackagePublishingPocket.RELEASE
-        package = self.factory.makeSourcePackageName()
+        spr = self.factory.makeSourcePackageRelease()
         spphs = [
             self.factory.makeSourcePackagePublishingHistory(
-                distroseries=series, archive=series.main_archive,
-                pocket=pocket, status=PackagePublishingStatus.PUBLISHED,
-                sourcepackagerelease=self.factory.makeSourcePackageRelease(
-                    sourcepackagename=package))
+                distroseries=series, sourcepackagerelease=spr, pocket=pocket,
+                status=PackagePublishingStatus.PUBLISHED)
             for counter in xrange(2)]
         dominator = self.makeDominator(spphs)
-        self.assertEqual(
-            [package.name],
-            list(dominator.findPublishedSourcePackageNames(series, pocket)))
+        [(name, publications)] = dominator.findPublishedSourcePackageNames(
+            series, pocket)
+        self.assertEqual(len(spphs), publications)
+
+    def test_findPublishedSourcePackageNames_counts_no_other_state(self):
+        series = self.factory.makeDistroSeries()
+        pocket = PackagePublishingPocket.RELEASE
+        spr = self.factory.makeSourcePackageRelease()
+        spphs = [
+            self.factory.makeSourcePackagePublishingHistory(
+                distroseries=series, sourcepackagerelease=spr, pocket=pocket,
+                status=status)
+            for status in PackagePublishingStatus.items]
+        dominator = self.makeDominator(spphs)
+        [(name, publications)] = dominator.findPublishedSourcePackageNames(
+            series, pocket)
+        self.assertEqual(1, publications)
 
     def test_findPublishedSPPHs_finds_published_SPPH(self):
         spph = self.factory.makeSourcePackagePublishingHistory(

=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py	2011-09-21 01:41:08 +0000
+++ lib/lp/soyuz/interfaces/publishing.py	2011-09-23 13:12:35 +0000
@@ -381,7 +381,8 @@
             ))
     archive = exported(
         Reference(
-            Interface, # Really IArchive, see below.
+            # Really IArchive (fixed in _schema_circular_imports.py).
+            Interface,
             title=_('Archive ID'), required=True, readonly=True,
             ))
     supersededby = Int(
@@ -478,7 +479,9 @@
         "if one exists, or None.")
 
     ancestor = Reference(
-        Interface, # Really ISourcePackagePublishingHistory
+         # Really ISourcePackagePublishingHistory (fixed in
+         # _schema_circular_imports.py).
+        Interface,
         title=_('Ancestor'),
         description=_('The previous release of this source package.'),
         required=False, readonly=True)
@@ -511,7 +514,8 @@
             `IBinaryPackagePublishingHistory`.
         """
 
-    @operation_returns_collection_of(Interface) # Really IBuild, see below.
+    # Really IBuild (fixed in _schema_circular_imports.py)
+    @operation_returns_collection_of(Interface)
     @export_read_operation()
     def getBuilds():
         """Return a list of `IBuild` objects in this publishing context.
@@ -678,12 +682,16 @@
         title=_('The DB id for the binarypackagename.'),
         required=False, readonly=False)
     binarypackagename = Attribute('The binary package name being published')
-    binarypackagerelease = Int(
-            title=_('The binary package being published'), required=False,
-            readonly=False)
+    binarypackagereleaseID = Int(
+        title=_('The DB id for the binarypackagerelease.'),
+        required=False, readonly=False)
+    binarypackagerelease = Attribute(
+        "The binary package release being published")
     distroarchseries = exported(
         Reference(
-            Interface, #Really IDistroArchSeries, circular import fixed below.
+            # Really IDistroArchSeries (fixed in
+            #_schema_circular_imports.py).
+            Interface,
             title=_("Distro Arch Series"),
             description=_('The distroarchseries being published into'),
             required=False, readonly=False,
@@ -769,7 +777,8 @@
         exported_as="date_removed")
     archive = exported(
         Reference(
-            Interface, # Really IArchive, see below.
+            # Really IArchive (fixed in _schema_circular_imports.py).
+            Interface,
             title=_('Archive'),
             description=_("The context archive for this publication."),
             required=True, readonly=True,

=== modified file 'lib/lp/soyuz/model/publishing.py'
--- lib/lp/soyuz/model/publishing.py	2011-09-21 01:41:08 +0000
+++ lib/lp/soyuz/model/publishing.py	2011-09-23 13:12:35 +0000
@@ -114,9 +114,8 @@
 from lp.soyuz.scripts.changeoverride import ArchiveOverriderError
 
 
-# XXX cprov 2006-08-18: move it away, perhaps archivepublisher/pool.py
-
 def makePoolPath(source_name, component_name):
+    # XXX cprov 2006-08-18: move it away, perhaps archivepublisher/pool.py
     """Return the pool path for a given source name and component name."""
     from lp.archivepublisher.diskpool import poolify
     return os.path.join(
@@ -421,12 +420,12 @@
     """A source package release publishing record."""
     implements(ISourcePackagePublishingHistory)
 
-    sourcepackagename = ForeignKey(foreignKey='SourcePackageName',
-        dbName='sourcepackagename')
-    sourcepackagerelease = ForeignKey(foreignKey='SourcePackageRelease',
-        dbName='sourcepackagerelease')
-    distroseries = ForeignKey(foreignKey='DistroSeries',
-        dbName='distroseries')
+    sourcepackagename = ForeignKey(
+        foreignKey='SourcePackageName', dbName='sourcepackagename')
+    sourcepackagerelease = ForeignKey(
+        foreignKey='SourcePackageRelease', dbName='sourcepackagerelease')
+    distroseries = ForeignKey(
+        foreignKey='DistroSeries', dbName='distroseries')
     component = ForeignKey(foreignKey='Component', dbName='component')
     section = ForeignKey(foreignKey='Section', dbName='section')
     status = EnumCol(schema=PackagePublishingStatus)
@@ -685,7 +684,7 @@
         return self.distroseries.distribution.getSourcePackageRelease(
             self.supersededby)
 
-    # XXX: StevenK 2011-09-13 bug=848563: This can die when 
+    # XXX: StevenK 2011-09-13 bug=848563: This can die when
     # self.sourcepackagename is populated.
     @property
     def source_package_name(self):
@@ -908,12 +907,12 @@
 
     implements(IBinaryPackagePublishingHistory)
 
-    binarypackagename = ForeignKey(foreignKey='BinaryPackageName',
-        dbName='binarypackagename')
-    binarypackagerelease = ForeignKey(foreignKey='BinaryPackageRelease',
-                                      dbName='binarypackagerelease')
-    distroarchseries = ForeignKey(foreignKey='DistroArchSeries',
-                                   dbName='distroarchseries')
+    binarypackagename = ForeignKey(
+        foreignKey='BinaryPackageName', dbName='binarypackagename')
+    binarypackagerelease = ForeignKey(
+        foreignKey='BinaryPackageRelease', dbName='binarypackagerelease')
+    distroarchseries = ForeignKey(
+        foreignKey='DistroArchSeries', dbName='distroarchseries')
     component = ForeignKey(foreignKey='Component', dbName='component')
     section = ForeignKey(foreignKey='Section', dbName='section')
     priority = EnumCol(dbName='priority', schema=PackagePublishingPriority)
@@ -957,7 +956,7 @@
         """See `IBinaryPackagePublishingHistory`"""
         return self.distroarchseries.distroseries
 
-    # XXX: StevenK 2011-09-13 bug=848563: This can die when 
+    # XXX: StevenK 2011-09-13 bug=848563: This can die when
     # self.binarypackagename is populated.
     @property
     def binary_package_name(self):

=== modified file 'lib/lp/soyuz/scripts/gina/dominate.py'
--- lib/lp/soyuz/scripts/gina/dominate.py	2011-09-19 04:26:46 +0000
+++ lib/lp/soyuz/scripts/gina/dominate.py	2011-09-23 13:12:35 +0000
@@ -23,13 +23,34 @@
     # Dominate all packages published in the series.  This includes all
     # packages listed in the Sources file we imported, but also packages
     # that have been recently deleted.
-    package_names = dominator.findPublishedSourcePackageNames(series, pocket)
-    for package_name in package_names:
+    package_counts = dominator.findPublishedSourcePackageNames(series, pocket)
+    for package_name, pub_count in package_counts:
         entries = packages_map.src_map.get(package_name, [])
         live_versions = [
             entry['Version'] for entry in entries if 'Version' in entry]
 
-        dominator.dominateRemovedSourceVersions(
-            series, pocket, package_name, live_versions)
+        # Gina import just ensured that any live version in the Sources
+        # file has a Published publication.  So there should be at least
+        # as many Published publications as live versions.
+        if pub_count < len(live_versions):
+            logger.warn(
+                "Package %s has fewer live source publications (%s) than "
+                "live versions (%s).  The archive may be broken in some "
+                "way.",
+                package_name, pub_count, len(live_versions))
 
-        txn.commit()
+        # Domination can only turn Published publications into
+        # non-Published ones, and ensures that we end up with one
+        # Published publication per live version.  Thus, if there are as
+        # many Published publications as live versions, there is no
+        # domination to do.  We skip these as an optimization.  Without
+        # it, dominating a single Debian series takes hours.
+        if pub_count != len(live_versions):
+            logger.debug("Dominating %s.", package_name)
+            dominator.dominateSourceVersions(
+                series, pocket, package_name, live_versions)
+            txn.commit()
+        else:
+            logger.debug2(
+                "Skipping domination for %s: %d live version(s) and "
+                "publication(s).", package_name, pub_count)