← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~wgrant/launchpad/bug-654372-optimise-domination into lp:launchpad/devel

 

William Grant has proposed merging lp:~wgrant/launchpad/bug-654372-optimise-domination into lp:launchpad/devel.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #654372 Source domination is inefficient
  https://bugs.launchpad.net/bugs/654372


= Summary =
The source domination candidate selection process currently takes up to
three minutes per primary series in production. This is roughly 10% of
the total primary archive publisher time.

== Proposed fix ==
Binary domination has an optimisation which cuts it from minutes to less
than a second: it first calculates the set of names which have multiple
active publications and then considers only the publications matching
those names. The obvious fix is to extend this technique to cover source
domination too.

== Pre-implementation notes ==
None. This is just a cleanup of the binary candidate selection approach
and an extension of it to sources.

== Implementation details ==
The existing approach uses a temporary table and string-based SQL
queries. I've revised it to use Storm syntax and a subselect instead.

== Tests ==
This is just about impossible to test -- query counts should be
similar, but they'll be much faster.
lp.archivepublisher.tests.test_dominator has the primary existing tests
for this area.

== Demo and Q/A ==
The tests are reasonably good now, so should catch any regressions. We
may see the dogfood publisher become a few minutes faster.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/domination.py
-- 
https://code.launchpad.net/~wgrant/launchpad/bug-654372-optimise-domination/+merge/40854
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wgrant/launchpad/bug-654372-optimise-domination into lp:launchpad/devel.
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py	2010-10-03 15:30:06 +0000
+++ lib/lp/archivepublisher/domination.py	2010-11-15 12:03:19 +0000
@@ -58,19 +58,24 @@
 import operator
 
 import apt_pkg
+from storm.expr import And, Count, Select
 
 from canonical.database.constants import UTC_NOW
 from canonical.database.sqlbase import (
     clear_current_connection_cache,
-    cursor,
     flush_database_updates,
     sqlvalues,
     )
+from canonical.launchpad.interfaces.lpstorm import IMasterStore
 from lp.archivepublisher import ELIGIBLE_DOMINATION_STATES
+from lp.registry.model.sourcepackagename import SourcePackageName
 from lp.soyuz.enums import (
     BinaryPackageFormat,
     PackagePublishingStatus,
     )
+from lp.soyuz.model.binarypackagename import BinaryPackageName
+from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
 
 
 def clear_cache():
@@ -287,60 +292,67 @@
             self.debug("Performing domination across %s/%s (%s)" % (
                 dr.name, pocket.title, distroarchseries.architecturetag))
 
-            # Here we go behind SQLObject's back to generate an assistance
-            # table which will seriously improve the performance of this
-            # part of the publisher.
-            # XXX: dsilvers 2006-02-04: It would be nice to not have to do
-            # this. Most of this methodology is stolen from person.py
-            # XXX: malcc 2006-08-03: This should go away when we shift to
-            # doing this one package at a time.
-            flush_database_updates()
-            cur = cursor()
-            cur.execute("""SELECT bpn.id AS name, count(bpn.id) AS count INTO
-                temporary table PubDomHelper FROM BinaryPackageRelease bpr,
-                BinaryPackageName bpn, BinaryPackagePublishingHistory
-                sbpph WHERE bpr.binarypackagename = bpn.id AND
-                sbpph.binarypackagerelease = bpr.id AND
-                sbpph.distroarchseries = %s AND sbpph.archive = %s AND
-                sbpph.status = %s AND sbpph.pocket = %s
-                GROUP BY bpn.id""" % sqlvalues(
-                distroarchseries, self.archive,
-                PackagePublishingStatus.PUBLISHED, pocket))
-
-            binaries = BinaryPackagePublishingHistory.select(
-                """
-                binarypackagepublishinghistory.distroarchseries = %s
-                AND binarypackagepublishinghistory.archive = %s
-                AND binarypackagepublishinghistory.pocket = %s
-                AND binarypackagepublishinghistory.status = %s AND
-                binarypackagepublishinghistory.binarypackagerelease =
-                    binarypackagerelease.id
-                AND binarypackagerelease.binpackageformat != %s
-                AND binarypackagerelease.binarypackagename IN (
-                    SELECT name FROM PubDomHelper WHERE count > 1)"""
-                % sqlvalues(distroarchseries, self.archive,
-                            pocket, PackagePublishingStatus.PUBLISHED,
-                            BinaryPackageFormat.DDEB),
-                clauseTables=['BinaryPackageRelease'])
-
+            bpph_location_clauses = And(
+                BinaryPackagePublishingHistory.status ==
+                    PackagePublishingStatus.PUBLISHED,
+                BinaryPackagePublishingHistory.distroarchseries ==
+                    distroarchseries,
+                BinaryPackagePublishingHistory.archive == self.archive,
+                BinaryPackagePublishingHistory.pocket == pocket,
+                )
+            candidate_binary_names = Select(
+                BinaryPackageName.id,
+                And(
+                    BinaryPackageRelease.binarypackagenameID ==
+                        BinaryPackageName.id,
+                    BinaryPackagePublishingHistory.binarypackagereleaseID ==
+                        BinaryPackageRelease.id,
+                    bpph_location_clauses,
+                ),
+                group_by=BinaryPackageName.id,
+                having=Count(BinaryPackagePublishingHistory.id) > 1)
+            binaries = IMasterStore(BinaryPackagePublishingHistory).find(
+                BinaryPackagePublishingHistory,
+                BinaryPackageRelease.id ==
+                    BinaryPackagePublishingHistory.binarypackagereleaseID,
+                BinaryPackageRelease.binarypackagenameID.is_in(
+                    candidate_binary_names),
+                BinaryPackageRelease.binpackageformat !=
+                    BinaryPackageFormat.DDEB,
+                bpph_location_clauses)
             self.debug("Dominating binaries...")
             self._dominatePublications(self._sortPackages(binaries, False))
             if do_clear_cache:
                 self.debug("Flushing SQLObject cache.")
                 clear_cache()
 
-            flush_database_updates()
-            cur.execute("DROP TABLE PubDomHelper")
-
-        if do_clear_cache:
-            self.debug("Flushing SQLObject cache.")
-            clear_cache()
-
         self.debug("Performing domination across %s/%s (Source)" %
                    (dr.name, pocket.title))
-        sources = SourcePackagePublishingHistory.selectBy(
-            distroseries=dr, archive=self.archive, pocket=pocket,
-            status=PackagePublishingStatus.PUBLISHED)
+        spph_location_clauses = And(
+            SourcePackagePublishingHistory.status ==
+                PackagePublishingStatus.PUBLISHED,
+            SourcePackagePublishingHistory.distroseries == dr,
+            SourcePackagePublishingHistory.archive == self.archive,
+            SourcePackagePublishingHistory.pocket == pocket,
+            )
+        candidate_source_names = Select(
+            SourcePackageName.id,
+            And(
+                SourcePackageRelease.sourcepackagenameID ==
+                    SourcePackageName.id,
+                SourcePackagePublishingHistory.sourcepackagereleaseID ==
+                    SourcePackageRelease.id,
+                spph_location_clauses,
+            ),
+            group_by=SourcePackageName.id,
+            having=Count(SourcePackagePublishingHistory.id) > 1)
+        sources = IMasterStore(SourcePackagePublishingHistory).find(
+            SourcePackagePublishingHistory,
+            SourcePackageRelease.id ==
+                SourcePackagePublishingHistory.sourcepackagereleaseID,
+            SourcePackageRelease.sourcepackagenameID.is_in(
+                candidate_source_names),
+            spph_location_clauses)
         self.debug("Dominating sources...")
         self._dominatePublications(self._sortPackages(sources))
         flush_database_updates()


Follow ups