← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-181368-view into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-181368-view into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers): code
Related bugs:
  #181368 cron.daily explodes in non-obvious ways if drescher is under load
  https://bugs.launchpad.net/bugs/181368

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-181368-view/+merge/48609

= Gut BinaryPackageFilePublishing view =

This is an added bonus for the work on bug 181368.  The main area for performance improvement to the archive publisher is in parallel runs of apt-ftparchive, but another several minutes are spent querying package publishings.  I assume the binary package publishings to be the main time sink since they're generally much more numerous than the source packages, and harder to query.

That querying currently happens through the BinaryPackageFilePublishing view.  It's a view on a join of 10 tables.  Most uses of this view, shockingly, only care about less than half of those tables.  And those are clustered around the most selective parts of the query at that.  Getting rid of the view should allow us to issue more efficient queries.

William Grant has been trying to remove this view, together with its source-package sibling.  Getting rid of them is not as easy as it looks, since all sorts of polymorphism goes on in that family.  For this reason I had to leave one BinaryPackageFilePublishing query in for another day.  Someday I hope we can replace the whole construct with something more like a Collection—but this is an optimization branch, not a cleanup.

(Well, I did include a few small cleanups that may help future readers understand what various pieces of code do.  I didn't make it perfect, but hope it helps.)

You'll note that I then stripped off what fields I could from the Python representation of BinaryPackageFilePublishing without breaking tests.  That may help us get a handle on what exactly the objects do and don't do for us.  If it weren't for the source package name field, for instance, we could drop 4 tables out of the join: BinaryPackageRelease, BinaryPackageBuild, SourcePackageRelease, and SourcePackageName.  The first of those may be quite costly because it's so central to the join (without it we could join BinaryPackageFile straight to BinaryPackagePublishingHistory on their binarypackagerelease foreign keys).  The other three form a long chain of 1:1 foreign keys, which it would also be good not to have.

To test, best cast a wide net:
{{{
./bin/test -vvc lp.soyuz
./bin/test -vvc lp.archivepublisher
./bin/test -vvc lp.registry -t doc/distribution.txt
}}}

I didn't actually touch the tests since no functionality should be affected.  The affected methods are tested both directly and indirectly by the existing tests.

For Q/A, we'll have to verify that publishing still works and produces the expected files.

I cleaned up lint where I found it, but left one tiny bit intact:
{{{
./lib/lp/registry/model/distribution.py
     476: E202 whitespace before ')'
}}}

Some creative formatting there that I couldn't bring myself to disturb.


Jeroen
-- 
https://code.launchpad.net/~jtv/launchpad/bug-181368-view/+merge/48609
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-181368-view into lp:launchpad.
=== added directory 'lib/lp/archivepublisher/model'
=== added file 'lib/lp/archivepublisher/model/__init__.py'
=== renamed file 'lib/lp/archivepublisher/ftparchive.py' => 'lib/lp/archivepublisher/model/ftparchive.py'
--- lib/lp/archivepublisher/ftparchive.py	2011-02-01 16:14:06 +0000
+++ lib/lp/archivepublisher/model/ftparchive.py	2011-02-04 13:25:47 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 import os
@@ -14,6 +14,7 @@
 from canonical.launchpad.components.decoratedresultset import (
     DecoratedResultSet,
     )
+from canonical.launchpad.database.librarian import LibraryFileAlias
 from canonical.launchpad.webapp.interfaces import (
     DEFAULT_FLAVOR,
     IStoreSelector,
@@ -27,9 +28,16 @@
     OutputLineHandler,
     ReturnCodeReceiver,
     )
+from lp.services.database.stormexpr import Concatenate
 from lp.soyuz.enums import PackagePublishingStatus
+from lp.soyuz.model.binarypackagebuild import BinaryPackageBuild
+from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
 from lp.soyuz.model.component import Component
+from lp.soyuz.model.distroarchseries import DistroArchSeries
+from lp.soyuz.model.files import BinaryPackageFile
+from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
 from lp.soyuz.model.section import Section
+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
 
 
 def package_name(filename):
@@ -235,7 +243,6 @@
         """
         # Avoid cicular imports.
         from lp.soyuz.model.publishing import SourcePackagePublishingHistory
-        from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
 
         store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
         origins = (
@@ -284,8 +291,6 @@
         """
         # Avoid cicular imports.
         from lp.soyuz.model.binarypackagename import BinaryPackageName
-        from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
-        from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
 
         store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
         origins = (
@@ -565,31 +570,47 @@
         :return: a `DecoratedResultSet` with the binary files information
             tuples.
         """
-        # Avoid circular imports.
-        from lp.soyuz.model.publishing import BinaryPackageFilePublishing
-
         store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
-        result_set = store.using(BinaryPackageFilePublishing).find(
-            (BinaryPackageFilePublishing.sourcepackagename,
-             BinaryPackageFilePublishing.libraryfilealiasfilename,
-             BinaryPackageFilePublishing.componentname,
-             BinaryPackageFilePublishing.architecturetag),
-            BinaryPackageFilePublishing.distribution == self.distro,
-            BinaryPackageFilePublishing.archive == self.publisher.archive,
-            BinaryPackageFilePublishing.distroseriesname == distroseries.name,
-            BinaryPackageFilePublishing.pocket == pocket,
-            BinaryPackageFilePublishing.publishingstatus ==
-                PackagePublishingStatus.PUBLISHED)
+
+        columns = (
+            SourcePackageName.name,
+            LibraryFileAlias.filename,
+            Component.name,
+            Concatenate("binary-", DistroArchSeries.architecturetag),
+            )
+        join_conditions = [
+            BinaryPackageRelease.id ==
+                BinaryPackagePublishingHistory.binarypackagereleaseID,
+            BinaryPackageFile.binarypackagereleaseID ==
+                BinaryPackagePublishingHistory.binarypackagereleaseID,
+            BinaryPackageBuild.id == BinaryPackageRelease.buildID,
+            SourcePackageRelease.id ==
+                BinaryPackageBuild.source_package_release_id,
+            SourcePackageName.id == SourcePackageRelease.sourcepackagenameID,
+            LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
+            DistroArchSeries.id ==
+                BinaryPackagePublishingHistory.distroarchseriesID,
+            Component.id == BinaryPackagePublishingHistory.componentID,
+            ]
+        select_conditions = [
+            BinaryPackagePublishingHistory.dateremoved == None,
+            DistroArchSeries.distroseriesID == distroseries.id,
+            BinaryPackagePublishingHistory.archive == self.publisher.archive,
+            BinaryPackagePublishingHistory.pocket == pocket,
+            BinaryPackagePublishingHistory.status ==
+                PackagePublishingStatus.PUBLISHED,
+            ]
 
         suite = distroseries.getSuite(pocket)
 
         def add_suite(result):
-            name, filename, component, architecturetag = result
-            architecture = 'binary-' + architecturetag
+            name, filename, component, architecture = result
             return (name, suite, filename, component, architecture)
 
+        result_set = store.find(
+            columns, *(join_conditions + select_conditions))
         result_set.order_by(
-            Desc(BinaryPackageFilePublishing.id))
+            BinaryPackagePublishingHistory.id, BinaryPackageFile.id)
         return DecoratedResultSet(result_set, add_suite)
 
     def generateFileLists(self, fullpublish=False):
@@ -621,12 +642,10 @@
                             component, sourcepackagename, filename)
             this_file = filelist.setdefault(suite, {})
             this_file.setdefault(component, {})
-            if architecturetag:
-                this_file[component].setdefault(architecturetag, [])
-                this_file[component][architecturetag].append(ondiskname)
-            else:
-                this_file[component].setdefault('source', [])
-                this_file[component]['source'].append(ondiskname)
+            if architecturetag is None:
+                architecturetag = "source"
+            this_file[component].setdefault(architecturetag, [])
+            this_file[component][architecturetag].append(ondiskname)
 
         # Process huge iterations (more than 200K records) in batches.
         # See `PublishingTunableLoop`.
@@ -644,20 +663,23 @@
         process_in_batches(
             binaryfiles, update_binary_filelist, self.log)
 
-        for suite, components in filelist.items():
+        for suite, components in filelist.iteritems():
             self.log.debug("Writing file lists for %s" % suite)
-            for component, architectures in components.items():
-                for architecture, file_names in architectures.items():
+            series, pocket = self.distro.getDistroSeriesAndPocket(suite)
+            for component, architectures in components.iteritems():
+                for architecture, file_names in architectures.iteritems():
                     # XXX wgrant 2010-10-06: There must be a better place to
                     # do this.
-                    series, pocket = (
-                        self.distro.getDistroSeriesAndPocket(suite))
-                    if (architecture != 'source' and
-                        not series.getDistroArchSeries(
-                            architecture[7:]).enabled):
-                        continue
-                    self.writeFileList(architecture, file_names,
-                                       suite, component)
+                    if architecture == "source":
+                        enabled = True
+                    else:
+                        # The "[7:]" strips the "binary-" prefix off the
+                        # architecture names we get here.
+                        das = series.getDistroArchSeries(architecture[7:])
+                        enabled = das.enabled
+                    if enabled:
+                        self.writeFileList(
+                            architecture, file_names, suite, component)
 
     def writeFileList(self, arch, file_names, dr_pocketed, component):
         """Outputs a file list for a series and architecture.

=== modified file 'lib/lp/archivepublisher/publishing.py'
--- lib/lp/archivepublisher/publishing.py	2010-12-19 03:48:38 +0000
+++ lib/lp/archivepublisher/publishing.py	2011-02-04 13:25:47 +0000
@@ -25,7 +25,7 @@
     )
 from lp.archivepublisher.diskpool import DiskPool
 from lp.archivepublisher.domination import Dominator
-from lp.archivepublisher.ftparchive import FTPArchiveHandler
+from lp.archivepublisher.model.ftparchive import FTPArchiveHandler
 from lp.archivepublisher.htaccess import (
     htpasswd_credentials_for_archive,
     write_htaccess,

=== modified file 'lib/lp/archivepublisher/tests/test_ftparchive.py'
--- lib/lp/archivepublisher/tests/test_ftparchive.py	2010-12-20 07:51:34 +0000
+++ lib/lp/archivepublisher/tests/test_ftparchive.py	2011-02-04 13:25:47 +0000
@@ -19,7 +19,7 @@
 from canonical.testing.layers import LaunchpadZopelessLayer
 from lp.archivepublisher.config import getPubConfig
 from lp.archivepublisher.diskpool import DiskPool
-from lp.archivepublisher.ftparchive import (
+from lp.archivepublisher.model.ftparchive import (
     f_touch,
     FTPArchiveHandler,
     )

=== modified file 'lib/lp/archivepublisher/utils.py'
--- lib/lp/archivepublisher/utils.py	2010-08-25 11:01:59 +0000
+++ lib/lp/archivepublisher/utils.py	2011-02-04 13:25:47 +0000
@@ -74,13 +74,11 @@
 # 1 StupidCache + clear_current_connection_caches() [this];
 # 2 storm.Cache + clear_current_connection_caches() [no difference];
 # 3 StupidCache + store.invalidate(obj) [references left behind];
-# 4 stormCache + store.invlaidate(obj)  [references left behind];
+# 4 stormCache + store.invalidate(obj)  [references left behind];
 # 5 No batches [memory exhausted].
 
-# XXX cprov 20080630: If we decide to keep this code/functionality, which
-# I think we should, independently of the need to cleanup the cache after
-# processing each batch, we should generalize and test it as suggested in
-# bug #244328.
+# XXX JeroenVermeulen 2011-02-03 bug=244328: That was mid-2008.  We have
+# the GenerationalCache now.  We may not need any of this any more.
 
 class PublishingTunableLoop(object):
     """An `ITunableLoop` for dealing with huge publishing result sets."""

=== modified file 'lib/lp/registry/model/distribution.py'
--- lib/lp/registry/model/distribution.py	2011-01-20 21:06:23 +0000
+++ lib/lp/registry/model/distribution.py	2011-02-04 13:25:47 +0000
@@ -52,6 +52,7 @@
     Match,
     RANK,
     )
+from canonical.launchpad.database.librarian import LibraryFileAlias
 from canonical.launchpad.helpers import (
     ensure_unicode,
     shortlist,
@@ -169,6 +170,7 @@
     PackagePublishingStatus,
     PackageUploadStatus,
     )
+from lp.soyuz.model.files import BinaryPackageFile
 from lp.soyuz.interfaces.archive import (
     IArchiveSet,
     MAIN_ARCHIVE_PURPOSES,
@@ -193,7 +195,6 @@
     )
 from lp.soyuz.model.distroseriespackagecache import DistroSeriesPackageCache
 from lp.soyuz.model.publishing import (
-    BinaryPackageFilePublishing,
     BinaryPackagePublishingHistory,
     SourcePackageFilePublishing,
     SourcePackagePublishingHistory,
@@ -481,7 +482,7 @@
 
             source_mirrors = list(Store.of(self).find(
                 (MirrorDistroSeriesSource.distribution_mirrorID,
-                 Max(MirrorDistroSeriesSource.freshness)), 
+                 Max(MirrorDistroSeriesSource.freshness)),
                 MirrorDistroSeriesSource.distribution_mirrorID.is_in(
                     [mirror.id for mirror in mirrors])).group_by(
                         MirrorDistroSeriesSource.distribution_mirrorID))
@@ -492,8 +493,10 @@
 
             for mirror in mirrors:
                 cache = get_property_cache(mirror)
-                cache.arch_mirror_freshness = arch_mirror_freshness.get(mirror.id, None)
-                cache.source_mirror_freshness = source_mirror_freshness.get(mirror.id, None)
+                cache.arch_mirror_freshness = arch_mirror_freshness.get(
+                    mirror.id, None)
+                cache.source_mirror_freshness = source_mirror_freshness.get(
+                    mirror.id, None)
         return mirrors
 
     @property
@@ -982,6 +985,25 @@
             DistroSeries.distribution == self,
             DistroSeries.status == status)
 
+    def _findPublishedBinaryFile(self, filename, archive):
+        """Find file of the given name for the given distro and archive.
+
+        :return: A `LibraryFileAlias`, or None.
+        """
+        search = Store.of(self).find(
+            LibraryFileAlias,
+            BinaryPackageFile.libraryfileID == LibraryFileAlias.id,
+            BinaryPackagePublishingHistory.binarypackagereleaseID ==
+                BinaryPackageFile.binarypackagereleaseID,
+            DistroArchSeries.id ==
+                BinaryPackagePublishingHistory.distroarchseriesID,
+            DistroSeries.id == DistroArchSeries.distroseriesID,
+            BinaryPackagePublishingHistory.dateremoved == None,
+            DistroSeries.distribution == self,
+            BinaryPackagePublishingHistory.archiveID == archive.id,
+            LibraryFileAlias.filename == filename)
+        return search.order_by(Desc(LibraryFileAlias.id)).first()
+
     def getFileByName(self, filename, archive=None, source=True, binary=True):
         """See `IDistribution`."""
         assert (source or binary), "searching in an explicitly empty " \
@@ -989,21 +1011,22 @@
         if archive is None:
             archive = self.main_archive
 
-        if source:
-            candidate = SourcePackageFilePublishing.selectFirstBy(
+        if binary:
+            candidate = self._findPublishedBinaryFile(filename, archive)
+        else:
+            candidate = None
+
+        if source and candidate is None:
+            spfp = SourcePackageFilePublishing.selectFirstBy(
                 distribution=self, libraryfilealiasfilename=filename,
                 archive=archive, orderBy=['id'])
-
-        if binary:
-            candidate = BinaryPackageFilePublishing.selectFirstBy(
-                distribution=self,
-                libraryfilealiasfilename=filename,
-                archive=archive, orderBy=["-id"])
-
-        if candidate is not None:
-            return candidate.libraryfilealias
-
-        raise NotFoundError(filename)
+            if spfp is not None:
+                candidate = spfp.libraryfilealias
+
+        if candidate is None:
+            raise NotFoundError(filename)
+        else:
+            return candidate
 
     def getBuildRecords(self, build_state=None, name=None, pocket=None,
                         arch_tag=None, user=None, binary_only=True):

=== modified file 'lib/lp/services/database/stormexpr.py'
--- lib/lp/services/database/stormexpr.py	2011-02-01 01:01:11 +0000
+++ lib/lp/services/database/stormexpr.py	2011-02-04 13:25:47 +0000
@@ -3,11 +3,13 @@
 
 __metaclass__ = type
 __all__ = [
+    'Concatenate',
     'CountDistinct',
     'Greatest',
     ]
 
 from storm.expr import (
+    BinaryOper,
     compile,
     EXPR,
     Expr,
@@ -15,17 +17,18 @@
     )
 
 
-# XXX wallyworld 2011-01-31 bug=710466:
-# We need to use a Postgres greatest() function call but Storm doesn't
-# support that yet.
 class Greatest(NamedFunc):
+    # XXX wallyworld 2011-01-31 bug=710466:
+    # We need to use a Postgres greatest() function call but Storm
+    # doesn't support that yet.
     __slots__ = ()
     name = "GREATEST"
 
 
-# XXX: wallyworld 2010-11-26 bug=675377:
-# storm's Count() implementation is broken for distinct with > 1 column
 class CountDistinct(Expr):
+    # XXX: wallyworld 2010-11-26 bug=675377:
+    # storm's Count() implementation is broken for distinct with > 1
+    # column.
 
     __slots__ = ("columns")
 
@@ -39,3 +42,9 @@
     col = compile(countselect.columns)
     state.pop()
     return "count(distinct(%s))" % col
+
+
+class Concatenate(BinaryOper):
+    """Storm operator for string concatenation."""
+    __slots__ = ()
+    oper = " || "

=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py	2011-01-31 16:01:21 +0000
+++ lib/lp/soyuz/interfaces/publishing.py	2011-02-04 13:25:47 +0000
@@ -655,10 +655,6 @@
             title=_('Binary Package publishing record id'), required=True,
             readonly=True,
             )
-    architecturetag = TextLine(
-            title=_("Architecture tag. As per dpkg's use"), required=True,
-            readonly=True,
-            )
 
 
 class IBinaryPackagePublishingHistoryPublic(IPublishingView):

=== modified file 'lib/lp/soyuz/model/publishing.py'
--- lib/lp/soyuz/model/publishing.py	2011-01-25 08:00:44 +0000
+++ lib/lp/soyuz/model/publishing.py	2011-02-04 13:25:47 +0000
@@ -243,11 +243,6 @@
 
     implements(IBinaryPackageFilePublishing)
 
-    distribution = ForeignKey(dbName='distribution',
-                              foreignKey="Distribution",
-                              unique=False, notNull=True,
-                              immutable=True)
-
     binarypackagepublishing = ForeignKey(
         dbName='binarypackagepublishing',
         foreignKey='BinaryPackagePublishingHistory', immutable=True)
@@ -266,19 +261,6 @@
     sourcepackagename = StringCol(dbName='sourcepackagename', unique=False,
                                   notNull=True, immutable=True)
 
-    distroseriesname = StringCol(dbName='distroseriesname', unique=False,
-                                  notNull=True, immutable=True)
-
-    publishingstatus = EnumCol(dbName='publishingstatus', unique=False,
-                               notNull=True, immutable=True,
-                               schema=PackagePublishingStatus)
-
-    architecturetag = StringCol(dbName='architecturetag', unique=False,
-                                notNull=True, immutable=True)
-
-    pocket = EnumCol(dbName='pocket', unique=False,
-                     notNull=True, schema=PackagePublishingPocket)
-
     archive = ForeignKey(dbName="archive", foreignKey="Archive", notNull=True)
 
     @property