← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~cjwatson/launchpad/unnecessary-publisher-work into lp:launchpad

 

Colin Watson has proposed merging lp:~cjwatson/launchpad/unnecessary-publisher-work into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/unnecessary-publisher-work/+merge/120356

== Summary ==

The publisher spends quite a while doing override and filelist generation.  Now, I haven't done any profiling of this, and I rather suspect that most of the time is spent in large database queries, so this doesn't count as serious performance work.  Still, I observed that the publisher does quite a few things in Python code that it doesn't need to do at all, and if we can remove unnecessary code, make it more readable, and probably make it slightly faster into the bargain, there seem like no downsides.

== Proposed fix ==

FTPArchiveHandler.publishOverrides and FTPArchiveHandler.publishFileLists are only ever called for a single suite (distroseries+pocket) at a time.  It is therefore pointless to go to special effort with DecoratedResultSets to include the suite in every row in the result set, and pointless to construct dicts with only a single key.

To construct large dictionaries, the code calls dict.setdefault a lot.  Python 2.5 introduced collections.defaultdict, which is faster for this kind of work and involves less repetitive code.

FTPArchiveHandler.generateOverrideForComponent uses a "bar = list(foo); bar.sort()" pattern, which is cumbersome compared to "bar = sorted(foo)".

== Tests ==

bin/test -vvc lp.archivepublisher.tests.test_ftparchive
-- 
https://code.launchpad.net/~cjwatson/launchpad/unnecessary-publisher-work/+merge/120356
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~cjwatson/launchpad/unnecessary-publisher-work into lp:launchpad.
=== modified file 'lib/lp/archivepublisher/model/ftparchive.py'
--- lib/lp/archivepublisher/model/ftparchive.py	2012-06-22 15:07:50 +0000
+++ lib/lp/archivepublisher/model/ftparchive.py	2012-08-20 09:47:25 +0000
@@ -1,6 +1,7 @@
 # Copyright 2009-2012 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
+from collections import defaultdict
 import os
 from StringIO import StringIO
 
@@ -18,7 +19,6 @@
     OutputLineHandler,
     ReturnCodeReceiver,
     )
-from lp.services.database.decoratedresultset import DecoratedResultSet
 from lp.services.database.stormexpr import Concatenate
 from lp.services.librarian.model import LibraryFileAlias
 from lp.services.osutils import write_file
@@ -240,8 +240,7 @@
         :param distroseries: target `IDistroSeries`
         :param pocket: target `PackagePublishingPocket`
 
-        :return: a `DecoratedResultSet` with the source override information
-            tuples
+        :return: a `ResultSet` with the source override information tuples
         """
         # Avoid cicular imports.
         from lp.soyuz.model.publishing import SourcePackagePublishingHistory
@@ -269,15 +268,7 @@
             SourcePackagePublishingHistory.status ==
                 PackagePublishingStatus.PUBLISHED)
 
-        suite = distroseries.getSuite(pocket)
-
-        def add_suite(result):
-            name, component, section = result
-            return (name, suite, component, section)
-
-        result_set.order_by(
-            Desc(SourcePackagePublishingHistory.id))
-        return DecoratedResultSet(result_set, add_suite)
+        return result_set.order_by(Desc(SourcePackagePublishingHistory.id))
 
     def getBinariesForOverrides(self, distroseries, pocket):
         """Fetch override information about all published binaries.
@@ -288,8 +279,7 @@
         :param distroseries: target `IDistroSeries`
         :param pocket: target `PackagePublishingPocket`
 
-        :return: a `DecoratedResultSet` with the binary override information
-            tuples
+        :return: a `ResultSet` with the binary override information tuples
         """
         # Avoid cicular imports.
         from lp.soyuz.model.binarypackagename import BinaryPackageName
@@ -323,15 +313,7 @@
             BinaryPackagePublishingHistory.status ==
                 PackagePublishingStatus.PUBLISHED)
 
-        suite = distroseries.getSuite(pocket)
-
-        def add_suite(result):
-            name, component, section, priority = result
-            return (name, suite, component, section, priority.title.lower())
-
-        result_set.order_by(
-            Desc(BinaryPackagePublishingHistory.id))
-        return DecoratedResultSet(result_set, add_suite)
+        return result_set.order_by(Desc(BinaryPackagePublishingHistory.id))
 
     def generateOverrides(self, fullpublish=False):
         """Collect packages that need overrides, and generate them."""
@@ -346,9 +328,11 @@
 
                 spphs = self.getSourcesForOverrides(distroseries, pocket)
                 bpphs = self.getBinariesForOverrides(distroseries, pocket)
-                self.publishOverrides(spphs, bpphs)
+                self.publishOverrides(
+                    distroseries.getSuite(pocket), spphs, bpphs)
 
-    def publishOverrides(self, source_publications, binary_publications):
+    def publishOverrides(self, suite,
+                         source_publications, binary_publications):
         """Output a set of override files for use in apt-ftparchive.
 
         Given the provided sourceoverrides and binaryoverrides, do the
@@ -368,11 +352,10 @@
         # This code is tested in soyuz-set-of-uploads, and in
         # test_ftparchive.
 
-        # overrides[distroseries][component][src/bin] = sets of tuples
-        overrides = {}
+        # overrides[component][src/bin] = sets of tuples
+        overrides = defaultdict(lambda: defaultdict(set))
 
-        def updateOverride(packagename, suite, component, section,
-                           priority=None):
+        def updateOverride(packagename, component, section, priority=None):
             """Generates and packs tuples of data required for overriding.
 
             If priority is provided, it's a binary tuple; otherwise,
@@ -387,24 +370,21 @@
             if component != DEFAULT_COMPONENT:
                 section = "%s/%s" % (component, section)
 
-            override = overrides.setdefault(suite, {})
-            suboverride = override.setdefault(component, {})
+            override = overrides[component]
             # We use sets in this structure to avoid generating
             # duplicated overrides. This issue is an outcome of the fact
             # that the PublishingHistory views select across all
             # architectures -- and therefore we have N binaries for N
             # archs.
-            suboverride.setdefault('src', set())
-            suboverride.setdefault('bin', set())
-            suboverride.setdefault('d-i', set())
             if priority:
+                priority = priority.title.lower()
                 # We pick up debian-installer packages here
                 if section.endswith("debian-installer"):
-                    suboverride['d-i'].add((packagename, priority, section))
+                    override['d-i'].add((packagename, priority, section))
                 else:
-                    suboverride['bin'].add((packagename, priority, section))
+                    override['bin'].add((packagename, priority, section))
             else:
-                suboverride['src'].add((packagename, section))
+                override['src'].add((packagename, section))
 
         # Process huge iterations (more than 200k records) in batches.
         # See `PublishingTunableLoop`.
@@ -419,22 +399,17 @@
             updateOverride(*pub)
 
         # Now generate the files on disk...
-        for distroseries in overrides:
-            for component in overrides[distroseries]:
-                self.log.debug("Generating overrides for %s/%s..." % (
-                    distroseries, component))
-                self.generateOverrideForComponent(overrides, distroseries,
-                                                  component)
+        for component in overrides:
+            self.log.debug("Generating overrides for %s/%s..." % (
+                suite, component))
+            self.generateOverrideForComponent(overrides, suite, component)
 
     def generateOverrideForComponent(self, overrides, distroseries,
                                      component):
         """Generates overrides for a specific component."""
-        src_overrides = list(overrides[distroseries][component]['src'])
-        src_overrides.sort()
-        bin_overrides = list(overrides[distroseries][component]['bin'])
-        bin_overrides.sort()
-        di_overrides = list(overrides[distroseries][component]['d-i'])
-        di_overrides.sort()
+        src_overrides = sorted(overrides[component]['src'])
+        bin_overrides = sorted(overrides[component]['bin'])
+        di_overrides = sorted(overrides[component]['d-i'])
 
         # Set up filepaths for the overrides we read
         extra_extra_overrides = os.path.join(self._config.miscroot,
@@ -521,8 +496,7 @@
         :param distroseries: target `IDistroSeries`
         :param pocket: target `PackagePublishingPocket`
 
-        :return: a `DecoratedResultSet` with the source files information
-            tuples.
+        :return: a `ResultSet` with the source files information tuples.
         """
 
         # Avoid circular imports.
@@ -540,15 +514,7 @@
             SourcePackageFilePublishing.publishingstatus ==
                 PackagePublishingStatus.PUBLISHED)
 
-        suite = distroseries.getSuite(pocket)
-
-        def add_suite(result):
-            name, filename, component = result
-            return (name, suite, filename, component)
-
-        result_set.order_by(
-            Desc(SourcePackageFilePublishing.id))
-        return DecoratedResultSet(result_set, add_suite)
+        return result_set.order_by(Desc(SourcePackageFilePublishing.id))
 
     def getBinaryFiles(self, distroseries, pocket):
         """Fetch publishing information about all published binary files.
@@ -559,8 +525,7 @@
         :param distroseries: target `IDistroSeries`
         :param pocket: target `PackagePublishingPocket`
 
-        :return: a `DecoratedResultSet` with the binary files information
-            tuples.
+        :return: a `ResultSet` with the binary files information tuples.
         """
         store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
 
@@ -593,17 +558,10 @@
                 PackagePublishingStatus.PUBLISHED,
             ]
 
-        suite = distroseries.getSuite(pocket)
-
-        def add_suite(result):
-            name, filename, component, architecture = result
-            return (name, suite, filename, component, architecture)
-
         result_set = store.find(
             columns, *(join_conditions + select_conditions))
-        result_set.order_by(
+        return result_set.order_by(
             BinaryPackagePublishingHistory.id, BinaryPackageFile.id)
-        return DecoratedResultSet(result_set, add_suite)
 
     def generateFileLists(self, fullpublish=False):
         """Collect currently published FilePublishings and write filelists."""
@@ -617,27 +575,24 @@
                         continue
                 spps = self.getSourceFiles(distroseries, pocket)
                 pps = self.getBinaryFiles(distroseries, pocket)
-                self.publishFileLists(spps, pps)
+                self.publishFileLists(distroseries.getSuite(pocket), spps, pps)
 
-    def publishFileLists(self, sourcefiles, binaryfiles):
+    def publishFileLists(self, suite, sourcefiles, binaryfiles):
         """Collate the set of source files and binary files provided and
         write out all the file list files for them.
 
         listroot/distroseries_component_source
         listroot/distroseries_component_binary-archname
         """
-        filelist = {}
+        filelist = defaultdict(lambda: defaultdict(list))
 
-        def updateFileList(sourcepackagename, suite, filename, component,
+        def updateFileList(sourcepackagename, filename, component,
                            architecturetag=None):
             ondiskname = self._diskpool.pathFor(
                             component, sourcepackagename, filename)
-            this_file = filelist.setdefault(suite, {})
-            this_file.setdefault(component, {})
             if architecturetag is None:
                 architecturetag = "source"
-            this_file[component].setdefault(architecturetag, [])
-            this_file[component][architecturetag].append(ondiskname)
+            filelist[component][architecturetag].append(ondiskname)
 
         # Process huge iterations (more than 200K records) in batches.
         # See `PublishingTunableLoop`.
@@ -651,23 +606,22 @@
         for file_details in binaryfiles:
             updateFileList(*file_details)
 
-        for suite, components in filelist.iteritems():
-            self.log.debug("Writing file lists for %s" % suite)
-            series, pocket = self.distro.getDistroSeriesAndPocket(suite)
-            for component, architectures in components.iteritems():
-                for architecture, file_names in architectures.iteritems():
-                    # XXX wgrant 2010-10-06: There must be a better place to
-                    # do this.
-                    if architecture == "source":
-                        enabled = True
-                    else:
-                        # The "[7:]" strips the "binary-" prefix off the
-                        # architecture names we get here.
-                        das = series.getDistroArchSeries(architecture[7:])
-                        enabled = das.enabled
-                    if enabled:
-                        self.writeFileList(
-                            architecture, file_names, suite, component)
+        self.log.debug("Writing file lists for %s" % suite)
+        series, pocket = self.distro.getDistroSeriesAndPocket(suite)
+        for component, architectures in filelist.iteritems():
+            for architecture, file_names in architectures.iteritems():
+                # XXX wgrant 2010-10-06: There must be a better place to do
+                # this.
+                if architecture == "source":
+                    enabled = True
+                else:
+                    # The "[7:]" strips the "binary-" prefix off the
+                    # architecture names we get here.
+                    das = series.getDistroArchSeries(architecture[7:])
+                    enabled = das.enabled
+                if enabled:
+                    self.writeFileList(
+                        architecture, file_names, suite, component)
 
     def writeFileList(self, arch, file_names, dr_pocketed, component):
         """Output file lists for a series and architecture.

=== modified file 'lib/lp/archivepublisher/tests/test_ftparchive.py'
--- lib/lp/archivepublisher/tests/test_ftparchive.py	2012-06-22 15:07:50 +0000
+++ lib/lp/archivepublisher/tests/test_ftparchive.py	2012-08-20 09:47:25 +0000
@@ -26,6 +26,7 @@
     BufferLogger,
     DevNullLogger,
     )
+from lp.soyuz.enums import PackagePublishingPriority
 from lp.testing import TestCaseWithFactory
 from lp.testing.dbuser import switch_dbuser
 from lp.testing.layers import (
@@ -137,23 +138,20 @@
             self._publisher)
 
     def _publishDefaultOverrides(self, fa, component):
-        source_overrides = FakeSelectResult(
-            [('foo', 'hoary-test', component, 'misc')])
+        source_overrides = FakeSelectResult([('foo', component, 'misc')])
         binary_overrides = FakeSelectResult(
-            [('foo', 'hoary-test', component, 'misc', 'extra')])
-        fa.publishOverrides(source_overrides, binary_overrides)
+            [('foo', component, 'misc', PackagePublishingPriority.EXTRA)])
+        fa.publishOverrides('hoary-test', source_overrides, binary_overrides)
 
     def _publishDefaultFileLists(self, fa, component):
-        source_files = FakeSelectResult(
-            [('foo', 'hoary-test', 'foo_1.dsc', component)])
+        source_files = FakeSelectResult([('foo', 'foo_1.dsc', component)])
         binary_files = FakeSelectResult(
-            [('foo', 'hoary-test', 'foo_1_i386.deb', component,
-             'binary-i386')])
-        fa.publishFileLists(source_files, binary_files)
+            [('foo', 'foo_1_i386.deb', component, 'binary-i386')])
+        fa.publishFileLists('hoary-test', source_files, binary_files)
 
     def test_getSourcesForOverrides(self):
         # getSourcesForOverrides returns a list of tuples containing:
-        # (sourcename, suite, component, section)
+        # (sourcename, component, section)
 
         # Reconfigure FTPArchiveHandler to retrieve sampledata overrides.
         fa = self._setUpFTPArchiveHandler()
@@ -165,21 +163,21 @@
             hoary, PackagePublishingPocket.RELEASE)
 
         # For the above query, we are depending on the sample data to
-        # contain seven rows of SourcePackagePublishghistory data.
+        # contain seven rows of SourcePackagePublishingHistory data.
         expectedSources = [
-            ('linux-source-2.6.15', 'hoary', 'main', 'base'),
-            ('libstdc++', 'hoary', 'main', 'base'),
-            ('cnews', 'hoary', 'universe', 'base'),
-            ('alsa-utils', 'hoary', 'main', 'base'),
-            ('pmount', 'hoary', 'main', 'editors'),
-            ('netapplet', 'hoary', 'main', 'web'),
-            ('evolution', 'hoary', 'main', 'editors'),
+            ('linux-source-2.6.15', 'main', 'base'),
+            ('libstdc++', 'main', 'base'),
+            ('cnews', 'universe', 'base'),
+            ('alsa-utils', 'main', 'base'),
+            ('pmount', 'main', 'editors'),
+            ('netapplet', 'main', 'web'),
+            ('evolution', 'main', 'editors'),
             ]
         self.assertEqual(expectedSources, list(published_sources))
 
     def test_getBinariesForOverrides(self):
         # getBinariesForOverrides returns a list of tuples containing:
-        # (sourcename, suite, component, section, priority)
+        # (sourcename, component, section, priority)
 
         # Reconfigure FTPArchiveHandler to retrieve sampledata overrides.
         fa = self._setUpFTPArchiveHandler()
@@ -190,8 +188,9 @@
         published_binaries = fa.getBinariesForOverrides(
             hoary, PackagePublishingPocket.RELEASE)
         expectedBinaries = [
-            ('pmount', 'hoary', 'main', 'base', 'extra'),
-            ('pmount', 'hoary', 'universe', 'editors', 'important'),
+            ('pmount', 'main', 'base', PackagePublishingPriority.EXTRA),
+            ('pmount', 'universe', 'editors',
+             PackagePublishingPriority.IMPORTANT),
             ]
         self.assertEqual(expectedBinaries, list(published_binaries))
 
@@ -236,7 +235,7 @@
 
     def test_getSourceFiles(self):
         # getSourceFiles returns a list of tuples containing:
-        # (sourcename, suite, filename, component)
+        # (sourcename, filename, component)
 
         # Reconfigure FTPArchiveHandler to retrieve sampledata records.
         fa = self._setUpFTPArchiveHandler()
@@ -248,15 +247,15 @@
         sources_files = fa.getSourceFiles(
             hoary, PackagePublishingPocket.RELEASE)
         expected_files = [
-            ('alsa-utils', 'hoary', 'alsa-utils_1.0.9a-4ubuntu1.dsc', 'main'),
-            ('evolution', 'hoary', 'evolution-1.0.tar.gz', 'main'),
-            ('netapplet', 'hoary', 'netapplet_1.0.0.orig.tar.gz', 'main'),
+            ('alsa-utils', 'alsa-utils_1.0.9a-4ubuntu1.dsc', 'main'),
+            ('evolution', 'evolution-1.0.tar.gz', 'main'),
+            ('netapplet', 'netapplet_1.0.0.orig.tar.gz', 'main'),
             ]
         self.assertEqual(expected_files, list(sources_files))
 
     def test_getBinaryFiles(self):
         # getBinaryFiles returns a list of tuples containing:
-        # (sourcename, suite, filename, component, architecture)
+        # (sourcename, filename, component, architecture)
 
         # Reconfigure FTPArchiveHandler to retrieve sampledata records.
         fa = self._setUpFTPArchiveHandler()
@@ -268,7 +267,7 @@
         binary_files = fa.getBinaryFiles(
             hoary, PackagePublishingPocket.RELEASE)
         expected_files = [
-            ('pmount', 'hoary', 'pmount_1.9-1_all.deb', 'main', 'binary-hppa'),
+            ('pmount', 'pmount_1.9-1_all.deb', 'main', 'binary-hppa'),
             ]
         self.assertEqual(expected_files, list(binary_files))
 


Follow ups