
launchpad-reviewers team mailing list archive

[Merge] lp:~jtv/launchpad/bug-830890 into lp:launchpad

 

Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-830890 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #830890 in Launchpad itself: "Gina should delete publications if removed from source archive"
  https://bugs.launchpad.net/launchpad/+bug/830890

For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-830890/+merge/72581

= Summary =

This will be a two-stage submit, as I haven't figured out how to force a branch dependency on "bzr lp-propose" yet.

Gina pulls (among other things) Sources lists from the Debian archive.  Every time a new package (version) appears in Debian's listings, we create a new SourcePackagePublishingHistory (or SPPH for short).  If we previously had an older version of the package mirrored in Launchpad, its SPPH is marked superseded.

In addition to this, we also need Gina to mark SPPHs deleted when they stop occurring in Debian's listings.
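The retirement rule boils down to a set difference between what Launchpad currently publishes and what Debian still lists. A minimal sketch (hypothetical names, not the actual gina API):

```python
def find_dead_publications(active_publications, live_packages):
    """Return names of published packages no longer listed upstream.

    :param active_publications: Names of packages with active SPPHs
        in Launchpad's mirror of the distroseries.
    :param live_packages: Names currently listed in Debian's Sources.
    """
    # Anything we publish that upstream no longer lists is "dead" and
    # should have its publication marked DELETED.
    return set(active_publications) - set(live_packages)


dead = find_dead_publications(["foo", "bar", "baz"], ["foo", "baz"])
# "bar" has disappeared from the upstream listing, so its
# publication gets retired.
```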


== Proposed fix ==

Run a single huge query doing the whole job at once.  It contains all of the current package names as literals.  I tested that against two hundred thousand rather long package names (we only have just over forty thousand in the database) and it worked well enough, so I didn't bother tweaking that further.  It triggers a sequential scan on SourcePackageName (SPN), but that's a very narrow table and the database should survive.
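The shape of that query, sketched in Python (illustrative only; the real code lives in the diff below and uses Launchpad's sqlvalues helper). One subtlety worth calling out: an empty name list would produce invalid SQL ("IN ()"), so a NULL placeholder is substituted; "name IN (NULL)" matches no rows, which retires every active publication — the desired behaviour when the upstream listing is empty.

```python
def build_retire_query(live_packages):
    """Build a single UPDATE that retires all dead publications.

    All live package names are embedded as SQL literals, as in the
    proposed fix.  Hypothetical table/column shorthand; see the diff
    for the real query.
    """
    if not live_packages:
        # Guard against empty set, which SQL's "IN" does not accept.
        literals = "NULL"
    else:
        literals = ", ".join(
            "'%s'" % name.replace("'", "''") for name in live_packages)
    return (
        "UPDATE SourcePackagePublishingHistory SET status = 'DELETED' "
        "WHERE sourcepackagename NOT IN ("
        "SELECT id FROM SourcePackageName WHERE name IN (%s))"
        % literals)
```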

On staging, the query took 30 seconds from whatever state the cache happened to be in, and 20 seconds from a warm cache.  Given the scale of the problem, and especially the presence of a "not in" condition, that's not enormous.  (This test run involved retiring every single package from a recent Debian release, which is orders of magnitude more work than a realistic run should ever do.)  Practice will have to tell us whether it needs further optimization.


== Pre-implementation notes ==

Julian walked me through it.  Testing is normally particularly hard for gina; I mostly managed to sidestep that here.  I was tempted to clean up first, but was told that in this case a quick result was preferred.


== Implementation details ==

The new function I added to the script didn't fit anywhere, so I gave it a file of its own.  Perhaps this would be a good place to put the superseding of SPPHs by new SPRs as well.  But that's for later.

Since this kind of scripting is very sensitive to integration issues, I added a unit test that runs under realistic database privileges and with realistic data as far as this part of the script goes.  The Sources file supposedly downloaded from the Debian archive is fake.  I'm relying on the existing smoke tests to ensure that the script does not supply surprisingly wrong distribution or series names to the function; the new script code gets some full-integration exercise in the existing tests.
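The awkward part of faking the download is that apt_pkg insists on parsing a real file, so the fake Sources listing has to be written to disk first. Roughly (illustrative names, not the real test helpers):

```python
import os
import tempfile


def write_fake_sources(contents):
    """Write a fake Debian Sources listing for the parser to read.

    apt_pkg wants a proper file on disk; StringIO won't do.

    :param contents: Text of the fake Sources file.
    :return: Full path of the new Sources file.
    """
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "Sources")
    with open(path, "w") as listing:
        listing.write(contents)
    return path


path = write_fake_sources("Package: live-package\n")
```

The real test then commits, switches to gina's restricted database user, and runs the retirement function against this listing, so integration with database privileges gets exercised too.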


== Tests ==

{{{
./bin/test -vvc lp.soyuz.scripts.tests.test_gina
./bin/test -vvc lp.soyuz.tests.test_publishing
}}}


== Demo and Q/A ==

We'll have to see if we can run this on dogfood.  What we expect to see is large numbers of packages dropping out of Launchpad's mirrored Debian releases.


= Launchpad lint =

There's some pre-existing lint left.  I don't know what to do about the _pythonpath warning; there were plans to abolish the other warning (E261), so I didn't bother with those.


Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/soyuz/scripts/gina/packages.py
  lib/lp/soyuz/tests/test_publishing.py
  lib/lp/soyuz/scripts/gina/retire.py
  lib/lp/soyuz/model/publishing.py
  lib/lp/soyuz/scripts/gina/archive.py
  lib/lp/soyuz/interfaces/publishing.py
  scripts/gina.py
  lib/lp/soyuz/scripts/tests/test_gina.py

./lib/lp/soyuz/interfaces/publishing.py
     384: E261 at least two spaces before inline comment
     481: E261 at least two spaces before inline comment
     514: E261 at least two spaces before inline comment
     684: E261 at least two spaces before inline comment
     770: E261 at least two spaces before inline comment
./scripts/gina.py
      26: '_pythonpath' imported but unused
-- 
https://code.launchpad.net/~jtv/launchpad/bug-830890/+merge/72581
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-830890 into lp:launchpad.
=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py	2011-07-22 11:12:23 +0000
+++ lib/lp/soyuz/interfaces/publishing.py	2011-08-23 14:25:41 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 # pylint: disable-msg=E0211,E0213
@@ -1164,6 +1164,19 @@
             `IBinaryPackagePublishingHistory`.
         """
 
+    def retireDeadSourcePublications(distroseries, live_packages, person,
+                                     rationale=None):
+        """Request deletion of source publications that are no longer alive.
+
+        :param distroseries: `DistroSeries` whose dead publications are
+            to be retired.
+        :param live_packages: Iterable of all source package names (as
+            strings) that are still alive in this distroseries.
+        :param person: The `IPerson` responsible for the removal.
+        :param rationale: Optional text describing why the publications
+            are retired.
+        """
+
     def getBuildStatusSummariesForSourceIdsAndArchive(source_ids, archive):
         """Return a summary of the build statuses for source publishing ids.
 

=== modified file 'lib/lp/soyuz/model/publishing.py'
--- lib/lp/soyuz/model/publishing.py	2011-07-28 18:37:47 +0000
+++ lib/lp/soyuz/model/publishing.py	2011-08-23 14:25:41 +0000
@@ -2016,6 +2016,51 @@
 
         return sources + binary_packages
 
+    def retireDeadSourcePublications(self, distroseries, live_packages,
+                                     person, rationale=None):
+        """See `IPublishingSet`."""
+        live_packages = frozenset(live_packages)
+        if len(live_packages) == 0:
+            # Guard against empty set, which SQL's "IN" does not like.
+            live_packages = [None]
+
+        store = IMasterStore(SourcePackagePublishingHistory)
+
+        params = sqlvalues(
+            active_status=active_publishing_status,
+            distroseries=distroseries,
+            live_packages=live_packages,
+            person=person,
+            rationale=rationale,
+            status_deleted=PackagePublishingStatus.DELETED,
+            utc_now=UTC_NOW,
+        )
+
+        # Inserting all package names for an entire distroseries into a
+        # query is a little obscene.  But it worked for 200,000 package
+        # names, without complaining, on a 32-bit laptop.  That's about
+        # 5 times more source package names than Launchpad currently
+        # knows about.  The hardest part was trying to view the postgres
+        # query log in "less," to see that the query really executed.
+        store.execute("""
+            UPDATE SourcePackagePublishingHistory AS SPPH
+            SET
+                status = %(status_deleted)s,
+                datesuperseded = %(utc_now)s,
+                removed_by = %(person)s,
+                removal_comment = %(rationale)s
+            FROM SourcePackageRelease AS SPR
+            WHERE
+                SPR.id = SPPH.sourcepackagerelease AND
+                SPPH.distroseries = %(distroseries)s AND
+                SPPH.status IN %(active_status)s AND
+                SPR.sourcepackagename NOT IN (
+                    SELECT id
+                    FROM SourcePackageName
+                    WHERE name IN %(live_packages)s
+                )
+            """ % params)
+
     def getNearestAncestor(
         self, package_name, archive, distroseries, pocket=None,
         status=None, binary=False):

=== modified file 'lib/lp/soyuz/scripts/gina/archive.py'
--- lib/lp/soyuz/scripts/gina/archive.py	2010-08-20 20:31:18 +0000
+++ lib/lp/soyuz/scripts/gina/archive.py	2011-08-23 14:25:41 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 """Archive pool classes.
@@ -196,7 +196,7 @@
 
             # Run over the source stanzas and store info in src_map. We
             # make just one source map (instead of one per architecture)
-            # because most of then are the same for all architectures,
+            # because most of them are the same for all architectures,
             # but we go over it to also cover source packages that only
             # compile for one architecture.
             sources = apt_pkg.ParseTagFile(info_set.srcfile)
@@ -206,19 +206,19 @@
                     src_tmp['Component'] = info_set.component
                     src_name = src_tmp['Package']
                 except KeyError:
-                    log.exception("Invalid Sources stanza in %s" %
-                                  info_set.sources_tagfile)
+                    log.exception(
+                        "Invalid Sources stanza in %s",
+                        info_set.sources_tagfile)
                     continue
                 self.src_map[src_name].append(src_tmp)
 
-            # Check if it's in source-only mode, if so, skip binary index
+            # Check if it's in source-only mode.  If so, skip binary index
             # mapping.
             if info_set.source_only:
                 continue
 
-            # Create a tmp map for binaries for one arch/component pair
-            if not self.bin_map.has_key(info_set.arch):
-                self.bin_map[info_set.arch] = {}
+            # Create a tmp map for binaries for one arch/component pair.
+            self.bin_map.setdefault(info_set.arch, {})
 
             tmpbin_map = self.bin_map[info_set.arch]
 
@@ -226,12 +226,13 @@
             while binaries.Step():
                 try:
                     bin_tmp = dict(binaries.Section)
-                    # The component isn't listed in the tagfile
+                    # The component isn't listed in the tagfile.
                     bin_tmp['Component'] = info_set.component
                     bin_name = bin_tmp['Package']
                 except KeyError:
-                    log.exception("Invalid Releases stanza in %s" %
-                                  info_set.binaries_tagfile)
+                    log.exception(
+                        "Invalid Releases stanza in %s",
+                        info_set.binaries_tagfile)
                     continue
                 tmpbin_map[bin_name] = bin_tmp
 
@@ -247,4 +248,3 @@
                                   info_set.difile)
                     continue
                 tmpbin_map[dibin_name] = dibin_tmp
-

=== modified file 'lib/lp/soyuz/scripts/gina/packages.py'
--- lib/lp/soyuz/scripts/gina/packages.py	2011-02-23 20:26:53 +0000
+++ lib/lp/soyuz/scripts/gina/packages.py	2011-08-23 14:25:41 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 # pylint: disable-msg=W0631
@@ -61,9 +61,8 @@
     "source": PackagePublishingPriority.EXTRA,
 }
 
-GPGALGOS = {}
-for item in GPGKeyAlgorithm.items:
-    GPGALGOS[item.value] = item.name
+GPGALGOS = dict((item.value, item.name) for item in GPGKeyAlgorithm.items)
+
 
 #
 # Helper functions
@@ -147,11 +146,11 @@
 
 
 def parse_person(val):
-    if "," in val:
-        # Some emails have ',' like "Adam C. Powell, IV
-        # <hazelsct@xxxxxxxxxx>". rfc822.parseaddr seems to do not
-        # handle this properly, so we munge them here
-        val = val.replace(',', '')
+    """Parse a full email address into human-readable name and address."""
+    # Some addresses have commas in them, as in: "Adam C. Powell, IV
+    # <hazelsct@xxxxxxxxxxxxxxxxxx>". rfc822.parseaddr seems not to
+    # handle this properly, so we munge them here.
+    val = val.replace(',', '')
     return rfc822.parseaddr(val)
 
 
@@ -199,6 +198,7 @@
     else:
         return None
 
+
 #
 # Exception classes
 #

=== added file 'lib/lp/soyuz/scripts/gina/retire.py'
--- lib/lp/soyuz/scripts/gina/retire.py	1970-01-01 00:00:00 +0000
+++ lib/lp/soyuz/scripts/gina/retire.py	2011-08-23 14:25:41 +0000
@@ -0,0 +1,24 @@
+# Copyright 2011 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Retirement of packages that are removed upstream."""
+
+__metaclass__ = type
+__all__ = [
+    'retire_dead_source_publications',
+    ]
+
+from zope.component import getUtility
+
+from lp.app.interfaces.launchpad import ILaunchpadCelebrities
+from lp.registry.interfaces.distribution import IDistributionSet
+from lp.soyuz.interfaces.publishing import IPublishingSet
+
+
+def retire_dead_source_publications(distro_name, series_name, packages_map):
+    """Retire any source packages that are no longer listed."""
+    katie = getUtility(ILaunchpadCelebrities).katie
+    series = getUtility(IDistributionSet)[distro_name].getSeries(series_name)
+    getUtility(IPublishingSet).retireDeadSourcePublications(
+        series, packages_map.src_map.iterkeys(), person=katie,
+        rationale="Package no longer seen in upstream sources list.")

=== modified file 'lib/lp/soyuz/scripts/tests/test_gina.py'
--- lib/lp/soyuz/scripts/tests/test_gina.py	2010-08-20 20:31:18 +0000
+++ lib/lp/soyuz/scripts/tests/test_gina.py	2011-08-23 14:25:41 +0000
@@ -1,13 +1,94 @@
-# Copyright 2009-2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 from doctest import DocTestSuite
-import unittest
-
+import os.path
+from unittest import TestLoader
+
+import transaction
+
+from canonical.config import config
+from canonical.testing.layers import LaunchpadZopelessLayer
+from lp.soyuz.enums import PackagePublishingStatus
+from lp.soyuz.scripts.gina.archive import PackagesMap
 import lp.soyuz.scripts.gina.handlers
+from lp.soyuz.scripts.gina.retire import retire_dead_source_publications
+from lp.testing import TestCaseWithFactory
+from lp.testing.fakemethod import FakeMethod
+
+
+class FakeListing:
+    """Fake, empty, Source or Packages listing."""
+    Step = FakeMethod(result=False)
+
+
+class FakeInfoSet:
+    """Fake packages listing for parsing by `apt_pkg`."""
+
+    def __init__(self, temp_dir, listing_contents):
+        """Set up a fake Sources listing.
+
+        :param temp_dir: A temporary directory to write a Sources file to.
+        :param listing_contents: Contents of the Sources file.
+        """
+        listing_name = self._makeSourcesFile(temp_dir, listing_contents)
+        self.srcfile = file(listing_name, 'r')
+        self.binfile = FakeListing()
+        self.dibinaries = FakeListing()
+
+    def _makeSourcesFile(self, dirname, contents):
+        """Create a Sources listing file for `apt_pkg` to parse.
+
+        Horribly, apt_pkg really wants to read this from a proper file.
+        Can't use StringIO.
+
+        :param dirname: Directory to create a Sources file in.
+        :param contents: Text to write to Sources file.
+        :return: Full path of new Sources file.
+        """
+        listing_name = os.path.join(dirname, 'Sources')
+        listing_output = file(listing_name, 'w')
+        listing_output.write(contents)
+        listing_output.close()
+        return listing_name
+
+    component = 'main'
+    source_only = True
+    arch = 'i386'
+
+
+class FakeArchComponentItems(list):
+    """Simple fake `ArchComponentItems` that `PackagesMap` can parse."""
+    cleanup = FakeMethod()
+
+
+class TestGina(TestCaseWithFactory):
+    """Unit tests for gina."""
+
+    layer = LaunchpadZopelessLayer
+
+    def test_retire_dead_source_publications(self):
+        series = self.factory.makeDistroSeries()
+        dead_spph = self.factory.makeSourcePackagePublishingHistory(series)
+        live_spph = self.factory.makeSourcePackagePublishingHistory(series)
+        packages_map = PackagesMap(FakeArchComponentItems([FakeInfoSet(
+            self.makeTemporaryDirectory(),
+            "Package: %s" % live_spph.sourcepackagerelease.name)]))
+
+        transaction.commit()
+        self.layer.switchDbUser(config.gina.dbuser)
+
+        retire_dead_source_publications(
+            series.distribution.name, series.name, packages_map)
+
+        transaction.commit()
+
+        self.assertEqual(PackagePublishingStatus.DELETED, dead_spph.status)
+        self.assertEqual(PackagePublishingStatus.PENDING, live_spph.status)
 
 
 def test_suite():
-    suite = unittest.TestSuite()
+    suite = TestLoader().loadTestsFromName(__name__)
+    # Run the doctests as well.
     suite.addTest(DocTestSuite(lp.soyuz.scripts.gina.handlers))
     return suite

=== modified file 'lib/lp/soyuz/tests/test_publishing.py'
--- lib/lp/soyuz/tests/test_publishing.py	2011-08-15 03:41:35 +0000
+++ lib/lp/soyuz/tests/test_publishing.py	2011-08-23 14:25:41 +0000
@@ -3,6 +3,7 @@
 
 """Test native publication workflow for Soyuz. """
 
+from collections import defaultdict
 import datetime
 import operator
 import os
@@ -25,6 +26,7 @@
     DatabaseFunctionalLayer,
     LaunchpadZopelessLayer,
     reconnect_stores,
+    ZopelessDatabaseLayer,
     )
 from lp.app.errors import NotFoundError
 from lp.archivepublisher.config import getPubConfig
@@ -48,6 +50,8 @@
 from lp.soyuz.interfaces.binarypackagename import IBinaryPackageNameSet
 from lp.soyuz.interfaces.component import IComponentSet
 from lp.soyuz.interfaces.publishing import (
+    active_publishing_status,
+    inactive_publishing_status,
     IPublishingSet,
     PackagePublishingPriority,
     PackagePublishingStatus,
@@ -1160,6 +1164,100 @@
         self.assertEqual(None, record)
 
 
+class TestPublishingSetLite(TestCaseWithFactory):
+    """Lightweight unit tests for `PublishingSet`."""
+
+    layer = ZopelessDatabaseLayer
+
+    def mapSPPHStatuses(self, spphs):
+        """Map out `SourcePackagePublishingHistory` by status.
+
+        :param spphs: Iterable of `SourcePackagePublishingHistory`.
+        :return: Dict mapping each status found in spphs to a list of the
+            members of spphs that have that status.
+        """
+        mapping = defaultdict(list)
+        for spph in spphs:
+            mapping[spph.status].append(spph)
+        return dict(mapping)
+
+    def test_retireDeadSourcePublications_retires_dead_package(self):
+        # retireDeadSourcePublication retires publications for packages
+        # that are not mentioned in live_packages.
+        spph = self.factory.makeSourcePackagePublishingHistory()
+        person = self.factory.makePerson()
+        rationale = self.factory.getUniqueString()
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            spph.distroseries, [], person, rationale=rationale)
+        transaction.commit()
+        self.assertEqual(PackagePublishingStatus.DELETED, spph.status)
+        self.assertNotEqual(None, spph.datesuperseded)
+        self.assertEqual(person, spph.removed_by)
+        self.assertEqual(rationale, spph.removal_comment)
+
+    def test_retireDeadSourcePublications_does_not_retire_live_package(self):
+        # retireDeadSourcePublication does not retire publications for
+        # packages that are mentioned in live_packages.
+        spph = self.factory.makeSourcePackagePublishingHistory()
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            spph.distroseries, [spph.sourcepackagerelease.name],
+            self.factory.makePerson())
+        transaction.commit()
+        self.assertEqual(PackagePublishingStatus.PENDING, spph.status)
+        self.assertEqual(None, spph.datesuperseded)
+        self.assertEqual(None, spph.removed_by)
+        self.assertEqual(None, spph.removal_comment)
+
+    def test_retireDeadSourcePublications_ignores_unknown_package(self):
+        # retireDeadSourcePublication does its work regardless of any
+        # live packages that may not occur in the distroseries at all.
+        spph = self.factory.makeSourcePackagePublishingHistory()
+        other_package = self.factory.makeSourcePackageName()
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            spph.distroseries, [other_package.name],
+            self.factory.makePerson())
+        transaction.commit()
+        self.assertEqual(PackagePublishingStatus.DELETED, spph.status)
+
+    def test_retireDeadSourcePublications_ignores_other_distroseries(self):
+        # retireDeadSourcePublication does not retire publications for
+        # other distroseries than the one it's been given.
+        spph = self.factory.makeSourcePackagePublishingHistory()
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            self.factory.makeDistroSeries(), [], self.factory.makePerson())
+        transaction.commit()
+        self.assertEqual(PackagePublishingStatus.PENDING, spph.status)
+
+    def test_retireDeadSourcePublications_ignores_inactive_pubs(self):
+        # retireDeadSourcePublication does not retire publications that
+        # were no longer active anyway.
+        distroseries = self.factory.makeDistroSeries()
+        spphs = [
+            self.factory.makeSourcePackagePublishingHistory(
+                distroseries, status=status)
+            for status in inactive_publishing_status]
+        old_statuses = self.mapSPPHStatuses(spphs)
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            distroseries, [], self.factory.makePerson())
+        transaction.commit()
+        self.assertEqual(old_statuses, self.mapSPPHStatuses(spphs))
+
+    def test_retireDeadSourcePublications_retires_active_pubs(self):
+        # retireDeadSourcePublication retires publications in any active
+        # state.
+        distroseries = self.factory.makeDistroSeries()
+        spphs = [
+            self.factory.makeSourcePackagePublishingHistory(
+                distroseries, status=status)
+            for status in active_publishing_status]
+        getUtility(IPublishingSet).retireDeadSourcePublications(
+            distroseries, [], self.factory.makePerson())
+        transaction.commit()
+        self.assertEqual(
+            {PackagePublishingStatus.DELETED: spphs},
+            self.mapSPPHStatuses(spphs))
+
+
 class TestSourceDomination(TestNativePublishingBase):
     """Test SourcePackagePublishingHistory.supersede() operates correctly."""
 

=== modified file 'scripts/gina.py'
--- scripts/gina.py	2010-11-26 22:44:00 +0000
+++ scripts/gina.py	2011-08-23 14:25:41 +0000
@@ -1,6 +1,6 @@
 #!/usr/bin/python -S
 #
-# Copyright 2009,2010 Canonical Ltd.  This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd.  This software is licensed under the
 # GNU Affero General Public License version 3 (see the file LICENSE).
 
 # This module uses relative imports.
@@ -19,33 +19,44 @@
 
 __metaclass__ = type
 
-import _pythonpath
-
-# Set to non-zero if you'd like to be warned every so often
-COUNTDOWN = 0
-
 import os
-import psycopg2
 import sys
 import time
 
+import _pythonpath
+import psycopg2
 from zope.component import getUtility
 
 from canonical import lp
 from canonical.config import config
 from canonical.launchpad.scripts import log
-
 from lp.services.scripts.base import LaunchpadCronScript
 from lp.soyuz.interfaces.component import IComponentSet
 from lp.soyuz.scripts.gina import ExecutionError
+from lp.soyuz.scripts.gina.archive import (
+    ArchiveComponentItems,
+    MangledArchiveError,
+    PackagesMap,
+    )
+from lp.soyuz.scripts.gina.handlers import (
+    DataSetupError,
+    ImporterHandler,
+    MultiplePackageReleaseError,
+    NoSourcePackageError,
+    )
 from lp.soyuz.scripts.gina.katie import Katie
-from lp.soyuz.scripts.gina.archive import (ArchiveComponentItems,
-    PackagesMap, MangledArchiveError)
-from lp.soyuz.scripts.gina.handlers import (ImporterHandler,
-    MultiplePackageReleaseError, NoSourcePackageError, DataSetupError)
-from lp.soyuz.scripts.gina.packages import (SourcePackageData,
-    BinaryPackageData, MissingRequiredArguments, DisplayNameDecodingError,
-    PoolFileNotFound, InvalidVersionError)
+from lp.soyuz.scripts.gina.packages import (
+    BinaryPackageData,
+    DisplayNameDecodingError,
+    InvalidVersionError,
+    MissingRequiredArguments,
+    PoolFileNotFound,
+    SourcePackageData,
+    )
+from lp.soyuz.scripts.gina.retire import retire_dead_source_publications
+
+# Set to non-zero if you'd like to be warned every so often
+COUNTDOWN = 0
 
 
 def _get_keyring(keyrings_root):
@@ -88,34 +99,34 @@
     LIBRPORT = config.librarian.upload_port
 
     log.info("")
-    log.info("=== Processing %s/%s/%s ===" % (distro, distroseries, pocket))
-    log.debug("Packages read from: %s" % package_root)
-    log.debug("Keyrings read from: %s" % keyrings_root)
-    log.info("Components to import: %s" % ", ".join(components))
+    log.info("=== Processing %s/%s/%s ===", distro, distroseries, pocket)
+    log.debug("Packages read from: %s", package_root)
+    log.debug("Keyrings read from: %s", keyrings_root)
+    log.info("Components to import: %s", ", ".join(components))
     if component_override is not None:
-        log.info("Override components to: %s" % component_override)
-    log.info("Architectures to import: %s" % ", ".join(archs))
-    log.debug("Launchpad database: %s" % LPDB)
-    log.debug("Launchpad database host: %s" % LPDB_HOST)
-    log.debug("Launchpad database user: %s" % LPDB_USER)
-    log.info("Katie database: %s" % KTDB)
-    log.info("SourcePackage Only: %s" % source_only)
-    log.info("SourcePackageName Only: %s" % spnames_only)
-    log.debug("Librarian: %s:%s" % (LIBRHOST, LIBRPORT))
-    log.info("Dry run: %s" % (dry_run))
+        log.info("Override components to: %s", component_override)
+    log.info("Architectures to import: %s", ", ".join(archs))
+    log.debug("Launchpad database: %s", LPDB)
+    log.debug("Launchpad database host: %s", LPDB_HOST)
+    log.debug("Launchpad database user: %s", LPDB_USER)
+    log.info("Katie database: %s", KTDB)
+    log.info("SourcePackage Only: %s", source_only)
+    log.info("SourcePackageName Only: %s", spnames_only)
+    log.debug("Librarian: %s:%s", LIBRHOST, LIBRPORT)
+    log.info("Dry run: %s", dry_run)
     log.info("")
 
-    if hasattr(PackagePublishingPocket, pocket.upper()):
-        pocket = getattr(PackagePublishingPocket, pocket.upper())
-    else:
-        log.error("Could not find a pocket schema for %s" % pocket)
+    if not hasattr(PackagePublishingPocket, pocket.upper()):
+        log.error("Could not find a pocket schema for %s", pocket)
         sys.exit(1)
 
+    pocket = getattr(PackagePublishingPocket, pocket.upper())
+
     if component_override:
         valid_components = [
             component.name for component in getUtility(IComponentSet)]
         if component_override not in valid_components:
-            log.error("Could not find component %s" % component_override)
+            log.error("Could not find component %s", component_override)
             sys.exit(1)
 
     kdb = None
@@ -130,16 +141,19 @@
             source_only)
     except MangledArchiveError:
         log.exception(
-            "Failed to analyze archive for %s" % pocket_distroseries)
+            "Failed to analyze archive for %s", pocket_distroseries)
         sys.exit(1)
 
     packages_map = PackagesMap(arch_component_items)
-    importer_handler = ImporterHandler(ztm, distro, distroseries,
-                                       dry_run, kdb, package_root, keyrings,
-                                       pocket, component_override)
-
-    import_sourcepackages(packages_map, kdb, package_root, keyrings,
-                          importer_handler)
+    importer_handler = ImporterHandler(
+        ztm, distro, distroseries, dry_run, kdb, package_root, keyrings,
+        pocket, component_override)
+
+    import_sourcepackages(
+        packages_map, kdb, package_root, keyrings, importer_handler)
+    importer_handler.commit()
+
+    retire_dead_source_publications(distro, distroseries, packages_map)
     importer_handler.commit()
 
     if source_only:
@@ -150,63 +164,63 @@
         try:
             importer_handler.ensure_archinfo(archtag)
         except DataSetupError:
-            log.exception("Database setup required for run on %s" % archtag)
+            log.exception("Database setup required for run on %s", archtag)
             sys.exit(1)
 
-    import_binarypackages(packages_map, kdb, package_root, keyrings,
-                          importer_handler)
+    import_binarypackages(
+        packages_map, kdb, package_root, keyrings, importer_handler)
     importer_handler.commit()
 
 
+def attempt_source_package_import(source, kdb, package_root, keyrings,
+                                  importer_handler):
+    """Attempt to import a source package, and handle typical errors."""
+    package_name = source.get("Package", "unknown")
+    try:
+        try:
+            do_one_sourcepackage(
+                source, kdb, package_root, keyrings, importer_handler)
+        except psycopg2.Error:
+            log.exception(
+                "Database error: unable to create SourcePackage for %s. "
+                "Retrying once..", package_name)
+            importer_handler.abort()
+            time.sleep(15)
+            do_one_sourcepackage(
+                source, kdb, package_root, keyrings, importer_handler)
+    except (
+        InvalidVersionError, MissingRequiredArguments,
+        DisplayNameDecodingError):
+        log.exception(
+            "Unable to create SourcePackageData for %s", package_name)
+    except (PoolFileNotFound, ExecutionError):
+        # Problems with katie db stuff of opening files
+        log.exception("Error processing package files for %s", package_name)
+    except psycopg2.Error:
+        log.exception(
+            "Database errors made me give up: unable to create "
+            "SourcePackage for %s", package_name)
+        importer_handler.abort()
+    except MultiplePackageReleaseError:
+        log.exception(
+            "Database duplication processing %s", package_name)
+
+
 def import_sourcepackages(packages_map, kdb, package_root,
                           keyrings, importer_handler):
     # Goes over src_map importing the sourcepackages packages.
     count = 0
     npacks = len(packages_map.src_map)
-    log.info('%i Source Packages to be imported' % npacks)
+    log.info('%i Source Packages to be imported', npacks)
 
     for list_source in sorted(
         packages_map.src_map.values(), key=lambda x: x[0].get("Package")):
         for source in list_source:
             count += 1
-            package_name = source.get("Package", "unknown")
-            try:
-                try:
-                    do_one_sourcepackage(
-                        source, kdb, package_root, keyrings, importer_handler)
-                except psycopg2.Error:
-                    log.exception(
-                        "Database error: unable to create SourcePackage "
-                        "for %s. Retrying once.." % package_name)
-                    importer_handler.abort()
-                    time.sleep(15)
-                    do_one_sourcepackage(
-                        source, kdb, package_root, keyrings, importer_handler)
-            except (
-                InvalidVersionError, MissingRequiredArguments,
-                DisplayNameDecodingError):
-                log.exception(
-                    "Unable to create SourcePackageData for %s" %
-                    package_name)
-                continue
-            except (PoolFileNotFound, ExecutionError):
-                # Problems with katie db stuff of opening files
-                log.exception(
-                    "Error processing package files for %s" % package_name)
-                continue
-            except psycopg2.Error:
-                log.exception(
-                    "Database errors made me give up: unable to create "
-                    "SourcePackage for %s" % package_name)
-                importer_handler.abort()
-                continue
-            except MultiplePackageReleaseError:
-                log.exception(
-                    "Database duplication processing %s" % package_name)
-                continue
-
-            if COUNTDOWN and count % COUNTDOWN == 0:
-                log.warn('%i/%i sourcepackages processed' % (count, npacks))
+            attempt_source_package_import(
+                source, kdb, package_root, keyrings, importer_handler)
+            if COUNTDOWN and (count % COUNTDOWN == 0):
+                log.warn('%i/%i sourcepackages processed', count, npacks)
 
 
 def do_one_sourcepackage(source, kdb, package_root, keyrings,
@@ -215,7 +229,7 @@
     if importer_handler.preimport_sourcecheck(source_data):
         # Don't bother reading package information if the source package
         # already exists in the database
-        log.info('%s already exists in the archive' % source_data.package)
+        log.info('%s already exists in the archive', source_data.package)
         return
     source_data.process_package(kdb, package_root, keyrings)
     source_data.ensure_complete(kdb)
@@ -231,8 +245,8 @@
     for archtag in packages_map.bin_map.keys():
         count = 0
         npacks = len(packages_map.bin_map[archtag])
-        log.info('%i Binary Packages to be imported for %s' %
-                 (npacks, archtag))
+        log.info(
+            '%i Binary Packages to be imported for %s', npacks, archtag)
         # Go over binarypackages importing them for this architecture
         for binary in sorted(packages_map.bin_map[archtag].values(),
                              key=lambda x: x.get("Package")):
@@ -243,44 +257,45 @@
                     do_one_binarypackage(binary, archtag, kdb, package_root,
                                          keyrings, importer_handler)
                 except psycopg2.Error:
-                    log.exception("Database errors when importing a "
-                                  "BinaryPackage for %s. Retrying once.."
-                                  % package_name)
+                    log.exception(
+                        "Database errors when importing a BinaryPackage "
                        "for %s. Retrying once...", package_name)
                     importer_handler.abort()
                     time.sleep(15)
                     do_one_binarypackage(binary, archtag, kdb, package_root,
                                          keyrings, importer_handler)
             except (InvalidVersionError, MissingRequiredArguments):
-                log.exception("Unable to create BinaryPackageData for %s" %
-                              package_name)
+                log.exception(
+                    "Unable to create BinaryPackageData for %s", package_name)
                 continue
             except (PoolFileNotFound, ExecutionError):
                # Problems with the katie db or with opening files
-                log.exception("Error processing package files for %s" %
-                              package_name)
+                log.exception(
+                    "Error processing package files for %s", package_name)
                 continue
             except MultiplePackageReleaseError:
-                log.exception("Database duplication processing %s" %
-                              package_name)
+                log.exception(
+                    "Database duplication processing %s", package_name)
                 continue
             except psycopg2.Error:
-                log.exception("Database errors made me give up: unable to "
-                              "create BinaryPackage for %s" % package_name)
+                log.exception(
+                    "Database errors made me give up: unable to create "
+                    "BinaryPackage for %s", package_name)
                 importer_handler.abort()
                 continue
             except NoSourcePackageError:
-                log.exception("Failed to create Binary Package for %s" %
-                              package_name)
+                log.exception(
+                    "Failed to create Binary Package for %s", package_name)
                 nosource.append(binary)
                 continue
 
             if COUNTDOWN and count % COUNTDOWN == 0:
                 # XXX kiko 2005-10-23: untested
-                log.warn('%i/%i binary packages processed' % (count, npacks))
+                log.warn('%i/%i binary packages processed', count, npacks)
 
         if nosource:
             # XXX kiko 2005-10-23: untested
-            log.warn('%i source packages not found' % len(nosource))
+            log.warn('%i source packages not found', len(nosource))
             for pkg in nosource:
                 log.warn(pkg)
 
@@ -289,7 +304,7 @@
                          importer_handler):
     binary_data = BinaryPackageData(**binary)
     if importer_handler.preimport_binarycheck(archtag, binary_data):
-        log.info('%s already exists in the archive' % binary_data.package)
+        log.info('%s already exists in the archive', binary_data.package)
         return
     binary_data.process_package(kdb, package_root, keyrings)
     importer_handler.import_binarypackage(archtag, binary_data)
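
Aside from extracting `attempt_source_package_import`, the bulk of this diff converts eager `%`-interpolation inside logging calls to the logging module's deferred-formatting form. A minimal sketch of the difference (the logger name and handler below are illustrative, not part of Gina):

```python
import logging

log = logging.getLogger("gina.demo")  # hypothetical logger name
log.setLevel(logging.INFO)

captured = []

class ListHandler(logging.Handler):
    # Collect formatted messages so both forms can be compared.
    def emit(self, record):
        captured.append(record.getMessage())

log.addHandler(ListHandler())

npacks = 41000
# Eager form (old code): the message string is built before the call,
# even when the record would be filtered out by the log level.
log.info('%i Source Packages to be imported' % npacks)
# Deferred form (new code): logging interpolates the arguments only if
# the record is actually emitted, and the constant format string lets
# log aggregators group identical messages.
log.info('%i Source Packages to be imported', npacks)

assert captured[0] == captured[1] == '41000 Source Packages to be imported'
```

Both calls produce the same rendered message; the deferred form simply postpones the interpolation until a handler needs it.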