launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #04738
[Merge] lp:~jtv/launchpad/bug-830890 into lp:launchpad
Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-830890 into lp:launchpad with lp:~jtv/launchpad/pre-830890 as a prerequisite.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
Bug #830890 in Launchpad itself: "Gina should delete publications if removed from source archive"
https://bugs.launchpad.net/launchpad/+bug/830890
For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-830890/+merge/72582
= Summary =
This will be a two-stage submit, as I haven't figured out how to force a branch dependency on "bzr lp-propose" yet.
Gina pulls (among other things) Sources lists from the Debian archive. Every time a new package (version) appears in Debian's listings, we create a new SourcePackagePublishingHistory (or SPPH for short). If we previously had an older version of the package mirrored in Launchpad, its SPPH is marked superseded.
In addition to this, we also need Gina to mark SPPHs deleted when they stop occurring in Debian's listings.
== Proposed fix ==
Run a single huge query doing the whole job at once. It contains all of the current package names as literals. I tested that against two hundred thousand rather long package names (we only have just over forty thousand in the database) and it worked well enough, so I didn't bother tweaking that further. It triggers a sequential scan on SourcePackageName (SPN), but that's a very narrow table and the database should survive.
On staging, the query took 30 seconds from whatever state the cache happened to be in, and 20 seconds from a warm cache. Given the scale of the problem, and especially the presence of a "not in" condition, that's not enormous. (This test run involved retiring every single package from a recent Debian release, which is orders of magnitude more work than a realistic run ought to be). Practice will have to tell us whether it needs further optimization.
== Pre-implementation notes ==
Julian walked me through it. Testing is normally particularly hard for gina, which I mostly managed to sidestep. I was tempted to clean up first, but was told that in this case a quick result was preferred.
== Implementation details ==
The new function I added to the script didn't fit anywhere, so I gave it a file of its own. Perhaps this would be a good place to put the superseding of SPPHs by new SPRs as well. But that's for later.
Since this kind of scripting is very sensitive to integration issues, I added a unit test that runs under realistic database privileges and with realistic data as far as this part of the script goes. The Sources file supposedly downloaded from the Debian archive is fake. I'm relying on the existing smoke tests to ensure that the script does not supply surprisingly wrong distribution or series names to the function; the new script code gets some full-integration exercise in the existing tests.
== Tests ==
{{{
./bin/test -vvc lp.soyuz.scripts.tests.test_gina
./bin/test -vvc lp.soyuz.tests.test_publishing
}}}
== Demo and Q/A ==
We'll have to see if we can run this on dogfood. What we expect to see is large numbers of packages dropping out of Launchpad's mirrored Debian releases.
= Launchpad lint =
There's some pre-existing lint left. I don't know what to do about the _pythonpath one; there were plans to abolish the other warning and so I didn't bother with it.
Checking for conflicts and issues in changed files.
Linting changed files:
lib/lp/soyuz/scripts/gina/packages.py
lib/lp/soyuz/tests/test_publishing.py
lib/lp/soyuz/scripts/gina/retire.py
lib/lp/soyuz/model/publishing.py
lib/lp/soyuz/scripts/gina/archive.py
lib/lp/soyuz/interfaces/publishing.py
scripts/gina.py
lib/lp/soyuz/scripts/tests/test_gina.py
./lib/lp/soyuz/interfaces/publishing.py
384: E261 at least two spaces before inline comment
481: E261 at least two spaces before inline comment
514: E261 at least two spaces before inline comment
684: E261 at least two spaces before inline comment
770: E261 at least two spaces before inline comment
./scripts/gina.py
26: '_pythonpath' imported but unused
--
https://code.launchpad.net/~jtv/launchpad/bug-830890/+merge/72582
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-830890 into lp:launchpad.
=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py 2011-07-22 11:12:23 +0000
+++ lib/lp/soyuz/interfaces/publishing.py 2011-08-23 14:26:03 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd. This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).
# pylint: disable-msg=E0211,E0213
@@ -1164,6 +1164,19 @@
`IBinaryPackagePublishingHistory`.
"""
+ def retireDeadSourcePublications(distroseries, live_packages, person,
+ rationale=None):
+ """Request deletion of source publications that are no longer alive.
+
+ :param distroseries: `DistroSeries` whose dead publications are
+ to be retired.
+ :param live_packages: Iterable of all source package names (as
+ strings) that are still alive in this distroseries.
+ :param person: The `IPerson` responsible for the removal.
+ :param rationale: Optional text describing why the publications
+ are retired.
+ """
+
def getBuildStatusSummariesForSourceIdsAndArchive(source_ids, archive):
"""Return a summary of the build statuses for source publishing ids.
=== modified file 'lib/lp/soyuz/model/publishing.py'
--- lib/lp/soyuz/model/publishing.py 2011-07-28 18:37:47 +0000
+++ lib/lp/soyuz/model/publishing.py 2011-08-23 14:26:03 +0000
@@ -2016,6 +2016,51 @@
return sources + binary_packages
+ def retireDeadSourcePublications(self, distroseries, live_packages,
+ person, rationale=None):
+ """See `IPublishingSet`."""
+ live_packages = frozenset(live_packages)
+ if len(live_packages) == 0:
+ # Guard against empty set, which SQL's "IN" does not like.
+ live_packages = [None]
+
+ store = IMasterStore(SourcePackagePublishingHistory)
+
+ params = sqlvalues(
+ active_status=active_publishing_status,
+ distroseries=distroseries,
+ live_packages=live_packages,
+ person=person,
+ rationale=rationale,
+ status_deleted=PackagePublishingStatus.DELETED,
+ utc_now=UTC_NOW,
+ )
+
+ # Inserting all package names for an entire distroseries into a
+ # query is a little obscene. But it worked for 200,000 package
+ # names, without complaining, on a 32-bit laptop. That's about
+ # 5 times more source package names than Launchpad currently
+ # knows about. The hardest part was trying to view the postgres
+ # query log in "less," to see that the query really executed.
+ store.execute("""
+ UPDATE SourcePackagePublishingHistory AS SPPH
+ SET
+ status = %(status_deleted)s,
+ datesuperseded = %(utc_now)s,
+ removed_by = %(person)s,
+ removal_comment = %(rationale)s
+ FROM SourcePackageRelease AS SPR
+ WHERE
+ SPR.id = SPPH.sourcepackagerelease AND
+ SPPH.distroseries = %(distroseries)s AND
+ SPPH.status IN %(active_status)s AND
+ SPR.sourcepackagename NOT IN (
+ SELECT id
+ FROM SourcePackageName
+ WHERE name IN %(live_packages)s
+ )
+ """ % params)
+
def getNearestAncestor(
self, package_name, archive, distroseries, pocket=None,
status=None, binary=False):
=== added file 'lib/lp/soyuz/scripts/gina/retire.py'
--- lib/lp/soyuz/scripts/gina/retire.py 1970-01-01 00:00:00 +0000
+++ lib/lp/soyuz/scripts/gina/retire.py 2011-08-23 14:26:03 +0000
@@ -0,0 +1,24 @@
+# Copyright 2011 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Retirement of packages that are removed upstream."""
+
+__metaclass__ = type
+__all__ = [
+ 'retire_dead_source_publications',
+ ]
+
+from zope.component import getUtility
+
+from lp.app.interfaces.launchpad import ILaunchpadCelebrities
+from lp.registry.interfaces.distribution import IDistributionSet
+from lp.soyuz.interfaces.publishing import IPublishingSet
+
+
+def retire_dead_source_publications(distro_name, series_name, packages_map):
+ """Retire any source packages that are no longer listed."""
+ katie = getUtility(ILaunchpadCelebrities).katie
+ series = getUtility(IDistributionSet)[distro_name].getSeries(series_name)
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ series, packages_map.src_map.iterkeys(), person=katie,
+ rationale="Package no longer seen in upstream sources list.")
=== modified file 'lib/lp/soyuz/scripts/tests/test_gina.py'
--- lib/lp/soyuz/scripts/tests/test_gina.py 2010-08-20 20:31:18 +0000
+++ lib/lp/soyuz/scripts/tests/test_gina.py 2011-08-23 14:26:03 +0000
@@ -1,13 +1,94 @@
-# Copyright 2009-2010 Canonical Ltd. This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).
from doctest import DocTestSuite
-import unittest
-
+import os.path
+from unittest import TestLoader
+
+import transaction
+
+from canonical.config import config
+from canonical.testing.layers import LaunchpadZopelessLayer
+from lp.soyuz.enums import PackagePublishingStatus
+from lp.soyuz.scripts.gina.archive import PackagesMap
import lp.soyuz.scripts.gina.handlers
+from lp.soyuz.scripts.gina.retire import retire_dead_source_publications
+from lp.testing import TestCaseWithFactory
+from lp.testing.fakemethod import FakeMethod
+
+
+class FakeListing:
+ """Fake, empty, Source or Packages listing."""
+ Step = FakeMethod(result=False)
+
+
+class FakeInfoSet:
+ """Fake packages listing for parsing by `apt_pkg`."""
+
+ def __init__(self, temp_dir, listing_contents):
+ """Set up a fake Sources listing.
+
+ :param temp_dir: A temporary directory to write a Sources file to.
+ :param listing_contents: Contents of the Sources file.
+ """
+ listing_name = self._makeSourcesFile(temp_dir, listing_contents)
+ self.srcfile = file(listing_name, 'r')
+ self.binfile = FakeListing()
+ self.dibinaries = FakeListing()
+
+ def _makeSourcesFile(self, dirname, contents):
+ """Create a Sources listing file for `apt_pkg` to parse.
+
+ Horribly, apt_pkg really wants to read this from a proper file.
+ Can't use StringIO.
+
+ :param dirname: Directory to create a Sources file in.
+ :param contents: Text to write to Sources file.
+ :return: Full path of new Sources file.
+ """
+ listing_name = os.path.join(dirname, 'Sources')
+ listing_output = file(listing_name, 'w')
+ listing_output.write(contents)
+ listing_output.close()
+ return listing_name
+
+ component = 'main'
+ source_only = True
+ arch = 'i386'
+
+
+class FakeArchComponentItems(list):
+ """Simple fake `ArchComponentItems` that `PackagesMap` can parse."""
+ cleanup = FakeMethod()
+
+
+class TestGina(TestCaseWithFactory):
+ """Unit tests for gina."""
+
+ layer = LaunchpadZopelessLayer
+
+ def test_retire_dead_source_publications(self):
+ series = self.factory.makeDistroSeries()
+ dead_spph = self.factory.makeSourcePackagePublishingHistory(series)
+ live_spph = self.factory.makeSourcePackagePublishingHistory(series)
+ packages_map = PackagesMap(FakeArchComponentItems([FakeInfoSet(
+ self.makeTemporaryDirectory(),
+ "Package: %s" % live_spph.sourcepackagerelease.name)]))
+
+ transaction.commit()
+ self.layer.switchDbUser(config.gina.dbuser)
+
+ retire_dead_source_publications(
+ series.distribution.name, series.name, packages_map)
+
+ transaction.commit()
+
+ self.assertEqual(PackagePublishingStatus.DELETED, dead_spph.status)
+ self.assertEqual(PackagePublishingStatus.PENDING, live_spph.status)
def test_suite():
- suite = unittest.TestSuite()
+ suite = TestLoader().loadTestsFromName(__name__)
+ # Run the doctests as well.
suite.addTest(DocTestSuite(lp.soyuz.scripts.gina.handlers))
return suite
=== modified file 'lib/lp/soyuz/tests/test_publishing.py'
--- lib/lp/soyuz/tests/test_publishing.py 2011-08-15 03:41:35 +0000
+++ lib/lp/soyuz/tests/test_publishing.py 2011-08-23 14:26:03 +0000
@@ -3,6 +3,7 @@
"""Test native publication workflow for Soyuz. """
+from collections import defaultdict
import datetime
import operator
import os
@@ -25,6 +26,7 @@
DatabaseFunctionalLayer,
LaunchpadZopelessLayer,
reconnect_stores,
+ ZopelessDatabaseLayer,
)
from lp.app.errors import NotFoundError
from lp.archivepublisher.config import getPubConfig
@@ -48,6 +50,8 @@
from lp.soyuz.interfaces.binarypackagename import IBinaryPackageNameSet
from lp.soyuz.interfaces.component import IComponentSet
from lp.soyuz.interfaces.publishing import (
+ active_publishing_status,
+ inactive_publishing_status,
IPublishingSet,
PackagePublishingPriority,
PackagePublishingStatus,
@@ -1160,6 +1164,100 @@
self.assertEqual(None, record)
+class TestPublishingSetLite(TestCaseWithFactory):
+ """Lightweight unit tests for `PublishingSet`."""
+
+ layer = ZopelessDatabaseLayer
+
+ def mapSPPHStatuses(self, spphs):
+ """Map out `SourcePackagePublishingHistory` by status.
+
+ :param spphs: Iterable of `SourcePackagePublishingHistory`.
+ :return: Dict mapping each status found in spphs to a list of the
+ members of spphs that have that status.
+ """
+ mapping = defaultdict(list)
+ for spph in spphs:
+ mapping[spph.status].append(spph)
+ return dict(mapping)
+
+ def test_retireDeadSourcePublications_retires_dead_package(self):
+ # retireDeadSourcePublication retires publications for packages
+ # that are not mentioned in live_packages.
+ spph = self.factory.makeSourcePackagePublishingHistory()
+ person = self.factory.makePerson()
+ rationale = self.factory.getUniqueString()
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ spph.distroseries, [], person, rationale=rationale)
+ transaction.commit()
+ self.assertEqual(PackagePublishingStatus.DELETED, spph.status)
+ self.assertNotEqual(None, spph.datesuperseded)
+ self.assertEqual(person, spph.removed_by)
+ self.assertEqual(rationale, spph.removal_comment)
+
+ def test_retireDeadSourcePublications_does_not_retire_live_package(self):
+ # retireDeadSourcePublication does not retire publications for
+ # packages that are mentioned in live_packages.
+ spph = self.factory.makeSourcePackagePublishingHistory()
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ spph.distroseries, [spph.sourcepackagerelease.name],
+ self.factory.makePerson())
+ transaction.commit()
+ self.assertEqual(PackagePublishingStatus.PENDING, spph.status)
+ self.assertEqual(None, spph.datesuperseded)
+ self.assertEqual(None, spph.removed_by)
+ self.assertEqual(None, spph.removal_comment)
+
+ def test_retireDeadSourcePublications_ignores_unknown_package(self):
+ # retireDeadSourcePublication does its work regardless of any
+ # live packages that may not occur in the distroseries at all.
+ spph = self.factory.makeSourcePackagePublishingHistory()
+ other_package = self.factory.makeSourcePackageName()
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ spph.distroseries, [other_package.name],
+ self.factory.makePerson())
+ transaction.commit()
+ self.assertEqual(PackagePublishingStatus.DELETED, spph.status)
+
+ def test_retireDeadSourcePublications_ignores_other_distroseries(self):
+ # retireDeadSourcePublication does not retire publications for
+ # other distroseries than the one it's been given.
+ spph = self.factory.makeSourcePackagePublishingHistory()
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ self.factory.makeDistroSeries(), [], self.factory.makePerson())
+ transaction.commit()
+ self.assertEqual(PackagePublishingStatus.PENDING, spph.status)
+
+ def test_retireDeadSourcePublications_ignores_inactive_pubs(self):
+ # retireDeadSourcePublication does not retire publications that
+ # were no longer active anyway.
+ distroseries = self.factory.makeDistroSeries()
+ spphs = [
+ self.factory.makeSourcePackagePublishingHistory(
+ distroseries, status=status)
+ for status in inactive_publishing_status]
+ old_statuses = self.mapSPPHStatuses(spphs)
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ distroseries, [], self.factory.makePerson())
+ transaction.commit()
+ self.assertEqual(old_statuses, self.mapSPPHStatuses(spphs))
+
+ def test_retireDeadSourcePublications_retires_active_pubs(self):
+ # retireDeadSourcePublication retires publications in any active
+ # state.
+ distroseries = self.factory.makeDistroSeries()
+ spphs = [
+ self.factory.makeSourcePackagePublishingHistory(
+ distroseries, status=status)
+ for status in active_publishing_status]
+ getUtility(IPublishingSet).retireDeadSourcePublications(
+ distroseries, [], self.factory.makePerson())
+ transaction.commit()
+ self.assertEqual(
+ {PackagePublishingStatus.DELETED: spphs},
+ self.mapSPPHStatuses(spphs))
+
+
class TestSourceDomination(TestNativePublishingBase):
"""Test SourcePackagePublishingHistory.supersede() operates correctly."""
=== modified file 'scripts/gina.py'
--- scripts/gina.py 2011-08-23 14:26:01 +0000
+++ scripts/gina.py 2011-08-23 14:26:03 +0000
@@ -53,6 +53,7 @@
PoolFileNotFound,
SourcePackageData,
)
+from lp.soyuz.scripts.gina.retire import retire_dead_source_publications
# Set to non-zero if you'd like to be warned every so often
COUNTDOWN = 0
@@ -152,6 +153,9 @@
packages_map, kdb, package_root, keyrings, importer_handler)
importer_handler.commit()
+ retire_dead_source_publications(distro, distroseries, packages_map)
+ importer_handler.commit()
+
if source_only:
log.info('Source only mode... done')
return