launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #05058
[Merge] lp:~jtv/launchpad/bug-857155 into lp:launchpad
Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-857155 into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
Bug #857155 in Launchpad itself: "Gina domination is slow"
https://bugs.launchpad.net/launchpad/+bug/857155
For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-857155/+merge/76715
= Summary =
Gina domination is far too slow.
== Proposed fix ==
The most immediate problem is that it tries to dominate far more packages than it needs to. Since domination can only turn Published publications into non-Published ones, we can skip dominating packages that have no more Published publications than they do live versions.
== Pre-implementation notes ==
William assures me that:
1. Gina's import phase (which runs before domination) guarantees the existence of a Published publication for any live package version.
2. Nothing besides the domination process will turn a Published publication into a non-Published one.
Put together, these mean that at the beginning of domination, any package will have at least as many Published publications as live versions. Domination then whittles the number of Published publications per package down to exactly one per live version. And so there should be no need to dominate a package that has exactly as many published publications as live versions after the import phase.
== Implementation details ==
There's a warning for the unexpected case where there are fewer published Publications than live versions. I didn't go so far as to assert this, since this might conceivably still happen as a result of manual database changes or perhaps weirdness in Debian's Sources files. If that happens, there is also a small risk of packages incorrectly not being dominated (if the disappearance of Published publications is offset exactly by an increase in live versions). Generally this should all set itself right in the next run.
== Tests ==
{{{
./bin/test -vvc lp.soyuz.scripts.tests.test_gina
./bin/test -vvc lp.archivepublisher.tests.test_dominator
}}}
== Demo and Q/A ==
Run “gina.py -v sid” on dogfood. The domination phase, which is quiet but comes after all the import-related log output, should complete in minutes at worst (unless the database is a copy dating back to before transitional domination).
= Launchpad lint =
Checking for conflicts and issues in changed files.
Linting changed files:
lib/lp/archivepublisher/domination.py
lib/lp/soyuz/scripts/gina/dominate.py
lib/lp/archivepublisher/tests/test_dominator.py
--
https://code.launchpad.net/~jtv/launchpad/bug-857155/+merge/76715
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-857155 into lp:launchpad.
=== modified file 'lib/lp/archivepublisher/domination.py'
--- lib/lp/archivepublisher/domination.py 2011-09-16 11:41:14 +0000
+++ lib/lp/archivepublisher/domination.py 2011-09-23 11:27:27 +0000
@@ -512,13 +512,24 @@
flush_database_updates()
def findPublishedSourcePackageNames(self, distroseries, pocket):
- """Find names of currently published source packages."""
+ """Find currently published source packages.
+
+ Returns an iterable of tuples: (name of source package, number of
+ publications in Published state).
+ """
+ # Avoid circular imports.
+ from lp.soyuz.model.publishing import SourcePackagePublishingHistory
+
+ looking_for = (
+ SourcePackageName.name,
+ Count(SourcePackagePublishingHistory.id),
+ )
result = IStore(SourcePackageName).find(
- SourcePackageName.name,
+ looking_for,
join_spph_spr(),
join_spr_spn(),
self._composeActiveSourcePubsCondition(distroseries, pocket))
- return result.config(distinct=True)
+ return result.group_by(SourcePackageName.name)
def findPublishedSPPHs(self, distroseries, pocket, package_name):
"""Find currently published source publications for given package."""
=== modified file 'lib/lp/archivepublisher/tests/test_dominator.py'
--- lib/lp/archivepublisher/tests/test_dominator.py 2011-09-09 09:06:28 +0000
+++ lib/lp/archivepublisher/tests/test_dominator.py 2011-09-23 11:27:27 +0000
@@ -5,9 +5,10 @@
__metaclass__ = type
-import apt_pkg
import datetime
from operator import attrgetter
+
+import apt_pkg
from zope.security.proxy import removeSecurityProxy
from canonical.database.sqlbase import flush_database_updates
@@ -611,7 +612,7 @@
status=PackagePublishingStatus.PUBLISHED)
dominator = self.makeDominator([spph])
self.assertContentEqual(
- [spph.sourcepackagerelease.sourcepackagename.name],
+ [(spph.sourcepackagerelease.sourcepackagename.name, 1)],
dominator.findPublishedSourcePackageNames(
spph.distroseries, spph.pocket))
@@ -626,7 +627,7 @@
published_spph = spphs[PackagePublishingStatus.PUBLISHED]
dominator = self.makeDominator(spphs.values())
self.assertContentEqual(
- [published_spph.sourcepackagerelease.sourcepackagename.name],
+ [(published_spph.sourcepackagerelease.sourcepackagename.name, 1)],
dominator.findPublishedSourcePackageNames(series, pocket))
def test_findPublishedSourcePackageNames_ignores_other_archives(self):
@@ -660,21 +661,33 @@
dominator.findPublishedSourcePackageNames(
spph.distroseries, PackagePublishingPocket.SECURITY))
- def test_findPublishedSourcePackageNames_does_not_return_duplicates(self):
+ def test_findPublishedSourcePackageNames_counts_published_SPPHs(self):
series = self.factory.makeDistroSeries()
pocket = PackagePublishingPocket.RELEASE
- package = self.factory.makeSourcePackageName()
+ spr = self.factory.makeSourcePackageRelease()
spphs = [
self.factory.makeSourcePackagePublishingHistory(
- distroseries=series, archive=series.main_archive,
- pocket=pocket, status=PackagePublishingStatus.PUBLISHED,
- sourcepackagerelease=self.factory.makeSourcePackageRelease(
- sourcepackagename=package))
+ distroseries=series, sourcepackagerelease=spr, pocket=pocket,
+ status=PackagePublishingStatus.PUBLISHED)
for counter in xrange(2)]
dominator = self.makeDominator(spphs)
- self.assertEqual(
- [package.name],
- list(dominator.findPublishedSourcePackageNames(series, pocket)))
+ [(name, publications)] = dominator.findPublishedSourcePackageNames(
+ series, pocket)
+ self.assertEqual(len(spphs), publications)
+
+ def test_findPublishedSourcePackageNames_counts_no_other_state(self):
+ series = self.factory.makeDistroSeries()
+ pocket = PackagePublishingPocket.RELEASE
+ spr = self.factory.makeSourcePackageRelease()
+ spphs = [
+ self.factory.makeSourcePackagePublishingHistory(
+ distroseries=series, sourcepackagerelease=spr, pocket=pocket,
+ status=status)
+ for status in PackagePublishingStatus.items]
+ dominator = self.makeDominator(spphs)
+ [(name, publications)] = dominator.findPublishedSourcePackageNames(
+ series, pocket)
+ self.assertEqual(1, publications)
def test_findPublishedSPPHs_finds_published_SPPH(self):
spph = self.factory.makeSourcePackagePublishingHistory(
=== modified file 'lib/lp/soyuz/scripts/gina/dominate.py'
--- lib/lp/soyuz/scripts/gina/dominate.py 2011-09-19 04:26:46 +0000
+++ lib/lp/soyuz/scripts/gina/dominate.py 2011-09-23 11:27:27 +0000
@@ -23,13 +23,29 @@
# Dominate all packages published in the series. This includes all
# packages listed in the Sources file we imported, but also packages
# that have been recently deleted.
- package_names = dominator.findPublishedSourcePackageNames(series, pocket)
- for package_name in package_names:
+ package_counts = dominator.findPublishedSourcePackageNames(series, pocket)
+ for package_name, pub_count in package_counts:
entries = packages_map.src_map.get(package_name, [])
live_versions = [
entry['Version'] for entry in entries if 'Version' in entry]
- dominator.dominateRemovedSourceVersions(
- series, pocket, package_name, live_versions)
+ # Gina import just ensured that any live version in the Sources
+ # file has a Published publication. So there should be at least
+ # as many Published publications as live versions.
+ if pub_count < len(live_versions):
+ logger.warn(
+ "Package %s has fewer live source publications (%s) than "
+ "live versions (%s). The archive may be broken in some "
+ "way.",
+ package_name, pub_count, len(live_versions))
- txn.commit()
+ # Domination can only turn Published publications into
+ # non-Published ones, and ensures that we end up with one
+ # Published publication per live version. Thus, if there are as
+ # many Published publications as live versions, there is no
+ # domination to do. We skip these as an optimization. Without
+ # it, dominating a single Debian series takes hours.
+ if pub_count != len(live_versions):
+ dominator.dominateRemovedSourceVersions(
+ series, pocket, package_name, live_versions)
+ txn.commit()