launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #02510
[Merge] lp:~jtv/launchpad/bug-181368-view into lp:launchpad
Jeroen T. Vermeulen has proposed merging lp:~jtv/launchpad/bug-181368-view into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers): code
Related bugs:
#181368 cron.daily explodes in non-obvious ways if drescher is under load
https://bugs.launchpad.net/bugs/181368
For more details, see:
https://code.launchpad.net/~jtv/launchpad/bug-181368-view/+merge/48609
= Gut BinaryPackageFilePublishing view =
This is an added bonus for the work on bug 181368. The main area for performance improvement to the archive publisher is in parallel runs of apt-ftparchive, but another several minutes are spent querying package publishings. I assume the binary package publishings to be the main time sink since they're generally much more numerous than the source packages, and harder to query.
That querying currently happens through the BinaryPackageFilePublishing view. It's a view on a join of 10 tables. Most uses of this view, shockingly, only care about less than half of those tables. And those are clustered around the most selective parts of the query at that. Getting rid of the view should allow us to issue more efficient queries.
William Grant has been trying to remove this view, together with its source-package sibling. Getting rid of them is not as easy as it looks, since all sorts of polymorphism goes on in that family. For this reason I had to leave one BinaryPackageFilePublishing query in for another day. Someday I hope we can replace the whole construct with something more like a Collection—but this is an optimization branch, not a cleanup.
(Well, I did include a few small cleanups that may help future readers understand what various pieces of code do. I didn't make it perfect, but hope it helps.)
You'll note that I then stripped off what fields I could from the Python representation of BinaryPackageFilePublishing without breaking tests. That may help us get a handle on what exactly the objects do and don't do for us. If it weren't for the source package name field, for instance, we could drop 4 tables out of the join: BinaryPackageRelease, BinaryPackageBuild, SourcePackageRelease, and SourcePackageName. The first of those may be quite costly because it's so central to the join (without it we could join BinaryPackageFile straight to BinaryPackagePublishingHistory on their binarypackagerelease foreign keys). The other three form a long chain of 1:1 foreign keys, which it would also be good not to have.
To test, best cast a wide net:
{{{
./bin/test -vvc lp.soyuz
./bin/test -vvc lp.archivepublisher
./bin/test -vvc lp.registry -t doc/distribution.txt
}}}
I didn't actually touch the tests since no functionality should be affected. The affected methods are tested both directly and indirectly by the existing tests.
For Q/A, we'll have to verify that publishing still works and produces the expected files.
I cleaned up lint where I found it, but left one tiny bit intact:
{{{
./lib/lp/registry/model/distribution.py
476: E202 whitespace before ')'
}}}
Some creative formatting there that I couldn't bring myself to disturb.
Jeroen
--
https://code.launchpad.net/~jtv/launchpad/bug-181368-view/+merge/48609
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jtv/launchpad/bug-181368-view into lp:launchpad.
=== added directory 'lib/lp/archivepublisher/model'
=== added file 'lib/lp/archivepublisher/model/__init__.py'
=== renamed file 'lib/lp/archivepublisher/ftparchive.py' => 'lib/lp/archivepublisher/model/ftparchive.py'
--- lib/lp/archivepublisher/ftparchive.py 2011-02-01 16:14:06 +0000
+++ lib/lp/archivepublisher/model/ftparchive.py 2011-02-04 13:25:47 +0000
@@ -1,4 +1,4 @@
-# Copyright 2009 Canonical Ltd. This software is licensed under the
+# Copyright 2009-2011 Canonical Ltd. This software is licensed under the
# GNU Affero General Public License version 3 (see the file LICENSE).
import os
@@ -14,6 +14,7 @@
from canonical.launchpad.components.decoratedresultset import (
DecoratedResultSet,
)
+from canonical.launchpad.database.librarian import LibraryFileAlias
from canonical.launchpad.webapp.interfaces import (
DEFAULT_FLAVOR,
IStoreSelector,
@@ -27,9 +28,16 @@
OutputLineHandler,
ReturnCodeReceiver,
)
+from lp.services.database.stormexpr import Concatenate
from lp.soyuz.enums import PackagePublishingStatus
+from lp.soyuz.model.binarypackagebuild import BinaryPackageBuild
+from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
from lp.soyuz.model.component import Component
+from lp.soyuz.model.distroarchseries import DistroArchSeries
+from lp.soyuz.model.files import BinaryPackageFile
+from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
from lp.soyuz.model.section import Section
+from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
def package_name(filename):
@@ -235,7 +243,6 @@
"""
# Avoid cicular imports.
from lp.soyuz.model.publishing import SourcePackagePublishingHistory
- from lp.soyuz.model.sourcepackagerelease import SourcePackageRelease
store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
origins = (
@@ -284,8 +291,6 @@
"""
# Avoid cicular imports.
from lp.soyuz.model.binarypackagename import BinaryPackageName
- from lp.soyuz.model.binarypackagerelease import BinaryPackageRelease
- from lp.soyuz.model.publishing import BinaryPackagePublishingHistory
store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
origins = (
@@ -565,31 +570,47 @@
:return: a `DecoratedResultSet` with the binary files information
tuples.
"""
- # Avoid circular imports.
- from lp.soyuz.model.publishing import BinaryPackageFilePublishing
-
store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
- result_set = store.using(BinaryPackageFilePublishing).find(
- (BinaryPackageFilePublishing.sourcepackagename,
- BinaryPackageFilePublishing.libraryfilealiasfilename,
- BinaryPackageFilePublishing.componentname,
- BinaryPackageFilePublishing.architecturetag),
- BinaryPackageFilePublishing.distribution == self.distro,
- BinaryPackageFilePublishing.archive == self.publisher.archive,
- BinaryPackageFilePublishing.distroseriesname == distroseries.name,
- BinaryPackageFilePublishing.pocket == pocket,
- BinaryPackageFilePublishing.publishingstatus ==
- PackagePublishingStatus.PUBLISHED)
+
+ columns = (
+ SourcePackageName.name,
+ LibraryFileAlias.filename,
+ Component.name,
+ Concatenate("binary-", DistroArchSeries.architecturetag),
+ )
+ join_conditions = [
+ BinaryPackageRelease.id ==
+ BinaryPackagePublishingHistory.binarypackagereleaseID,
+ BinaryPackageFile.binarypackagereleaseID ==
+ BinaryPackagePublishingHistory.binarypackagereleaseID,
+ BinaryPackageBuild.id == BinaryPackageRelease.buildID,
+ SourcePackageRelease.id ==
+ BinaryPackageBuild.source_package_release_id,
+ SourcePackageName.id == SourcePackageRelease.sourcepackagenameID,
+ LibraryFileAlias.id == BinaryPackageFile.libraryfileID,
+ DistroArchSeries.id ==
+ BinaryPackagePublishingHistory.distroarchseriesID,
+ Component.id == BinaryPackagePublishingHistory.componentID,
+ ]
+ select_conditions = [
+ BinaryPackagePublishingHistory.dateremoved == None,
+ DistroArchSeries.distroseriesID == distroseries.id,
+ BinaryPackagePublishingHistory.archive == self.publisher.archive,
+ BinaryPackagePublishingHistory.pocket == pocket,
+ BinaryPackagePublishingHistory.status ==
+ PackagePublishingStatus.PUBLISHED,
+ ]
suite = distroseries.getSuite(pocket)
def add_suite(result):
- name, filename, component, architecturetag = result
- architecture = 'binary-' + architecturetag
+ name, filename, component, architecture = result
return (name, suite, filename, component, architecture)
+ result_set = store.find(
+ columns, *(join_conditions + select_conditions))
result_set.order_by(
- Desc(BinaryPackageFilePublishing.id))
+ BinaryPackagePublishingHistory.id, BinaryPackageFile.id)
return DecoratedResultSet(result_set, add_suite)
def generateFileLists(self, fullpublish=False):
@@ -621,12 +642,10 @@
component, sourcepackagename, filename)
this_file = filelist.setdefault(suite, {})
this_file.setdefault(component, {})
- if architecturetag:
- this_file[component].setdefault(architecturetag, [])
- this_file[component][architecturetag].append(ondiskname)
- else:
- this_file[component].setdefault('source', [])
- this_file[component]['source'].append(ondiskname)
+ if architecturetag is None:
+ architecturetag = "source"
+ this_file[component].setdefault(architecturetag, [])
+ this_file[component][architecturetag].append(ondiskname)
# Process huge iterations (more than 200K records) in batches.
# See `PublishingTunableLoop`.
@@ -644,20 +663,23 @@
process_in_batches(
binaryfiles, update_binary_filelist, self.log)
- for suite, components in filelist.items():
+ for suite, components in filelist.iteritems():
self.log.debug("Writing file lists for %s" % suite)
- for component, architectures in components.items():
- for architecture, file_names in architectures.items():
+ series, pocket = self.distro.getDistroSeriesAndPocket(suite)
+ for component, architectures in components.iteritems():
+ for architecture, file_names in architectures.iteritems():
# XXX wgrant 2010-10-06: There must be a better place to
# do this.
- series, pocket = (
- self.distro.getDistroSeriesAndPocket(suite))
- if (architecture != 'source' and
- not series.getDistroArchSeries(
- architecture[7:]).enabled):
- continue
- self.writeFileList(architecture, file_names,
- suite, component)
+ if architecture == "source":
+ enabled = True
+ else:
+ # The "[7:]" strips the "binary-" prefix off the
+ # architecture names we get here.
+ das = series.getDistroArchSeries(architecture[7:])
+ enabled = das.enabled
+ if enabled:
+ self.writeFileList(
+ architecture, file_names, suite, component)
def writeFileList(self, arch, file_names, dr_pocketed, component):
"""Outputs a file list for a series and architecture.
=== modified file 'lib/lp/archivepublisher/publishing.py'
--- lib/lp/archivepublisher/publishing.py 2010-12-19 03:48:38 +0000
+++ lib/lp/archivepublisher/publishing.py 2011-02-04 13:25:47 +0000
@@ -25,7 +25,7 @@
)
from lp.archivepublisher.diskpool import DiskPool
from lp.archivepublisher.domination import Dominator
-from lp.archivepublisher.ftparchive import FTPArchiveHandler
+from lp.archivepublisher.model.ftparchive import FTPArchiveHandler
from lp.archivepublisher.htaccess import (
htpasswd_credentials_for_archive,
write_htaccess,
=== modified file 'lib/lp/archivepublisher/tests/test_ftparchive.py'
--- lib/lp/archivepublisher/tests/test_ftparchive.py 2010-12-20 07:51:34 +0000
+++ lib/lp/archivepublisher/tests/test_ftparchive.py 2011-02-04 13:25:47 +0000
@@ -19,7 +19,7 @@
from canonical.testing.layers import LaunchpadZopelessLayer
from lp.archivepublisher.config import getPubConfig
from lp.archivepublisher.diskpool import DiskPool
-from lp.archivepublisher.ftparchive import (
+from lp.archivepublisher.model.ftparchive import (
f_touch,
FTPArchiveHandler,
)
=== modified file 'lib/lp/archivepublisher/utils.py'
--- lib/lp/archivepublisher/utils.py 2010-08-25 11:01:59 +0000
+++ lib/lp/archivepublisher/utils.py 2011-02-04 13:25:47 +0000
@@ -74,13 +74,11 @@
# 1 StupidCache + clear_current_connection_caches() [this];
# 2 storm.Cache + clear_current_connection_caches() [no difference];
# 3 StupidCache + store.invalidate(obj) [references left behind];
-# 4 stormCache + store.invlaidate(obj) [references left behind];
+# 4 stormCache + store.invalidate(obj) [references left behind];
# 5 No batches [memory exhausted].
-# XXX cprov 20080630: If we decide to keep this code/functionality, which
-# I think we should, independently of the need to cleanup the cache after
-# processing each batch, we should generalize and test it as suggested in
-# bug #244328.
+# XXX JeroenVermeulen 2011-02-03 bug=244328: That was mid-2008. We have
+# the GenerationalCache now. We may not need any of this any more.
class PublishingTunableLoop(object):
"""An `ITunableLoop` for dealing with huge publishing result sets."""
=== modified file 'lib/lp/registry/model/distribution.py'
--- lib/lp/registry/model/distribution.py 2011-01-20 21:06:23 +0000
+++ lib/lp/registry/model/distribution.py 2011-02-04 13:25:47 +0000
@@ -52,6 +52,7 @@
Match,
RANK,
)
+from canonical.launchpad.database.librarian import LibraryFileAlias
from canonical.launchpad.helpers import (
ensure_unicode,
shortlist,
@@ -169,6 +170,7 @@
PackagePublishingStatus,
PackageUploadStatus,
)
+from lp.soyuz.model.files import BinaryPackageFile
from lp.soyuz.interfaces.archive import (
IArchiveSet,
MAIN_ARCHIVE_PURPOSES,
@@ -193,7 +195,6 @@
)
from lp.soyuz.model.distroseriespackagecache import DistroSeriesPackageCache
from lp.soyuz.model.publishing import (
- BinaryPackageFilePublishing,
BinaryPackagePublishingHistory,
SourcePackageFilePublishing,
SourcePackagePublishingHistory,
@@ -481,7 +482,7 @@
source_mirrors = list(Store.of(self).find(
(MirrorDistroSeriesSource.distribution_mirrorID,
- Max(MirrorDistroSeriesSource.freshness)),
+ Max(MirrorDistroSeriesSource.freshness)),
MirrorDistroSeriesSource.distribution_mirrorID.is_in(
[mirror.id for mirror in mirrors])).group_by(
MirrorDistroSeriesSource.distribution_mirrorID))
@@ -492,8 +493,10 @@
for mirror in mirrors:
cache = get_property_cache(mirror)
- cache.arch_mirror_freshness = arch_mirror_freshness.get(mirror.id, None)
- cache.source_mirror_freshness = source_mirror_freshness.get(mirror.id, None)
+ cache.arch_mirror_freshness = arch_mirror_freshness.get(
+ mirror.id, None)
+ cache.source_mirror_freshness = source_mirror_freshness.get(
+ mirror.id, None)
return mirrors
@property
@@ -982,6 +985,25 @@
DistroSeries.distribution == self,
DistroSeries.status == status)
+ def _findPublishedBinaryFile(self, filename, archive):
+ """Find file of the given name for the given distro and archive.
+
+ :return: A `LibraryFileAlias`, or None.
+ """
+ search = Store.of(self).find(
+ LibraryFileAlias,
+ BinaryPackageFile.libraryfileID == LibraryFileAlias.id,
+ BinaryPackagePublishingHistory.binarypackagereleaseID ==
+ BinaryPackageFile.binarypackagereleaseID,
+ DistroArchSeries.id ==
+ BinaryPackagePublishingHistory.distroarchseriesID,
+ DistroSeries.id == DistroArchSeries.distroseriesID,
+ BinaryPackagePublishingHistory.dateremoved == None,
+ DistroSeries.distribution == self,
+ BinaryPackagePublishingHistory.archiveID == archive.id,
+ LibraryFileAlias.filename == filename)
+ return search.order_by(Desc(LibraryFileAlias.id)).first()
+
def getFileByName(self, filename, archive=None, source=True, binary=True):
"""See `IDistribution`."""
assert (source or binary), "searching in an explicitly empty " \
@@ -989,21 +1011,22 @@
if archive is None:
archive = self.main_archive
- if source:
- candidate = SourcePackageFilePublishing.selectFirstBy(
+ if binary:
+ candidate = self._findPublishedBinaryFile(filename, archive)
+ else:
+ candidate = None
+
+ if source and candidate is None:
+ spfp = SourcePackageFilePublishing.selectFirstBy(
distribution=self, libraryfilealiasfilename=filename,
archive=archive, orderBy=['id'])
-
- if binary:
- candidate = BinaryPackageFilePublishing.selectFirstBy(
- distribution=self,
- libraryfilealiasfilename=filename,
- archive=archive, orderBy=["-id"])
-
- if candidate is not None:
- return candidate.libraryfilealias
-
- raise NotFoundError(filename)
+ if spfp is not None:
+ candidate = spfp.libraryfilealias
+
+ if candidate is None:
+ raise NotFoundError(filename)
+ else:
+ return candidate
def getBuildRecords(self, build_state=None, name=None, pocket=None,
arch_tag=None, user=None, binary_only=True):
=== modified file 'lib/lp/services/database/stormexpr.py'
--- lib/lp/services/database/stormexpr.py 2011-02-01 01:01:11 +0000
+++ lib/lp/services/database/stormexpr.py 2011-02-04 13:25:47 +0000
@@ -3,11 +3,13 @@
__metaclass__ = type
__all__ = [
+ 'Concatenate',
'CountDistinct',
'Greatest',
]
from storm.expr import (
+ BinaryOper,
compile,
EXPR,
Expr,
@@ -15,17 +17,18 @@
)
-# XXX wallyworld 2011-01-31 bug=710466:
-# We need to use a Postgres greatest() function call but Storm doesn't
-# support that yet.
class Greatest(NamedFunc):
+ # XXX wallyworld 2011-01-31 bug=710466:
+ # We need to use a Postgres greatest() function call but Storm
+ # doesn't support that yet.
__slots__ = ()
name = "GREATEST"
-# XXX: wallyworld 2010-11-26 bug=675377:
-# storm's Count() implementation is broken for distinct with > 1 column
class CountDistinct(Expr):
+ # XXX: wallyworld 2010-11-26 bug=675377:
+ # storm's Count() implementation is broken for distinct with > 1
+ # column.
__slots__ = ("columns")
@@ -39,3 +42,9 @@
col = compile(countselect.columns)
state.pop()
return "count(distinct(%s))" % col
+
+
+class Concatenate(BinaryOper):
+ """Storm operator for string concatenation."""
+ __slots__ = ()
+ oper = " || "
=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py 2011-01-31 16:01:21 +0000
+++ lib/lp/soyuz/interfaces/publishing.py 2011-02-04 13:25:47 +0000
@@ -655,10 +655,6 @@
title=_('Binary Package publishing record id'), required=True,
readonly=True,
)
- architecturetag = TextLine(
- title=_("Architecture tag. As per dpkg's use"), required=True,
- readonly=True,
- )
class IBinaryPackagePublishingHistoryPublic(IPublishingView):
=== modified file 'lib/lp/soyuz/model/publishing.py'
--- lib/lp/soyuz/model/publishing.py 2011-01-25 08:00:44 +0000
+++ lib/lp/soyuz/model/publishing.py 2011-02-04 13:25:47 +0000
@@ -243,11 +243,6 @@
implements(IBinaryPackageFilePublishing)
- distribution = ForeignKey(dbName='distribution',
- foreignKey="Distribution",
- unique=False, notNull=True,
- immutable=True)
-
binarypackagepublishing = ForeignKey(
dbName='binarypackagepublishing',
foreignKey='BinaryPackagePublishingHistory', immutable=True)
@@ -266,19 +261,6 @@
sourcepackagename = StringCol(dbName='sourcepackagename', unique=False,
notNull=True, immutable=True)
- distroseriesname = StringCol(dbName='distroseriesname', unique=False,
- notNull=True, immutable=True)
-
- publishingstatus = EnumCol(dbName='publishingstatus', unique=False,
- notNull=True, immutable=True,
- schema=PackagePublishingStatus)
-
- architecturetag = StringCol(dbName='architecturetag', unique=False,
- notNull=True, immutable=True)
-
- pocket = EnumCol(dbName='pocket', unique=False,
- notNull=True, schema=PackagePublishingPocket)
-
archive = ForeignKey(dbName="archive", foreignKey="Archive", notNull=True)
@property