← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~wallyworld/launchpad/recipe-daily-build-performance into lp:launchpad

 

Ian Booth has proposed merging lp:~wallyworld/launchpad/recipe-daily-build-performance into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  #711072 RootObject:+daily-builds times out
  https://bugs.launchpad.net/bugs/711072

For more details, see:
https://code.launchpad.net/~wallyworld/launchpad/recipe-daily-build-performance/+merge/51494

The daily recipe builds page https://code.launchpad.net/+daily-builds takes too long to load the required data from the database. One root cause is the current implementation performs a wide join to gather all the data in a single query. There are 9 tables involved in the join - 6 to load the core data and 3 to gather related display data. The need for the 6 table join is driven by the underlying object model - it's not optimised for this type of reporting. There's other issues at play as well, but for now, this branch is meant as a quick win to get a measurable performance improvement pending the resolution of the other issues.

== Implementation ==

Reduce the width of the main query and minimise the columns which need to be in the group by. Perform separate queries to load the required Person, SourcePackageName and Archive records. So we go from having one slow, wide query to 4 (1 main plus 3 eager load) queries. The main query is measurably faster and the 3 eager load queries are fast enough.

One artifact of this new implementation is that the sorting order is changed.
It was: SourcePackageName, Person.name Archive.name
Is is now: Date finished desc, Recipe.name

The original sort order is not a key requirement to maintain. The recipe names should reflect their intent (eg "foobar daily build") so it should be useful to sort on those to allow people to find builds they are interested in.

An explain on the original query had these sorts of figures:

Limit (cost=4814.19..4827.87 rows=76 width=1290) (actual time=5910.631..6155.228 rows=76 loops=1)
GroupAggregate (cost=4814.19..4877.55 rows=352 width=1290) (actual time=5910.629..6155.197 rows=76 loops=1)
Sort (cost=4814.19..4815.07 rows=352 width=1290) (actual time=5908.761..6127.714 rows=4413 loops=1)
Sort Key: sourcepackagename.name, person.name, archive.name,

The new query has:

Limit  (cost=5320.57..5320.82 rows=100 width=106) (actual time=598.613..598.647 rows=100 loops=1)
Sort  (cost=5320.57..5322.13 rows=625 width=106) (actual time=598.611..598.623 rows=100 loops=1)
HashAggregate  (cost=5288.87..5296.68 rows=625 width=106) (actual time=598.084..598.281 rows=295 loops=1)

Each of those metrics seems to show an order of magnitude reduction in "actual time". Hopefully I'm reading the numbers correctly.

This is a quick first step. There's more to come depending on how the data model may need to be changed or whether we move to a search based interface or whether we relax the data "freshness" requirement etc.

== Tests ==

No new tests written.
bin/test -vvt test_recipebuildslisting

== Lint ==

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/code/model/recipebuild.py
-- 
https://code.launchpad.net/~wallyworld/launchpad/recipe-daily-build-performance/+merge/51494
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wallyworld/launchpad/recipe-daily-build-performance into lp:launchpad.
=== modified file 'lib/lp/code/model/recipebuild.py'
--- lib/lp/code/model/recipebuild.py	2011-02-01 00:46:06 +0000
+++ lib/lp/code/model/recipebuild.py	2011-02-28 06:43:04 +0000
@@ -18,9 +18,11 @@
 
 import pytz
 from storm.expr import (
+    Desc,
     Join,
     Max,
-    Select)
+    Select,
+    )
 from storm import Undef
 
 from zope.interface import implements
@@ -83,26 +85,18 @@
         store = ISlaveStore(SourcePackageRecipe)
         tables = [
             SourcePackageRecipe,
-            Join(Person,
-                 Person.id == SourcePackageRecipe.owner_id),
             Join(SourcePackageRecipeBuild,
                  SourcePackageRecipeBuild.recipe_id ==
                  SourcePackageRecipe.id),
             Join(SourcePackageRelease,
                  SourcePackageRecipeBuild.id ==
                  SourcePackageRelease.source_package_recipe_build_id),
-            Join(SourcePackageName,
-                 SourcePackageRelease.sourcepackagename ==
-                 SourcePackageName.id),
             Join(BinaryPackageBuild,
                  BinaryPackageBuild.source_package_release_id ==
                     SourcePackageRelease.id),
             Join(PackageBuild,
                  PackageBuild.id ==
                  BinaryPackageBuild.package_build_id),
-            Join(Archive,
-                 Archive.id ==
-                 SourcePackageRecipe.daily_build_archive_id),
             Join(BuildFarmJob,
                  BuildFarmJob.id ==
                  PackageBuild.build_farm_job_id),
@@ -115,34 +109,63 @@
             where.append(BuildFarmJob.date_finished >= epoch)
 
         result_set = store.using(*tables).find(
-                (SourcePackageName,
-                    Person,
-                    SourcePackageRecipe,
-                    Archive,
+                (SourcePackageRecipe,
+                    SourcePackageRelease.sourcepackagenameID,
                     Max(BuildFarmJob.date_finished),
                     ),
                 *where
             ).group_by(
-                SourcePackageName,
-                Person,
                 SourcePackageRecipe,
-                Archive,
+                SourcePackageRelease.sourcepackagenameID,
             ).order_by(
-                SourcePackageName.name,
-                Person.name,
-                Archive.name,
-                )
+                Desc(Max(BuildFarmJob.date_finished)),
+                SourcePackageRecipe.name,
+            )
 
         def _makeRecipeBuildRecord(values):
-            (sourcepackagename, recipeowner, recipe, archive,
-                date_finished) = values
+            (recipe, sourcepackagename_id, date_finished) = values
+            sp_name = store.get(SourcePackageName, sourcepackagename_id)
             return RecipeBuildRecord(
-                sourcepackagename, recipeowner,
-                archive, recipe,
+                sp_name, recipe.owner,
+                recipe.daily_build_archive, recipe,
                 date_finished)
 
+        to_recipes = lambda rows: [row[0] for row in rows]
+        to_releases = lambda rows: [row[1] for row in rows]
+
+        def eager_load_owners(recipes):
+            owner_ids = set(recipe.owner_id for recipe in recipes)
+            owner_ids.discard(None)
+            if not owner_ids:
+                return
+            list(store.find(Person, Person.id.is_in(owner_ids)))
+
+        def eager_load_archives(recipes):
+            archive_ids = set(
+            recipe.daily_build_archive_id for recipe in recipes)
+            archive_ids.discard(None)
+            if not archive_ids:
+                return
+            list(store.find(Archive, Archive.id.is_in(archive_ids)))
+
+        def eager_load_sourcepackagenames(releases):
+            name_ids = set(release for release in releases)
+            name_ids.discard(None)
+            if not name_ids:
+                return
+            list(store.find(
+                SourcePackageName, SourcePackageName.id.is_in(name_ids)))
+
+        def _prefetchRecipeBuildData(rows):
+            recipes = to_recipes(rows)
+            eager_load_owners(recipes)
+            eager_load_archives(recipes)
+            releases = to_releases(rows)
+            eager_load_sourcepackagenames(releases)
+
         return RecipeBuildRecordResultSet(
-            result_set, _makeRecipeBuildRecord)
+            result_set, _makeRecipeBuildRecord,
+            pre_iter_hook=_prefetchRecipeBuildData)
 
 
 class RecipeBuildRecordResultSet(DecoratedResultSet):