← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/launchpad:with-materialized into launchpad:master

 

Colin Watson has proposed merging ~cjwatson/launchpad:with-materialized into launchpad:master.

Commit message:
Prevent folding of some WITH clauses

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/427174

PostgreSQL >= 12 changes the behaviour of `WITH` clauses:

  "By default, a side-effect-free `WITH` query is folded into the primary query if it is used exactly once in the primary query's `FROM` clause."

This will be a problem for Launchpad.  Because we use a query compiler and so have no particular need to use `WITH` clauses specifically for readability, we only use them in two situations: common expressions used multiple times in a query (not affected by this change) and as an optimization fence to instruct the planner to consider particular parts of the query first when we know that they're very likely to be more selective.  As a result, this change always makes things worse for us, and we need to avoid it.

PostgreSQL >= 12 does provide a `MATERIALIZED` keyword that prevents this folding, restoring the older behaviour.  Unfortunately, this is not recognized by PostgreSQL 11 and earlier.  My plan A was to make this keyword available in older versions as a no-op to ease the transition, but my attempt to get that into PostgreSQL (https://www.postgresql.org/message-id/20191018132130.GM16234%40riva.ucam.org) had a distinctly mixed reception and went nowhere.

Plan B was to use `OFFSET 0` in subqueries instead, which is reportedly possible.  However, that doesn't always seem to be sufficient for our uses: for example, in the first case I tried (the `commented_bug_ids` and `commented_bugtask_ids` clauses in `lp.bugs.model.bugtasksearch`, used to search for bugs commented on by a particular person), the `OFFSET 0` approach failed to yield the correct plan.  Looking at the planner code, the only evidence I can see of this making any difference at all is in `EXISTS` expressions.

So, this branch implements plan C: conditionally add the `MATERIALIZED` keyword to relevant queries based on the current PostgreSQL server version.  This requires a slightly awkwardly-written `WithMaterialized` Storm expression that takes a Storm store as an argument and then pokes about in a couple of private attributes to extract the current server version, but those private attributes are in practice quite stable and I don't anticipate this causing a problem.  Once we're using PostgreSQL >= 12 across the board we'll be able to drop the version detection.
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:with-materialized into launchpad:master.
diff --git a/lib/lp/blueprints/model/specificationsearch.py b/lib/lp/blueprints/model/specificationsearch.py
index f522ce2..2f00a54 100644
--- a/lib/lp/blueprints/model/specificationsearch.py
+++ b/lib/lp/blueprints/model/specificationsearch.py
@@ -23,7 +23,6 @@ from storm.expr import (
     Or,
     Select,
     Table,
-    With,
 )
 from storm.locals import SQL, Desc
 from zope.component import getUtility
@@ -53,6 +52,7 @@ from lp.services.database.stormexpr import (
     Array,
     ArrayAgg,
     ArrayIntersects,
+    WithMaterialized,
     fti_search,
 )
 from lp.services.propertycache import get_property_cache
@@ -105,8 +105,9 @@ def search_specifications(
     # specifications.
     if clauses:
         RelevantSpecification = Table("RelevantSpecification")
-        relevant_specification_cte = With(
+        relevant_specification_cte = WithMaterialized(
             RelevantSpecification.name,
+            store,
             Select(Specification.id, And(clauses), tables=tables),
         )
         store = store.with_(relevant_specification_cte)
diff --git a/lib/lp/bugs/model/bug.py b/lib/lp/bugs/model/bug.py
index 1e97fb7..49fa22f 100644
--- a/lib/lp/bugs/model/bug.py
+++ b/lib/lp/bugs/model/bug.py
@@ -45,7 +45,6 @@ from storm.expr import (
     Select,
     Sum,
     Union,
-    With,
 )
 from storm.info import ClassAlias
 from storm.locals import Bool, DateTime, Int, Reference, ReferenceSet
@@ -185,6 +184,7 @@ from lp.services.database.sqlobject import (
     StringCol,
 )
 from lp.services.database.stormbase import StormBase
+from lp.services.database.stormexpr import WithMaterialized
 from lp.services.fields import DuplicateBug
 from lp.services.helpers import shortlist
 from lp.services.librarian.interfaces import ILibraryFileAliasSet
@@ -3394,9 +3394,11 @@ class BugMute(StormBase):
 
 
 def generate_subscription_with(bug, person):
+    store = IStore(bug)
     return [
-        With(
+        WithMaterialized(
             "all_bugsubscriptions",
+            store,
             Select(
                 (BugSubscription.id, BugSubscription.person_id),
                 tables=[
@@ -3406,8 +3408,9 @@ def generate_subscription_with(bug, person):
                 where=Or(Bug.id == bug.id, Bug.duplicateofID == bug.id),
             ),
         ),
-        With(
+        WithMaterialized(
             "bugsubscriptions",
+            store,
             Select(
                 SQL("all_bugsubscriptions.id"),
                 tables=[
diff --git a/lib/lp/bugs/model/bugsummary.py b/lib/lp/bugs/model/bugsummary.py
index fcac329..9084e0e 100644
--- a/lib/lp/bugs/model/bugsummary.py
+++ b/lib/lp/bugs/model/bugsummary.py
@@ -10,7 +10,7 @@ __all__ = [
 ]
 
 from storm.base import Storm
-from storm.expr import SQL, And, Or, Select, With
+from storm.expr import SQL, And, Or, Select
 from storm.properties import Bool, Int, Unicode
 from storm.references import Reference
 from zope.interface import implementer
@@ -33,6 +33,8 @@ from lp.registry.model.productseries import ProductSeries
 from lp.registry.model.sourcepackagename import SourcePackageName
 from lp.registry.model.teammembership import TeamParticipation
 from lp.services.database.enumcol import DBEnum
+from lp.services.database.interfaces import IStore
+from lp.services.database.stormexpr import WithMaterialized
 
 
 @implementer(IBugSummary)
@@ -118,17 +120,20 @@ def get_bugsummary_filter_for_user(user):
     elif IPersonRoles(user).in_admin:
         return [], []
     else:
+        store = IStore(TeamParticipation)
         with_clauses = [
-            With(
+            WithMaterialized(
                 "teams",
+                store,
                 Select(
                     TeamParticipation.teamID,
                     tables=[TeamParticipation],
                     where=(TeamParticipation.personID == user.id),
                 ),
             ),
-            With(
+            WithMaterialized(
                 "policies",
+                store,
                 Select(
                     AccessPolicyGrant.policy_id,
                     tables=[AccessPolicyGrant],
diff --git a/lib/lp/bugs/model/bugtasksearch.py b/lib/lp/bugs/model/bugtasksearch.py
index 3bc73e5..2813709 100644
--- a/lib/lp/bugs/model/bugtasksearch.py
+++ b/lib/lp/bugs/model/bugtasksearch.py
@@ -29,7 +29,6 @@ from storm.expr import (
     Row,
     Select,
     Union,
-    With,
 )
 from storm.info import ClassAlias
 from storm.references import Reference
@@ -86,6 +85,7 @@ from lp.services.database.stormexpr import (
     ArrayAgg,
     ArrayIntersects,
     Unnest,
+    WithMaterialized,
     fti_search,
     get_where_for_reference,
     rank_by_fti,
@@ -759,18 +759,21 @@ def _build_query(params):
                 ),
             ),
         )
+        store = IStore(Bug)
         with_clauses.append(
             convert_storm_clause_to_string(
-                With(
+                WithMaterialized(
                     "commented_bug_ids",
+                    store,
                     Union(commented_messages, commented_activities),
                 )
             )
         )
         with_clauses.append(
             convert_storm_clause_to_string(
-                With(
+                WithMaterialized(
                     "commented_bugtask_ids",
+                    store,
                     Select(
                         BugTaskFlat.bugtask_id,
                         tables=[BugTaskFlat],
diff --git a/lib/lp/registry/model/distroseries.py b/lib/lp/registry/model/distroseries.py
index 19e60ec..4b11735 100644
--- a/lib/lp/registry/model/distroseries.py
+++ b/lib/lp/registry/model/distroseries.py
@@ -25,7 +25,6 @@ from storm.expr import (
     Select,
     SQL,
     Table,
-    With,
     )
 from storm.locals import (
     Int,
@@ -107,6 +106,7 @@ from lp.services.database.sqlobject import (
 from lp.services.database.stormexpr import (
     fti_search,
     IsTrue,
+    WithMaterialized,
     )
 from lp.services.librarian.interfaces import ILibraryFileAliasSet
 from lp.services.librarian.model import LibraryFileAlias
@@ -1164,8 +1164,9 @@ class DistroSeries(SQLBase, BugTargetBase, HasSpecificationsMixin,
         # Without this CTE, PostgreSQL sometimes decides to scan the primary
         # key index on PackageUpload instead, which is very slow.
         RelevantUpload = Table("RelevantUpload")
-        relevant_upload_cte = With(
+        relevant_upload_cte = WithMaterialized(
             RelevantUpload.name,
+            IStore(PackageUpload),
             Select(
                 PackageUpload.id,
                 And(
diff --git a/lib/lp/registry/model/person.py b/lib/lp/registry/model/person.py
index eca1632..3c9554e 100644
--- a/lib/lp/registry/model/person.py
+++ b/lib/lp/registry/model/person.py
@@ -268,7 +268,10 @@ from lp.services.database.sqlobject import (
     StringCol,
     )
 from lp.services.database.stormbase import StormBase
-from lp.services.database.stormexpr import fti_search
+from lp.services.database.stormexpr import (
+    fti_search,
+    WithMaterialized,
+    )
 from lp.services.helpers import (
     backslashreplace,
     shortlist,
@@ -1127,12 +1130,13 @@ class Person(
     def isAnyPillarOwner(self):
         """See IPerson."""
 
+        store = IStore(self)
         with_sql = [
             With("teams", SQL("""
                  SELECT team FROM TeamParticipation
                  WHERE TeamParticipation.person = %d
                 """ % self.id)),
-            With("owned_entities", SQL("""
+            WithMaterialized("owned_entities", store, SQL("""
                  SELECT Product.id
                  FROM Product
                  WHERE Product.owner IN (SELECT team FROM teams)
@@ -1146,7 +1150,6 @@ class Person(
                  WHERE Distribution.owner IN (SELECT team FROM teams)
                 """))
            ]
-        store = IStore(self)
         rs = store.with_(with_sql).using("owned_entities").find(
             SQL("count(*) > 0"),
         )
@@ -1462,8 +1465,9 @@ class Person(
         # the selectivity of the filter. Put that in a CTE to force it
         # to calculate the workitems up front, rather than doing a hash
         # join over all of Specification and SpecificationWorkItem.
-        assigned_specificationworkitem = With(
+        assigned_specificationworkitem = WithMaterialized(
             'assigned_specificationworkitem',
+            store,
             Union(
                 Select(
                     SpecificationWorkItem.id,
@@ -1784,15 +1788,17 @@ class Person(
             __storm_table__ = 'RestrictedParticipation'
             teamID = Int(primary=True, name='team')
 
-        restricted_participation_cte = With(
+        store = Store.of(self)
+        restricted_participation_cte = WithMaterialized(
             'RestrictedParticipation',
+            store,
             Select(
                 TeamParticipation.teamID, tables=[TeamParticipation],
                 where=TeamParticipation.person == self))
         team_select = Select(
             RestrictedParticipation.teamID, tables=[RestrictedParticipation])
 
-        return Store.of(self).with_(restricted_participation_cte).find(
+        return store.with_(restricted_participation_cte).find(
             Person,
             Person.id.is_in(
                 Union(
@@ -3996,8 +4002,9 @@ class PersonSet:
         # The conditions we've been given are almost certainly more
         # selective that anything PostgreSQL might guess at.  Use a CTE to
         # ensure that it looks at them first.
-        store = store.with_(With(
+        store = store.with_(WithMaterialized(
             'RelevantPerson',
+            store,
             Select(Person.id, where=conditions, tables=origin)))
         relevant_person = Table('RelevantPerson')
         origin = [
diff --git a/lib/lp/registry/vocabularies.py b/lib/lp/registry/vocabularies.py
index d08fe72..e4d0cb9 100644
--- a/lib/lp/registry/vocabularies.py
+++ b/lib/lp/registry/vocabularies.py
@@ -182,6 +182,7 @@ from lp.services.database.stormexpr import (
     fti_search,
     rank_by_fti,
     RegexpMatch,
+    WithMaterialized,
     )
 from lp.services.features import getFeatureFlag
 from lp.services.helpers import shortlist
@@ -684,7 +685,8 @@ class ValidPersonOrTeamVocabulary(
             # email address.
             # We just select the required ids since we will use
             # IPersonSet.getPrecachedPersonsFromIDs to load the results
-            matching_with = With("MatchingPerson", matching_person_sql)
+            matching_with = WithMaterialized(
+                "MatchingPerson", self.store, matching_person_sql)
             result = self.store.with_(
                 matching_with).using(*tables).find(
                 Person,
diff --git a/lib/lp/services/database/stormexpr.py b/lib/lp/services/database/stormexpr.py
index c40171d..4201b97 100644
--- a/lib/lp/services/database/stormexpr.py
+++ b/lib/lp/services/database/stormexpr.py
@@ -26,6 +26,7 @@ __all__ = [
     'TryAdvisoryLock',
     'Unnest',
     'Values',
+    'WithMaterialized',
     ]
 
 from storm import Undef
@@ -257,12 +258,46 @@ class RegexpMatch(BinaryOper):
     oper = " ~ "
 
 
+compile.set_precedence(compile.get_precedence(Like), RegexpMatch)
+
+
 class JSONExtract(BinaryOper):
     __slots__ = ()
     oper = "->"
 
 
-compile.set_precedence(compile.get_precedence(Like), RegexpMatch)
+class WithMaterialized(Expr):
+    """Compiles to a materialized common table expression.
+
+    On PostgreSQL >= 12, a side-effect-free WITH query may be folded into
+    the primary query if it is used exactly once in the primary query.  This
+    defeats some of our attempts at query optimization.  The MATERIALIZED
+    keyword suppresses this folding, but unfortunately it isn't available in
+    earlier versions of PostgreSQL.  Use it if available.
+
+    Unlike most Storm expressions, this needs access to the store so that it
+    can determine the running PostgreSQL version.
+
+    See:
+     * https://www.postgresql.org/docs/12/sql-select.html#SQL-WITH
+     * https://www.postgresql.org/docs/12/release-12.html#id-1.11.6.16.5.3.4
+    """
+
+    def __init__(self, name, store, select):
+        self.name = name
+        self.store = store
+        self.select = select
+
+
+@compile.when(WithMaterialized)
+def compile_with_materialized(compile, with_expr, state):
+    tokens = []
+    tokens.append(compile(with_expr.name, state, token=True))
+    tokens.append(" AS ")
+    if with_expr.store._database._version >= 120000:
+        tokens.append("MATERIALIZED ")
+    tokens.append(compile(with_expr.select, state))
+    return "".join(tokens)
 
 
 def get_where_for_reference(reference, other):
diff --git a/lib/lp/services/database/tests/test_stormexpr.py b/lib/lp/services/database/tests/test_stormexpr.py
new file mode 100644
index 0000000..4139ad4
--- /dev/null
+++ b/lib/lp/services/database/tests/test_stormexpr.py
@@ -0,0 +1,66 @@
+# Copyright 2022 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test custom Storm expressions."""
+
+from types import SimpleNamespace
+
+from storm.expr import Select
+from zope.component import getUtility
+
+from lp.services.database.interfaces import (
+    IStoreSelector,
+    MAIN_STORE,
+    PRIMARY_FLAVOR,
+    )
+from lp.services.database.sqlbase import convert_storm_clause_to_string
+from lp.services.database.stormexpr import WithMaterialized
+from lp.testing import TestCase
+from lp.testing.layers import (
+    BaseLayer,
+    ZopelessDatabaseLayer,
+    )
+
+
+class TestWithMaterialized(TestCase):
+    """Test how `WithMaterialized` works with different PostgreSQL versions.
+
+    We use a fake store for these tests so that we can control exactly what
+    PostgreSQL version the query compiler sees.
+    """
+
+    layer = BaseLayer
+
+    def makeFakeStore(self, version):
+        # Return a fake `storm.store.Store` that's just enough to satisfy
+        # the Storm query compiler for `WithMaterialized`.
+        return SimpleNamespace(_database=SimpleNamespace(_version=version))
+
+    def test_postgresql_10(self):
+        query = WithMaterialized("test", self.makeFakeStore(100019), Select(1))
+        self.assertEqual(
+            "test AS (SELECT 1)", convert_storm_clause_to_string(query))
+
+    def test_postgresql_12(self):
+        query = WithMaterialized("test", self.makeFakeStore(120011), Select(1))
+        self.assertEqual(
+            "test AS MATERIALIZED (SELECT 1)",
+            convert_storm_clause_to_string(query))
+
+
+class TestWithMaterializedRealDatabase(TestCase):
+    """Test how `WithMaterialized` works with a real database."""
+
+    layer = ZopelessDatabaseLayer
+
+    def test_current_store(self):
+        store = getUtility(IStoreSelector).get(MAIN_STORE, PRIMARY_FLAVOR)
+        query = WithMaterialized("test", store, Select(1))
+        self.assertIn(
+            convert_storm_clause_to_string(query),
+            [
+                # PostgreSQL < 12
+                "test AS (SELECT 1)",
+                # PostgreSQL >= 12
+                "test AS MATERIALIZED (SELECT 1)",
+                ])
diff --git a/lib/lp/translations/model/potmsgset.py b/lib/lp/translations/model/potmsgset.py
index 9cb6702..9231d7f 100644
--- a/lib/lp/translations/model/potmsgset.py
+++ b/lib/lp/translations/model/potmsgset.py
@@ -24,7 +24,6 @@ from storm.expr import (
     Select,
     SQL,
     Table,
-    With,
     )
 from storm.locals import (
     Int,
@@ -52,6 +51,7 @@ from lp.services.database.stormexpr import (
     IsTrue,
     NullsFirst,
     NullsLast,
+    WithMaterialized,
     )
 from lp.services.helpers import shortlist
 from lp.services.propertycache import get_property_cache
@@ -413,7 +413,8 @@ class POTMsgSet(SQLBase):
                 Not(in_use_clause)))
 
         SuggestivePOTemplate = Table('SuggestivePOTemplate')
-        msgsets = With('msgsets', Select(
+        store = IStore(TranslationMessage)
+        msgsets = WithMaterialized('msgsets', store, Select(
             POTMsgSet.id,
             tables=(
                 POTMsgSet,
@@ -440,7 +441,7 @@ class POTMsgSet(SQLBase):
             Coalesce(getattr(TranslationMessage, 'msgstr%d_id' % form), -1)
             for form in range(TranslationConstants.MAX_PLURAL_FORMS)]
 
-        result = IStore(TranslationMessage).with_(msgsets).find(
+        result = store.with_(msgsets).find(
             TranslationMessage,
             TranslationMessage.id.is_in(Select(
                 TranslationMessage.id,
diff --git a/lib/lp/translations/model/translationsperson.py b/lib/lp/translations/model/translationsperson.py
index 3419891..23d139d 100644
--- a/lib/lp/translations/model/translationsperson.py
+++ b/lib/lp/translations/model/translationsperson.py
@@ -14,7 +14,6 @@ from storm.expr import (
     Or,
     Select,
     SQL,
-    With,
     )
 from storm.info import ClassAlias
 from storm.store import Store
@@ -35,6 +34,7 @@ from lp.registry.model.productseries import ProductSeries
 from lp.registry.model.projectgroup import ProjectGroup
 from lp.registry.model.teammembership import TeamParticipation
 from lp.services.database.interfaces import IStore
+from lp.services.database.stormexpr import WithMaterialized
 from lp.services.propertycache import (
     cachedproperty,
     get_property_cache,
@@ -243,14 +243,19 @@ class TranslationsPerson:
             POFile.language != getUtility(ILaunchpadCelebrities).english]
         if no_older_than:
             clause.append(POFileTranslator.date_last_touched >= no_older_than)
-        RecentPOFiles = With("recent_pofiles",
+        store = IStore(POFile)
+        RecentPOFiles = WithMaterialized(
+            "recent_pofiles",
+            store,
             Select(
                 (POFile.id,),
                 tables=[
                     POFileTranslator,
                     Join(POFile, POFileTranslator.pofile == POFile.id)],
                 where=And(*clause)))
-        ReviewableGroups = With("reviewable_groups",
+        ReviewableGroups = WithMaterialized(
+            "reviewable_groups",
+            store,
             Select(
                 (TranslationGroup.id, Translator.language_id),
                 tables=[
@@ -264,7 +269,9 @@ class TranslationsPerson:
                             TeamParticipation.teamID ==
                                 Translator.translator_id,
                             TeamParticipation.personID == self.person.id))]))
-        TranslatableDistroSeries = With("translatable_distroseries",
+        TranslatableDistroSeries = WithMaterialized(
+            "translatable_distroseries",
+            store,
             Select(
                 (DistroSeries.id, SQL('reviewable_groups.language')),
                 tables=[
@@ -281,7 +288,9 @@ class TranslationsPerson:
                         SQL('reviewable_groups'),
                         SQL('reviewable_groups.id') ==
                             Distribution.translationgroupID)]))
-        TranslatableProductSeries = With("translatable_productseries",
+        TranslatableProductSeries = WithMaterialized(
+            "translatable_productseries",
+            store,
             Select(
                 (ProductSeries.id, SQL('reviewable_groups.language')),
                 tables=[

Follow ups