← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~wallyworld/launchpad/sharing-details-timeout2-997343 into lp:launchpad

 

Ian Booth has proposed merging lp:~wallyworld/launchpad/sharing-details-timeout2-997343 into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #997343 in Launchpad itself: "Timeout trying to see sharing details page for ~launchpad-security"
  https://bugs.launchpad.net/launchpad/+bug/997343

For more details, see:
https://code.launchpad.net/~wallyworld/launchpad/sharing-details-timeout2-997343/+merge/106122

== Implementation ==

Optimise bugtask searching where the search involves matching a reference to many values.

A previous branch optimised the queries used for the +sharingdetails page so that everything is bilk loaded and the query count is fixed regardless of the number of shared artifacts. However, the page was still sluggish. It turned out Postgres prefers 'in' rather than 'or' for large lists.

Consider a bug task search like the following:

param = BugTaskSearchParams(user=user, bug=any(*bug_ids))
bugtasks = list(getUtility(IBugTaskSet).search(param))

This generates SQL like the following:

select * from bugtask where bug=1 or bug=2 or bug=3 or bug=4 or ...

For a large number of bug_ids, this is not very optimal. An 'in' construct would be much better:

select * from bugtask where bug in (1, 2, 3, 4 ...)

The above works for foreign key references with simple primary keys. Composite keys still require the 'or' construct.

The search_value_to_storm_where_condition() method constructs the various expressions required to piece together the 'where' clause of a bugtask search. This is where the 'or' was being constructed, using a some Storm internals since Storm references do not support what is required.

I started out fixing this deficiency in storm itself (well, our fork of storm). However, it got messy because the API used is not exposed public by Storm and there are some corner cases which would need to be supported for a totally generic solution. So I grabbed the current lp code from bugtask search code and expanded on it and moved to to Launchpad's collection of Storm expression extensions in lp.service.database.stormexpr.py

A new helper method is provided: get_where_for_reference(reference, other)
'other' may be a single object or a collection.

== Tests ==

Nothing in stormexpr.py is unit tested. However, the bugtask search tests and other tests which use bugtask search all work as expected and verify that the small rework to the search query works as expected.

== Lint ==

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/bugs/model/bugtasksearch.py
  lib/lp/services/database/stormexpr.py
-- 
https://code.launchpad.net/~wallyworld/launchpad/sharing-details-timeout2-997343/+merge/106122
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~wallyworld/launchpad/sharing-details-timeout2-997343 into lp:launchpad.
=== modified file 'lib/lp/bugs/model/bugtasksearch.py'
--- lib/lp/bugs/model/bugtasksearch.py	2012-05-14 23:59:33 +0000
+++ lib/lp/bugs/model/bugtasksearch.py	2012-05-17 08:42:20 +0000
@@ -78,6 +78,7 @@
 from lp.services.database.sqlbase import sqlvalues
 from lp.services.database.stormexpr import (
     Array,
+    get_where_for_reference,
     NullCount,
     )
 from lp.services.propertycache import get_property_cache
@@ -209,10 +210,7 @@
         if not search_value.query_values:
             return None
         if isinstance(comp, Reference):
-            # References don't have an is_in.
-            return Or(*[
-                comp._relation.get_where_for_local(value)
-                for value in search_value.query_values])
+            return get_where_for_reference(comp, search_value.query_values)
         else:
             return comp.is_in(search_value.query_values)
     elif zope_isinstance(search_value, not_equals):

=== modified file 'lib/lp/services/database/stormexpr.py'
--- lib/lp/services/database/stormexpr.py	2012-05-03 11:58:01 +0000
+++ lib/lp/services/database/stormexpr.py	2012-05-17 08:42:20 +0000
@@ -11,10 +11,12 @@
     'Concatenate',
     'CountDistinct',
     'Greatest',
+    'get_where_for_reference',
     'NullCount',
     'TryAdvisoryLock',
     ]
 
+from storm.exceptions import ClassInfoError
 from storm.expr import (
     BinaryOper,
     ComparableExpr,
@@ -22,8 +24,11 @@
     compile,
     EXPR,
     Expr,
+    In,
     NamedFunc,
+    Or,
     )
+from storm.info import get_obj_info
 
 
 class Greatest(NamedFunc):
@@ -109,3 +114,55 @@
     "True iff the left side shares at least one element with the right side."
     __slots__ = ()
     oper = "&&"
+
+
+def get_where_for_reference(reference, other):
+    """Generate a column comparison expression for a reference property.
+
+    The returned expression may be used to find referenced objects referring
+    to C{other}.
+
+    If the right hand side is a collection of values, then an OR in IN
+    expression is returned - if the relation uses composite keys, then an OR
+    expression is used; single key references produce an IN expression which is
+    more efficient for large collections of values.
+    """
+    relation = reference._relation
+    if isinstance(other, (list, set, tuple,)):
+        return _get_where_for_local_many(relation, other)
+    else:
+        return relation.get_where_for_local(other)
+
+
+def _remote_variables(relation, obj):
+    """A helper function to extract the foreign key values of an object.
+    """
+    try:
+        get_obj_info(obj)
+    except ClassInfoError:
+        if type(obj) is not tuple:
+            remote_variables = (obj,)
+        else:
+            remote_variables = obj
+    else:
+        # Don't use other here, as it might be
+        # security proxied or something.
+        obj = get_obj_info(obj).get_obj()
+        remote_variables = relation.get_remote_variables(obj)
+    return remote_variables
+
+
+def _get_where_for_local_many(relation, others):
+    """Generate an OR or IN expression used to find others.
+
+    If the relation uses composite keys, then an OR expression is used;
+    single key references produce an IN expression which is more efficient for
+    large collections of values.
+    """
+
+    if len(relation.local_key) == 1:
+        return In(
+            relation.local_key[0],
+            [_remote_variables(relation, value) for value in others])
+    else:
+        return Or(*[relation.get_where_for_local(value) for value in others])


Follow ups