launchpad-reviewers team mailing list archive

Thread
Date

[Merge] lp:~gary/launchpad/bug731099 into lp:launchpad

To: mp+59098@xxxxxxxxxxxxxxxxxx
From: Gary Poster <gary.poster@xxxxxxxxxxxxx>
Date: Tue, 26 Apr 2011 16:19:28 -0000
Reply-to: mp+59098@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Gary Poster has proposed merging lp:~gary/launchpad/bug731099 into lp:launchpad.

Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
Bug #731099 in Launchpad itself: "/bugs/230350/+bug-portlet-subscribers-content has complex queries - structural subscriptions have poor queries on bugs with many bugtasks"
https://bugs.launchpad.net/launchpad/+bug/731099

For more details, see:
https://code.launchpad.net/~gary/launchpad/bug731099/+merge/59098

This branch addresses bug 731099, which is about the length and readability of the generated SQL in an edge case of many, many bugtasks for a single bug, many of which share the same combination of status and importance. It is tackled now because it is triaged high on the list of bugs that we must close to finish our feature.

I approached this problem by making two changes. First, I aggregated conditions that shared statuses and importances. As would be expected, this will shorten the SQL only when multiple bugtasks are involved that share the same status and importance. Second, I shifted a set of conditions that were duplicated when a bug had tags into a separate query. For the cost of adding a separate query, this improves the total length of the SQL generated for any bug with tags, and gradually approaches halving the generated SQL as the number of bugtasks increases.

Performance-wise, for a bug with tags and a single bugtask, this change is irrelevant: in my tests on my development machine, times went from a single 4ms query to two queries of 1ms and 3ms, respectively. If the bug has tags and several bugtasks (I tested with seven), the speed might have improved slightly, though my small hand-run sample size can only hint at a possible improvement (7ms vs 3ms + 3ms). Performance may or may not be significantly better for bugs with tens of bugtasks like bug 230350, which triggered the bug report this branch addresses; that said, the query in production now is not performing atrociously for being an edge case (about 400ms) and is also not the worst offender on the page.

(NB: Developers who are previewing the current malone features via feature flags should know that the original reported problem is completely hidden from them; https://bugs.launchpad.net/bugs/230350/+bug-portlet-subscribers-content is quite snappy because it no longer renders also-notified subscribers for them.)

This branch does not change behavior materially--results should be identical, and I actually increased the number of SQL queries made by one in order to reduce the total amount of SQL generated--and so no tests have been added or changed. I examined the output of https://bugs.launchpad.dev/++profile++show/bugs/4/+bug-portlet-subscribers-content to evaluate my changes.

As such, I'll summarize the results of my experiments here, and provide some links to pastebins for more information.

I examined the case of a bug with tags, because this generates the most SQL, and matches the bug description. I looked at the case of a bug with one bugtask, and at the case of a bug with several bugtasks (7).

For the case of one bugtask, my branch reduces SQL somewhat by virtue of breaking out the duplicated conditions into a separate query. In contrast, as would be expected, the change to aggregate targets by statuses and importances is irrelevant. Total character count went from 3280 to 2735--the new is about 83% the size of the old. See http://pastebin.ubuntu.com/599370/ .

For the case of seven bugtasks with shared statuses/importances, both changes come into play and we get a bigger win. Total character count moves from 6794 to 3407--the new is half the size. You can see the changes--first focusing on just the status/importance/target change and then the full queries--here: http://pastebin.ubuntu.com/599372/ . You might note that there is still some duplication for distributions; this comes from the structural subscription target helper joins, which I really would hate to lose as an abstraction, so I'm hoping it is acceptable.

Thank you!

Gary
--
https://code.launchpad.net/~gary/launchpad/bug731099/+merge/59098
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~gary/launchpad/bug731099 into lp:launchpad.

=== modified file 'lib/lp/bugs/model/structuralsubscription.py'
--- lib/lp/bugs/model/structuralsubscription.py	2011-04-11 21:31:04 +0000
+++ lib/lp/bugs/model/structuralsubscription.py	2011-04-26 16:19:25 +0000
@@ -11,6 +11,8 @@
     'StructuralSubscriptionTargetMixin',
     ]
 
+from collections import defaultdict
+
 import pytz
 from storm.base import Storm
 from storm.expr import (
@@ -602,48 +604,31 @@
         bugtasks = [bug_or_bugtask]
     else:
         raise ValueError('First argument must be bug or bugtask')
-    # XXX gary 2011-03-03 bug 728818
-    # Once we no longer have structural subscriptions without filters in
-    # qastaging and production, _get_structural_subscription_filter_id_query
-    # can just return the query, and leave the candidates out of it.
-    candidates, filter_id_query = (
+    filter_id_query = (
         _get_structural_subscription_filter_id_query(
             bug, bugtasks, level, direct_subscribers))
-    if not candidates:
+    if not filter_id_query:
         return EmptyResultSet()
     # This is here because of a circular import.
     from lp.registry.model.person import Person
     source = IStore(StructuralSubscription).using(
         StructuralSubscription,
-        # XXX gary 2011-03-03 bug 728818
-        # We need to do this LeftJoin because we still have structural
-        # subscriptions without filters in qastaging and production.
-        # Once we do not, we can just use a Join.  Also see constraints
-        # below.
-        LeftJoin(BugSubscriptionFilter,
+        Join(BugSubscriptionFilter,
              BugSubscriptionFilter.structural_subscription_id ==
              StructuralSubscription.id),
         Join(Person,
              Person.id == StructuralSubscription.subscriberID),
         )
-    constraints = [
-        # XXX gary 2011-03-03 bug 728818
-        # We need to do this Or because we still have structural
-        # subscriptions without filters in qastaging and production.
-        # Once we do not, we can simplify this to just
-        # "In(BugSubscriptionFilter.id, filter_id_query)".  Also see
-        # LeftJoin above.
-        Or(In(BugSubscriptionFilter.id, filter_id_query),
-           And(In(StructuralSubscription.id, candidates),
-               BugSubscriptionFilter.id == None))]
     if recipients is None:
         return source.find(
-            Person, *constraints).config(distinct=True).order_by()
+            Person,
+            In(BugSubscriptionFilter.id,
+               filter_id_query)).config(distinct=True).order_by()
     else:
         subscribers = []
         query_results = source.find(
             (Person, StructuralSubscription, BugSubscriptionFilter),
-            *constraints)
+            In(BugSubscriptionFilter.id, filter_id_query))
         for person, subscription, filter in query_results:
             # Set up results.
             if person not in recipients:
@@ -726,22 +711,7 @@
         # If there are no structural subscriptions for these targets,
         # then we don't need to look at the importance, status, and
         # tags.  We're done.
-        return None, None
-    # These are tables and joins we will want.
-    tables = [
-        StructuralSubscription,
-        Join(BugSubscriptionFilter,
-             BugSubscriptionFilter.structural_subscription_id ==
-             StructuralSubscription.id),
-        LeftJoin(BugSubscriptionFilterStatus,
-                 BugSubscriptionFilterStatus.filter_id ==
-                 BugSubscriptionFilter.id),
-        LeftJoin(BugSubscriptionFilterImportance,
-                 BugSubscriptionFilterImportance.filter_id ==
-                 BugSubscriptionFilter.id),
-        LeftJoin(BugSubscriptionFilterTag,
-                 BugSubscriptionFilterTag.filter_id ==
-                 BugSubscriptionFilter.id)]
+        return None
     # The "conditions" list will eventually be passed to a Storm
     # "And" function, and then become the WHERE clause of our SELECT.
     conditions = [In(StructuralSubscription.id, candidates)]
@@ -756,12 +726,7 @@
     # Note that casting bug.tags to a list subtly removes the security
     # proxy on the list.  Strings are never security-proxied, so we
     # don't have to worry about them.
-    query = _calculate_tag_query(tables, conditions, list(bug.tags))
-    # XXX gary 2011-03-03 bug 728818
-    # Once we no longer have structural subscriptions without filters in
-    # qastaging and production, _get_structural_subscription_filter_id_query
-    # can just return the query, and leave the candidates out of it.
-    return candidates, query
+    return _calculate_tag_query(conditions, list(bug.tags))
 
 
 def _calculate_bugtask_condition(query_arguments):
@@ -771,60 +736,80 @@
                             returned by get_structural_subscription_targets.
     """
     # This handles importance and status, which are per bugtask.
-    # What we do is loop through the collection of bugtask, target
-    # in query_arguments.  Each bugtask will have one or more
-    # targets that we have to check.  We figure out how to describe each
-    # target using the useful IStructuralSubscriptionTargetHelper
-    # adapter, which has a "join" attribute on it that tells us how
-    # to distinguish that target.  Once we have all of the target
-    # descriptions, we OR those together, and say that the filters
-    # for those targets must either have no importance or match the
-    # associated bugtask's importance; and have no status or match
-    # the bugtask's status.  Once we have looked at all of the
-    # bugtasks, we OR all of those per-bugtask target comparisons
-    # together, and we are done with the status and importance.
-    # The "outer_or_conditions" list holds the full clauses for each
-    # bugtask.
+    # The `query_arguments` collection has pairs of bugtask, target,
+    # where one bugtask may repeat, to be paired with multiple targets.
+    # We know it will not be empty, because we already escaped early in
+    # _get_structural_subscription_filter_id_query with "if not
+    # query_arguments: return None".
+    # This approach groups the queries around shared statuses and importances
+    # in order to address concerns like those raised in bug 731009.
+    # We begin be establishing that grouping. The `statuses` dict maps
+    # bugtask statuses to a mapping of bugtask importances to a list of
+    # targets. More concisely:
+    #   statuses map -> importances map -> list of targets
+    statuses = defaultdict(lambda: defaultdict(list))
+    for bugtask, target in query_arguments:
+        statuses[bugtask.status][bugtask.importance].append(target)
+    # Now that they are grouped, we will build our query.  If we only have
+    # a single bugtask target, the goal is to produce a query something
+    # like this:
+    #   And(
+    #     Or(filter's importance = bugtask's importance,
+    #        filter's importance is null),
+    #     And(
+    #       Or(filter's status = bugtask's status,
+    #          filter's status is null),
+    #       subscription matches target))
+    # If there are multiple bugtask targets, but the associated bugtasks
+    # share the same importance and status, then "subscription matches
+    # target" will become something like this:
+    #       Or(subscription matches target 1,
+    #          subscription matches target 2,
+    #          ...)
+    # Matching targets is done using the useful
+    # IStructuralSubscriptionTargetHelper adapter, which has a "join"
+    # attribute on it that tells us how to distinguish that target.
     outer_or_conditions = []
-    # The "or_target_conditions" list holds the clauses for each target,
-    # and is reset for each new bugtask.
-    or_target_conditions = []
-
-    def handle_bugtask_conditions(bugtask):
-        """Helper function for building status and importance clauses.
-
-        Call with the previous bugtask when the bugtask changes in
-        the iteration of query_arguments, and call with the last
-        bugtask when the iteration is complete."""
-        if or_target_conditions:
-            outer_or_conditions.append(
-                And(Or(*or_target_conditions),
-                    Or(BugSubscriptionFilterImportance.importance ==
-                       bugtask.importance,
-                       BugSubscriptionFilterImportance.importance == None),
-                    Or(BugSubscriptionFilterStatus.status == bugtask.status,
-                       BugSubscriptionFilterStatus.status == None)))
-            del or_target_conditions[:]
-    last_bugtask = None
-    for bugtask, target in query_arguments:
-        if last_bugtask is not bugtask:
-            handle_bugtask_conditions(last_bugtask)
-        last_bugtask = bugtask
-        or_target_conditions.append(
-            IStructuralSubscriptionTargetHelper(target).join)
-    # We know there will be at least one bugtask, because we already
-    # escaped early "if len(query_arguments) == 0".
-    handle_bugtask_conditions(bugtask)
+    for status, importances in statuses.items():
+        inner_or_conditions = []
+        for importance, targets in importances.items():
+            target_query = Or(
+                *[IStructuralSubscriptionTargetHelper(target).join
+                  for target in targets])
+            importance_query = Or(
+                BugSubscriptionFilterImportance.importance == importance,
+                BugSubscriptionFilterImportance.importance == None)
+            inner_or_conditions.append(And(target_query, importance_query))
+        status_query = Or(
+            BugSubscriptionFilterStatus.status == status,
+            BugSubscriptionFilterStatus.status == None)
+        outer_or_conditions.append(
+            And(status_query, Or(*inner_or_conditions)))
     return Or(*outer_or_conditions)
 
 
-def _calculate_tag_query(tables, conditions, tags):
+def _calculate_tag_query(conditions, tags):
     """Determine tag-related conditions and assemble a query.
 
-    :param tables: the tables and joins to use in the query.
     :param conditions: the other conditions that constrain the query.
     :param tags: the list of tags that the bug has.
     """
+    # These are tables and joins we will want.  We leave out the tag join
+    # because that needs to be added conditionally.
+    tables = [
+        StructuralSubscription,
+        Join(BugSubscriptionFilter,
+             BugSubscriptionFilter.structural_subscription_id ==
+             StructuralSubscription.id),
+        LeftJoin(BugSubscriptionFilterStatus,
+                 BugSubscriptionFilterStatus.filter_id ==
+                 BugSubscriptionFilter.id),
+        LeftJoin(BugSubscriptionFilterImportance,
+                 BugSubscriptionFilterImportance.filter_id ==
+                 BugSubscriptionFilter.id)]
+    tag_join = LeftJoin(
+        BugSubscriptionFilterTag,
+        BugSubscriptionFilterTag.filter_id == BugSubscriptionFilter.id)
     # If the bug has no tags, this is relatively easy. Otherwise, not so
     # much.
     if len(tags) == 0:
@@ -833,6 +818,7 @@
         # (BugSubscriptionFilter.include_any_tags), which we do with
         # the conditions.
         conditions.append(Not(BugSubscriptionFilter.include_any_tags))
+        tables.append(tag_join)
         return Select(
             BugSubscriptionFilter.id,
             tables=tables,
@@ -860,20 +846,22 @@
         # handle filters that include *all* of the filter's selected
         # tags (as determined by BugSubscriptionFilter.find_all_tags).
         # Every aspect of the unioned queries' WHERE clauses *other
-        # than tags* will need to be the same. We could try making a
-        # temporary table for the shared results, but that would
-        # involve another separate Postgres call, and I think that
-        # we've already gotten the big win by separating out the
-        # structural subscriptions into "candidates," above.
-        # When Storm supports the WITH statement, we should reconsider
-        # this (bug 729134).
+        # than tags* will need to be the same, and so we perform that
+        # separately, first.  When Storm supports the WITH statement
+        # (bug 729134), we can consider folding this back into a single
+        # query.
+        candidates = list(
+            IStore(BugSubscriptionFilter).using(*tables).find(
+                BugSubscriptionFilter.id, *conditions))
+        if not candidates:
+            return None
         # As mentioned, in this first SELECT we handle filters that
         # match any of the filter's tags.  This can be a relatively
         # straightforward query--we just need a bit more added to
         # our WHERE clause, and we don't need a GROUP BY/HAVING.
         first_select = Select(
             BugSubscriptionFilter.id,
-            tables=tables,
+            tables=[BugSubscriptionFilter, tag_join],
             where=And(
                 Or(# We want filters that proclaim they simply want any tags.
                    BugSubscriptionFilter.include_any_tags,
@@ -887,7 +875,7 @@
                               Not(In(BugSubscriptionFilterTag.tag, tags))),
                           # ...or if the filter does not specify any tags.
                           BugSubscriptionFilterTag.tag == None))),
-                *conditions))
+                In(BugSubscriptionFilter.id, candidates)))
         # We have our first clause.  Now we start on the second one:
         # handling filters that match *all* tags.
         # This second query will have a HAVING clause, which is where some
@@ -917,11 +905,12 @@
         # Now let's build the select itself.
         second_select = Select(
             BugSubscriptionFilter.id,
-            tables=tables,
+            tables=[BugSubscriptionFilter, tag_join],
             # Our WHERE clause is straightforward. We are simply
             # focusing on BugSubscriptionFilter.find_all_tags, when the
             # first SELECT did not consider it.
-            where=And(BugSubscriptionFilter.find_all_tags, *conditions),
+            where=And(BugSubscriptionFilter.find_all_tags,
+                      In(BugSubscriptionFilter.id, candidates)),
             # The GROUP BY collects the filters together.
             group_by=(BugSubscriptionFilter.id,),
             having=And(