← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad

 

Raphaël Badin has proposed merging lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #867941 in Launchpad itself: "+activereviews times out"
  https://bugs.launchpad.net/launchpad/+bug/867941

For more details, see:
https://code.launchpad.net/~rvb/launchpad/activereviews-bug-867941-hugequery/+merge/82650

= Summary =
This branch aims to improve the performance of code.lp.net/~user/+activereviews.  Apart from the many small queries issued by this page (this should be fixed by lp:~rvb/launchpad/activereviews-bug-867941) the main problem with this page is the huge query that is issued when the number of public branches for a user is large.  For instance, the page https://code.launchpad.net/~ubuntu-branches/+activereviews issues in production a 4.5s query (https://pastebin.canonical.com/55779/).  The approach here is to refactor the query to use a CTE to materialize once and for all the list of the public and private-but-visible-for-the-user branch ids instead of repeating this expensive subquery (in the paste above, the expensive subquery is present lines 10-23, 24-37, 49-62 and 66-79).  It would be an even better option to not even materialize this list but the code inside ./lib/lp/code/model/branchcollection.py makes this difficult to achieve that without a huge refactoring.  I decided to do something simpler that should still improve things significantly.

= Details =
The main change here is the refactoring of _getBranchVisibilityExpression to use a CTE instead of a subquery. The new version of _getBranchVisibilityExpression populates self._with_dict which will be use to build the CTE. We use a dict that is then transformed into a proper "with" storm expression so that many calls to _addVisibleBranchesCTE will not create many declarations of the same CTE.
Another trick is that getMergeProposalsForPerson uses getMergeProposals and getMergeProposalsForReviewer and then uses an union of the results.  To avoid having the CTE declared in both queries, I had to restructure the code to a) be able to get BMP ids — hence the introduction of the parameter return_ids) b) be able to specify at which level the CTE must be defined — hence the introduction of the parameter include_with (getMergeProposals or getMergeProposalsForReviewer can also be used directly by external code).

= Tests =
I admit I've not created any test for this even if I've tested it manually to make sure that the query issued is of the right form.  I don't think a test to assert that the query is of the right form would be appropriate but maybe a reviewer will have an better idea.

= Q/A =
This branch should not change the behavior of the +activereview page but it's important to make sure that this really improves the performance before we release it.

Here are a few numbers on how the huge original query performs:
qastaging: [1860.0, 6737.0, 1846.0, 1953.0, 1860.0, 1677.0, 1734.0, 2161.0, 2087.0, 1948.0, 2007.0, 1750.0]
prod: [4517.0, 4786.0, 4786.0, 4517.0, 4502.0, 4502.0, 4502.0, 4350.0]
-- 
https://code.launchpad.net/~rvb/launchpad/activereviews-bug-867941-hugequery/+merge/82650
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~rvb/launchpad/activereviews-bug-867941-hugequery into lp:launchpad.
=== modified file 'lib/lp/code/model/branchcollection.py'
--- lib/lp/code/model/branchcollection.py	2011-11-01 08:17:57 +0000
+++ lib/lp/code/model/branchcollection.py	2011-11-18 09:23:03 +0000
@@ -127,6 +127,16 @@
         if exclude_from_search is None:
             exclude_from_search = []
         self._exclude_from_search = exclude_from_search
+        self._with_dict = {}
+
+    @property
+    def store_with_with(self):
+        store = self.store
+        if len(self._with_dict) != 0:
+            with_expr = [With(name, expr)
+                for (name, expr) in self._with_dict.items()]
+            store = store.with_(with_expr)
+        return store
 
     def count(self):
         """See `IBranchCollection`."""
@@ -264,7 +274,8 @@
             self._tables.values() + self._asymmetric_tables.values())
         tables = [Branch] + list(all_tables)
         expressions = self._getBranchExpressions()
-        resultset = self.store.using(*tables).find(Branch, *expressions)
+        resultset = self.store_with_with.using(
+            *tables).find(Branch, *expressions)
         if not eager_load:
             return resultset
 
@@ -285,6 +296,15 @@
                           target_branch=None, merged_revnos=None,
                           eager_load=False):
         """See `IBranchCollection`."""
+        return self._getMergeProposals(
+            statuses=statuses, for_branches=for_branches,
+            target_branch=target_branch, merged_revnos=merged_revnos,
+            eager_load=eager_load)
+
+    def _getMergeProposals(self, statuses=None, for_branches=None,
+                          target_branch=None, merged_revnos=None,
+                          eager_load=False, return_ids=False,
+                          include_with=True):
         if for_branches is not None and not for_branches:
             # We have an empty branches list, so we can shortcut.
             return EmptyResultSet()
@@ -296,15 +316,19 @@
             target_branch is not None or
             merged_revnos is not None):
             return self._naiveGetMergeProposals(statuses, for_branches,
-                target_branch, merged_revnos, eager_load)
+                target_branch, merged_revnos, eager_load,
+                return_ids=return_ids, include_with=include_with)
         else:
             # When examining merge proposals in a scope, this is a moderately
             # effective set of constrained queries. It is not effective when
             # unscoped or when tight constraints on branches are present.
-            return self._scopedGetMergeProposals(statuses)
+            return self._scopedGetMergeProposals(
+                statuses, return_ids=return_ids)
 
     def _naiveGetMergeProposals(self, statuses=None, for_branches=None,
-        target_branch=None, merged_revnos=None, eager_load=False):
+                                target_branch=None, merged_revnos=None,
+                                eager_load=False, return_ids=False,
+                                include_with=True):
 
         def do_eager_load(rows):
             branch_ids = set()
@@ -361,14 +385,20 @@
         if statuses is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(statuses))
-        resultset = self.store.using(*tables).find(
-            BranchMergeProposal, *expressions)
+        if include_with:
+            store = self.store_with_with
+        else:
+            store = self.store
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
+        resultset = store.using(*tables).find(
+            returned_obj, *expressions)
         if not eager_load:
             return resultset
         else:
             return DecoratedResultSet(resultset, pre_iter_hook=do_eager_load)
 
-    def _scopedGetMergeProposals(self, statuses):
+    def _scopedGetMergeProposals(self, statuses, return_ids=False):
         scope_tables = [Branch] + self._tables.values()
         scope_expressions = self._branch_filter_expressions
         select = self.store.using(*scope_tables).find(
@@ -391,19 +421,30 @@
         if statuses is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(statuses))
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
         return self.store.with_(with_expr).using(*tables).find(
-            BranchMergeProposal, *expressions)
+            returned_obj, *expressions)
 
     def getMergeProposalsForPerson(self, person, status=None):
         """See `IBranchCollection`."""
         # We want to limit the proposals to those where the source branch is
         # limited by the defined collection.
-        owned = self.ownedBy(person).getMergeProposals(status)
-        reviewing = self.getMergeProposalsForReviewer(person, status)
-        return owned.union(reviewing)
+        owned = self.ownedBy(person)._getMergeProposals(
+            status, include_with=False, return_ids=True)
+        reviewing = self._getMergeProposalsForReviewer(
+            person, status, include_with=False, return_ids=True)
+        result = owned.union(reviewing)
+        return self.store_with_with.using([BranchMergeProposal]).find(
+            BranchMergeProposal,
+            In(BranchMergeProposal.id, result._get_select()))
 
     def getMergeProposalsForReviewer(self, reviewer, status=None):
         """See `IBranchCollection`."""
+        return self._getMergeProposalsForReviewer(reviewer, status=status)
+
+    def _getMergeProposalsForReviewer(self, reviewer, status=None,
+                                     include_with=True, return_ids=False):
         tables = [
             BranchMergeProposal,
             Join(CodeReviewVoteReference,
@@ -423,8 +464,14 @@
         if status is not None:
             expressions.append(
                 BranchMergeProposal.queue_status.is_in(status))
-        proposals = self.store.using(*tables).find(
-            BranchMergeProposal, *expressions)
+        if include_with:
+            store = self.store_with_with
+        else:
+            store = self.store
+        returned_obj = (
+            BranchMergeProposal.id if return_ids else BranchMergeProposal)
+        proposals = store.using(*tables).find(
+            returned_obj, *expressions)
         # Apply sorting here as we can't do it in the browser code.  We need
         # to think carefully about the best places to do this, but not here
         # nor now.
@@ -826,21 +873,29 @@
                        Branch.transitively_private == True)))
         return private_branches
 
+    def _addVisibleBranchesCTE(self):
+        """Add the 'visible_branches' CTE to self._with_dict.
+
+        """
+        public_branches = Branch.transitively_private == False
+        if self._private_branch_ids is not None:
+            branches = Or(
+                public_branches,
+                Branch.id.is_in(self._private_branch_ids))
+        else:
+            branches = public_branches
+        branches_select = self.store.using(
+            [Branch]).find(Branch.id, branches)._get_select()
+        self._with_dict.update({'visible_branches': branches_select})
+
     def _getBranchVisibilityExpression(self, branch_class=Branch):
         """Return the where clauses for visibility.
 
         :param branch_class: The Branch class to use - permits using
             ClassAliases.
         """
-        public_branches = branch_class.transitively_private == False
-        if self._private_branch_ids is None:
-            # Public only.
-            return [public_branches]
-        else:
-            public_or_private = Or(
-                public_branches,
-                branch_class.id.is_in(self._private_branch_ids))
-            return [public_or_private]
+        self._addVisibleBranchesCTE()
+        return [branch_class.id.is_in(SQL("SELECT id FROM visible_branches"))]
 
     def _getCandidateBranchesWith(self):
         """Return WITH clauses defining candidate branches.