← Back to team overview

launchpad-dev team mailing list archive

Re: performance tuesday - episode three: milestones

 

So milestones.

This one was a little torture-like for me, due to a bug in the
userbrowser: with a wrong password it silently did something other
than retrieving the right page. :(

Other than that few-hour-distraction, I had a great performance tuesday.

My cachedproperty branch and registry branches have landed. This means:
 - model caching in a (reasonably) safe way is now available to everyone
 - The /participation API should take ~11 queries to run :). Always
(modulo missing a case, of course - certainly the badly behaved query
we saw should be addressed).
 - there is a new storm release soon and we'll want it - I found a new
storm bug in my registry branch. __storm_loaded__ was being called
before objects are initialised, and this makes the Person 'isTeam'
related _init hook cause a separate *single* db lookup - per person
loaded when a cache is hit in this way. Hopefully this isn't affecting
prod, but - we never know.

So I started with
https://bugs.edge.launchpad.net/launchpad-registry/+bug/447418 which
was pretty clearly death-by-sql.

Most of the SQL time was doing bug subscription lookups. The root
cause of that is:
for subscription in self.subscriptions:
     if user.inTeam(subscription.person):
        return True

This is a problem: it looks like simple python code, but its dealing
with hundreds of objects and causing DB access on simple attribute
access to bugs.

So, while we can make the code faster (see
https://bugs.edge.launchpad.net/malone/+bug/619039) it would still be
one-query-per-bug, which is going to scale very poorly.

cachedproperty to the rescue. The patch is below (minus tests and
refactorings to make it have more impact).

Its simple conceptually:
 - use a DecoratedResultSet in bugtask queries
 - cache the user whom we queried-on-behalf-of-if-we're-returning-private-bugs
 - use that to satisfy userCanView lookups in bug permission checks.

Some minor details about the arc I took, in case its of use in seeing patterns.

I grabbed a fresh oops straight away (which was good - the queries
counts alone were quite different)

I ignored a python profile, because the sql time was so clearly bonkers.

The oops told me that the top repeated query is 90% of sql time, and
its a single-bug lookup for subscriptions

I added a test for the query count with 1 and then 3 private bugs, to
check that it didn't increase-per-bug

In passing, I saw logs of custom-join code like Person._all_members
and BugTaskSet.query that wasn't reused - it was very very similar in
many places. e.g. in mailinglist.py

I found it very trying to get a reproducable test case: once I had one
it was about 1 hours work to put this fix together with reasonable
confidence that it will work. Specific things that gave me trouble
were:
 - the test browser password issue
 - some lack of familiarity with various test helpers - wouldn't
affect any of you :)
 - not knowing what was triggering all the subscription lookups from
the oops - I mean, I had one, from the faiing query, but for whatever
reason (I claim ECOFFEE) I didn't twig that it was going to be totally
representative until rather late in the process. Specifically I
assumed stuff in security.py and authorization.py would be a) generic
and b) fast.

But its done. \o/.

I don't know how many queries we'll see in prod with this, but I'm
pretty hopeful it would be rather few.

=== modified file 'lib/canonical/launchpad/utilities/personroles.py'
--- lib/canonical/launchpad/utilities/personroles.py    2010-01-14
11:28:51 +0000
+++ lib/canonical/launchpad/utilities/personroles.py    2010-08-17
09:10:19 +0000
@@ -35,6 +35,10 @@
         except AttributeError:
             raise AttributeError(errortext)

+    @property
+    def id(self):
+        return self.person.id
+
=== modified file 'lib/lp/bugs/model/bug.py'
--- lib/lp/bugs/model/bug.py    2010-08-07 14:54:40 +0000
+++ lib/lp/bugs/model/bug.py    2010-08-17 09:09:33 +0000
@@ -42,6 +42,10 @@
     ObjectCreatedEvent, ObjectDeletedEvent, ObjectModifiedEvent)
 from lazr.lifecycle.snapshot import Snapshot

+from canonical.cachedproperty import (
+    cachedproperty,
+    clear_property,
+    )
 from canonical.config import config
 from canonical.database.constants import UTC_NOW
 from canonical.database.datetimecol import UtcDateTimeCol
@@ -555,6 +559,7 @@
                 # disabled see the change.
                 store.flush()
                 self.updateHeat()
+                clear_property(self, '_cached_viewers')
                 return

     def unsubscribeFromDupes(self, person, unsubscribed_by):
@@ -1547,21 +1552,32 @@
             self, self.messages[comment_number])
         bug_message.visible = visible

+    @cachedproperty('_cached_viewers')
+    def _known_viewers(self):
+        """A dict of of known persons able to view this bug."""
+        return set()
+
     def userCanView(self, user):
-        """See `IBug`."""
+        """See `IBug`.
+
+        Note that Editing is also controlled by this check,
+        because we permit editing of any bug one can see.
+        """
+        if user.id in self._known_viewers:
+            return True
         admins = getUtility(ILaunchpadCelebrities).admin
         if not self.private:
             # This is a public bug.
             return True
-        elif user.inTeam(admins):
+        elif user.in_admin:
             # Admins can view all bugs.
             return True
         else:
             # This is a private bug. Only explicit subscribers may view it.
             for subscription in self.subscriptions:
                 if user.inTeam(subscription.person):
+                    self._known_viewers.add(user.id)
                     return True
-
         return False

     def linkHWSubmission(self, submission):

=== modified file 'lib/lp/bugs/model/bugtask.py'
--- lib/lp/bugs/model/bugtask.py        2010-08-03 08:49:19 +0000
+++ lib/lp/bugs/model/bugtask.py        2010-08-17 08:37:42 +0000
@@ -42,7 +42,7 @@
 from lazr.enum import DBItem

 from canonical.config import config
-
+from canonical.cachedproperty import cache_property
 from canonical.database.sqlbase import (
     SQLBase, block_implicit_flushes, convert_storm_clause_to_string, cursor,
     quote, quote_like, sqlvalues)
@@ -51,6 +51,7 @@
 from canonical.database.nl_search import nl_phrase_search
 from canonical.database.enumcol import EnumCol

+from canonical.launchpad.components.decoratedresultset import
DecoratedResultSet
 from canonical.launchpad.interfaces.lpstorm import IStore
 from canonical.launchpad.webapp.interfaces import ILaunchBag

@@ -1534,11 +1535,15 @@
         """Build and return an SQL query with the given parameters.

         Also return the clauseTables and orderBy for the generated query.
+
+        :return: A query, the tables to query, ordering expression and a
+            decorator to call on each returned row.
         """
         assert isinstance(params, BugTaskSearchParams)

         extra_clauses = ['Bug.id = BugTask.bug']
         clauseTables = ['BugTask', 'Bug']
+        decorators = []

         # These arguments can be processed in a loop without any other
         # special handling.
@@ -1814,6 +1819,11 @@
         clause = get_bug_privacy_filter(params.user)
         if clause:
             extra_clauses.append(clause)
+            userid = params.user.id
+            def cache_user_can_view_bug(bugtask):
+                cache_property(bugtask.bug, '_cached_viewers', set([userid]))
+                return bugtask
+            decorators.append(cache_user_can_view_bug)

         hw_clause = self._buildHardwareRelatedClause(params)
         if hw_clause is not None:
@@ -1842,7 +1852,15 @@
         orderby_arg = self._processOrderBy(params)

         query = " AND ".join(extra_clauses)
-        return query, clauseTables, orderby_arg
+
+        if not decorators:
+            decorator = lambda x:x
+        else:
+            def decorator(obj):
+                for decor in decorators:
+                    obj = decor(obj)
+                return obj
+        return query, clauseTables, orderby_arg, decorator

     def _buildUpstreamClause(self, params):
         """Return an clause for returning upstream data if the data exists.
@@ -2074,22 +2092,23 @@
         """See `IBugTaskSet`."""
         store_selector = getUtility(IStoreSelector)
         store = store_selector.get(MAIN_STORE, SLAVE_FLAVOR)
-        query, clauseTables, orderby = self.buildQuery(params)
+        query, clauseTables, orderby, decorator = self.buildQuery(params)
         if len(args) == 0:
             # Do normal prejoins, if we don't have to do any UNION
             # queries.  Prejoins don't work well with UNION, and the way
             # we prefetch objects without prejoins cause problems with
             # COUNT(*) queries, which get inefficient.
-            return BugTask.select(
+            resultset = BugTask.select(
                 query, clauseTables=clauseTables, orderBy=orderby,
                 prejoins=['product', 'sourcepackagename'],
                 prejoinClauseTables=['Bug'])
+            return DecoratedResultSet(resultset, result_decorator=decorator)

         bugtask_fti = SQL('BugTask.fti')
         result = store.find((BugTask, bugtask_fti), query,
                             AutoTables(SQL("1=1"), clauseTables))
         for arg in args:
-            query, clauseTables, dummy = self.buildQuery(arg)
+            query, clauseTables, dummy, decorator = self.buildQuery(arg)
             result = result.union(
                 store.find((BugTask, bugtask_fti), query,
                            AutoTables(SQL("1=1"), clauseTables)))
@@ -2108,7 +2127,7 @@
             (BugTask, Bug, Product, SourcePackageName))
         bugtasks = SQLObjectResultSet(BugTask, orderBy=orderby,
                                       prepared_result_set=result)
-        return bugtasks
+        return DecoratedResultSet(bugtasks, result_decorator=decorator)

     def getAssignedMilestonesFromSearch(self, search_results):
         """See `IBugTaskSet`."""



Follow ups

References