
launchpad-dev team mailing list archive

Performance Tuesday, high diving into BugTask:+index

 

So we've got continual issues with BugTask:+index: the primary view of a bug.

This is a very busy page, with a lot of related data: tasks,
nominations, duplicates, affected users, patches, attachments,
activity, milestones, targets, and comments.

As usual, I'm going to give a bit of a how-I-did-this, in the hope
that it is useful to at least one reader :)

I started with lib/lp/bugs/browser/tests/test_bugtask.py where I had
earlier added a scaling test for view initialisation.

I converted that test to be a bit broader: to test the scaling of the
entire bug page as subscribers are added.

And then I had a small heart attack.

110 queries, for a simple single bugtask page with an owner and a
different user viewing it.
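
For readers who haven't written one, a scaling test of this kind boils
down to: render with N related rows, render with more, and compare the
query counts. Below is a minimal standalone sketch of the idea. It is
not the Launchpad test; it uses sqlite3 and a hand-rolled counter, and
every name in it (QueryCounter, render_naive, render_joined, the
tables) is made up for illustration.

```python
import sqlite3


class QueryCounter:
    """Wrap a connection and count statements sent through execute()."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def make_db(n_subscribers):
    """Build a toy schema: one bug (id 1) with n_subscribers subscribers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE subscription (bug INTEGER, person INTEGER)")
    for i in range(n_subscribers):
        conn.execute("INSERT INTO person VALUES (?, ?)", (i, "user%d" % i))
        conn.execute("INSERT INTO subscription VALUES (1, ?)", (i,))
    return conn


def render_naive(db):
    """One query for the subscriptions, then one more per subscriber."""
    rows = db.execute(
        "SELECT person FROM subscription WHERE bug = 1 ORDER BY person"
    ).fetchall()
    return [
        db.execute("SELECT name FROM person WHERE id = ?", (pid,)).fetchone()[0]
        for (pid,) in rows
    ]


def render_joined(db):
    """A single joined query, independent of the number of subscribers."""
    rows = db.execute(
        "SELECT person.name FROM subscription"
        " JOIN person ON person.id = subscription.person"
        " WHERE subscription.bug = 1 ORDER BY person.id"
    ).fetchall()
    return [name for (name,) in rows]


def queries_for(render, n_subscribers):
    """Return how many queries `render` issues for a given data size."""
    db = QueryCounter(make_db(n_subscribers))
    render(db)
    return db.count
```

The assertion a scaling test wants is that queries_for(render, 2) and
queries_for(render, 10) are equal: that holds for render_joined and
fails for render_naive, which grows by one query per subscriber.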

I decided to do a spike: just go straight at the problem, not worrying
too much about fallout; but not being rash either.

As I am just refactoring (har) I also largely ignored tests: there is
lots of test coverage of this facility, though little that makes
performance assessments.

The basic loop was:
 - run the test:
   LP_DEBUG_SQL_EXTRA=1 bin/test -vvt tests.test_bugtask.TestBugTaskView 2>&1 | less
 - start at the top and search for getUserBrowser (to find where the
meat starts)
 - find the first query, and read the backtrace to figure out where it
comes from, and whether it's OK.
 - if it isn't, fix it. (e.g. to use is_empty rather than retrieving
all milestones)
 - iterate
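
To illustrate the is_empty style of fix: when all the page needs is
"are there any milestones?", asking the database that question is
cheaper than loading every row and checking the length. A rough
sketch, using sqlite3 as a stand-in for the real store; the table and
function names are invented for the example and this is not the
Launchpad code.

```python
import sqlite3


def make_db(n_milestones):
    """Toy milestone table with n_milestones rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE milestone (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO milestone (name) VALUES (?)",
        [("m%d" % i,) for i in range(n_milestones)],
    )
    return conn


def has_milestones_fetch_all(conn):
    """Wasteful: materialises every row just to test for emptiness."""
    return len(conn.execute("SELECT id, name FROM milestone").fetchall()) > 0


def has_milestones_exists(conn):
    """Cheaper: the database can stop at the first matching row."""
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM milestone)"
    ).fetchone()
    return bool(found)
```

Both functions give the same answer; the difference is how much data
comes back over the wire to produce it.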

Amongst the issues I found were:
 - COUNT(*) of duplicate bugs (when there is a denormalised field we can use)
 - we retrieved every attachment three times (now 2)
 - we retrieved all questions linked to the bug twice (now once)
 - and many similar issues.
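
The COUNT(*) case is worth a small illustration. If the duplicate
count is kept up to date in a column on the bug itself, reading that
column gives the same number as counting the duplicate rows. The
sketch below uses sqlite3, and the schema and column names are
assumptions made up for the example, not the real Launchpad ones.

```python
import sqlite3


def make_db():
    """Five toy bugs, each with a denormalised duplicate counter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bug ("
        " id INTEGER PRIMARY KEY,"
        " duplicateof INTEGER,"
        " number_of_duplicates INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO bug (id) VALUES (?)", [(i,) for i in range(1, 6)]
    )
    return conn


def mark_duplicate(conn, dup_id, master_id):
    """Record the duplicate link and keep the counter in step with it."""
    conn.execute(
        "UPDATE bug SET duplicateof = ? WHERE id = ?", (master_id, dup_id)
    )
    conn.execute(
        "UPDATE bug SET number_of_duplicates = number_of_duplicates + 1"
        " WHERE id = ?",
        (master_id,),
    )


def duplicates_by_count(conn, bug_id):
    """Counts the duplicate rows on every call."""
    return conn.execute(
        "SELECT COUNT(*) FROM bug WHERE duplicateof = ?", (bug_id,)
    ).fetchone()[0]


def duplicates_by_field(conn, bug_id):
    """Reads the counter stored on the bug row itself."""
    return conn.execute(
        "SELECT number_of_duplicates FROM bug WHERE id = ?", (bug_id,)
    ).fetchone()[0]
```

In this toy version the field lookup is still a query of its own; the
saving in a page view comes when the bug row has already been loaded,
so the count needs no extra query at all.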

I've gotten it down to 64 queries, which will hopefully turn the 2000
query bugtask:+index pages on lpnet into < 1000 queries - maybe, if
we're lucky, < 200.

Tools I used to reduce things:
 - liberal use of cachedproperty - if we're retrieving something and
accessing it from 2 places, don't retrieve it the second time
 - helper functions to derive multiple outputs with a single call (and
then cachedproperty in front of them)
 - switch to more direct functions that make better queries
 - get rid of some listifications in passing.
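
The first two tools combine nicely: one helper does the retrieval once
and derives several outputs, and a cached property in front of it
means every later access is free. A minimal sketch of the pattern
follows. It uses functools.cached_property from the standard library
(Python 3.8+) as a stand-in for Launchpad's cachedproperty decorator,
and the BugView class and its loader argument are hypothetical.

```python
from functools import cached_property


class BugView:
    """Toy view; `fetch_attachments` stands in for a database query."""

    def __init__(self, fetch_attachments):
        self._fetch_attachments = fetch_attachments

    @cached_property
    def _attachments_by_kind(self):
        # Runs the loader once, then splits the result into two outputs.
        patches, others = [], []
        for attachment in self._fetch_attachments():
            if attachment["is_patch"]:
                patches.append(attachment)
            else:
                others.append(attachment)
        return patches, others

    @property
    def patches(self):
        return self._attachments_by_kind[0]

    @property
    def regular_attachments(self):
        return self._attachments_by_kind[1]
```

However many times a template touches patches or regular_attachments
on one view instance, the loader is only called once.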

JML was kind enough to review, and so I'm off to fix up the things he
found, and then toss it at ec2 to find out what incidental damage I
did.

Gnight,
Rob


