launchpad-dev team mailing list archive

Thread
Date
Re: tuesday - performance day!

To: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
From: Robert Collins <robert.collins@xxxxxxxxxxxxx>
Date: Tue, 3 Aug 2010 22:02:03 +1200
In-reply-to: <AANLkTin4uu_XqBiK4Sw8ddfhd36E6uxqHaJ5zs7f+axa@mail.gmail.com>
Sender: robertc@xxxxxxxxxxxxxxxxx
On Tue, Aug 3, 2010 at 12:49 PM, Robert Collins
<robert.collins@xxxxxxxxxxxxx> wrote:
> Just a reminder - I'm spending today, all day, just poking at
> performance; I'm 100% interruptible for performance related questions
> / tasks / todos & gotchas.
>
> And if anyone wants to join in, please do so!
>
> Right now, I'm about to finish up a small backlog of emails, then look
> into why https://api.staging.launchpad.net/beta/~ubuntu-dev/participants
> takes 8 seconds.

So, I'm heading off to sleep, and I haven't got a mergable branch yet,
though I do have some small test cleanups.

Largely I've been on a voyage of discovery of a few bits of the
system, which is ok :).

I've also finally got my bug duplicate search improvements live - they
are now running on edge, and its looking - if I do say so myself -
pretty good.

https://bugs.edge.launchpad.net/ubuntu/+filebug-show-similar is the
worker page that is called when you click on 'next'
https://bugs.edge.launchpad.net/ubuntu/+filebug. So to see how the new
stuff performs, you can bring it up directly - e.g.
<https://bugs.edge.launchpad.net/ubuntu/+filebug-show-similar?title=lifeless%20is%20excited%20by%20the%20speed>
or whatever you like.
https://bugs.edge.launchpad.net/malone/+bug/607776 is the bug tracking
this, and if anyone has more feedback on the change, I'd love to hear
from you. The basic approach is to be a little less smart to buy us
room to revamp search properly.

But back to the API performance issue I was looking at - I thought I'd
note down the bits I looked at in case the places I'm looking, and how
I'm looking are either useful to others, or just crazy hard and there
is a better way of doing it. Also I wanted to say  a huge thank you to
Tim for spending some time pointing me at a similar issue solved in
the code api already.

I started with a report that
https://api.launchpad.net/beta/~ubuntu-dev/participants was slow - I
think I saw it in a oops report or something; the source doesn't
particularly matter.

I grabbed a profile off of staging. This required some bootstrapping
of that facility. Remember the excellent kcachegrind stuff Maris wrote
about? Well we can get that from staging, with LOSA cooperation. Its a
little manual at the moment, so its worth seeing if you can reproduce
things locally first.

To get a profile from staging, follow these basic steps:
 - grab a LOSA
 - ask them to set profile_requests: True in the staging config (on
staging) and bounce the appserver to make it take effect. I have a
branch that makes this trivial which I will get merged right after the
rollout; for now you need to apply the following:
=== modified file 'staging-lazr.conf'
--- staging-lazr.conf   2010-07-19 06:53:44 +0000
+++ staging-lazr.conf   2010-08-03 01:02:11 +0000
@@ -246,3 +246,9 @@
 [calculate_bug_heat]
 oops_prefix: CBHT
 error_dir: /srv/staging.launchpad.net/staging-logs/calculate_bug_heat
+
+[profiling]
+profile_requests: True
+profile_dir: /srv/staging.launchpad.net/staging-logs/profiling

With that applied, every request will generate a prof file
(datestamp-pageid-serial.prof) in
/srv/staging.launchpad.net/staging-logs/profiling. So, hit your page
up, and then
 - ask your friendly LOSA to revert the config and bounce the appserver again
 - ask them to rsync the logs/oops etc to devpad

You can then rsync the profile down locally - rsync -r
devpad.canonical.com:/srv/launchpad.net-logs/staging/asuka/profiling/
SOMEPATH -. To find the profile you want, look for the datestamp
matching your web request and the page id. This can be a bit fiddly
:(. There should be a oops comment in the page if you hit ctrl-U, but
API pages (which I was looking at) don't render that.

Anyhow, having found it (thanks Steve!) note that the name - e.g.
2010-08-03_02:04:43.893-MailingListApplication:MailingListAPIView-OOPS-1676S219-Dummy-4.prof
- has an OOPS code in it. You can look this up on
lp-oops.canonical.com, and correlate the SQL activity and the python
backtrace together.

So back to /participation, when we looked at this, the OOPS suggests
mostly nonsql time, but had 400 SQL queries, and the python trace
shows lots - lots! - of time in storm and in getattr's.

So what apparently happens here (this is where Tim helped out :)) is
that lazr.restful is introspecting *every declared attribute*; so we
have 'allmembers' which is a person property that does a (fairly
cheap) query, and then that cheap query is expanded item by item,
property-by-property. (Oh, and hint for finding the right method -
grep for 'exported_as.*URL_SUFFIX' - e.g. exported_as.*participation -
in the interfaces files.

Now, given Tim has run into this in code, it may be that there is a
general problem pattern we should look into, but we should also fix
this up. In the code API, what they do is a tuple query in storm to
get back all the needed information and then start yielding
pre-populated objects which already have the needed values looked up.
This is a bit manual and error prone but a sensible approach for now.

The immediate thing I wanted to solve was the query count, so I set
out to write a test that no more than N queries were made - this is
one of the test cleanups I've code - I've factored out the nice
QueryCounter from the pofile performance tests, and put it in
lp.testing.

Where I'm up to on this is putting a browser using unit test together
that uses this, and then looking into whether any users of
person.allmembers will be broken if it returns delegates rather than
the original objects. Oh, and I need to also see how to not break the
pagination stuff, as team participation gets pretty big on somethings
- and thats something I don't think the current code solution
achieves.

I hope this slightly rambling story is useful - its probably the case
that its all all documented already and I just missed it.

If not though, please do tell me where you would naively have looked
for the interesting bits of the story, and I'll put them there
(wherever there is) in a more permanent fashion.

-Rob
Follow ups

Re: tuesday - performance day!
From: Robert Collins, 2010-08-05
Re: tuesday - performance day!
From: Jonathan Lange, 2010-08-03
Re: tuesday - performance day!
From: Julian Edwards, 2010-08-03
References

tuesday - performance day!
From: Robert Collins, 2010-08-03