← Back to team overview

launchpad-dev team mailing list archive

Re: tuesday - performance day!

 

And to bring it all home. Last week I started looking at a
high-frequency timeout oops, and have been seeking input on all the
mistakes I make as I try to fix it :)

On Thu, Aug 5, 2010 at 10:22 PM, Robert Collins
<robert.collins@xxxxxxxxxxxxx> wrote:
...> So this puts me back at looking into the sql count for /participation:
>
>> So back to /participation, when we looked at this, the OOPS suggests
>> mostly nonsql time, but had 400 SQL queries, and the python trace
>> shows lots - lots! - of time in storm and in getattr's.
>
>> So what apparently happens here (this is where Tim helped out :)) is
>> that lazr.restful is introspecting *every declared attribute*; so we
>> have 'allmembers' which is a person property that does a (fairly
>> cheap) query, and then that cheap query is expanded item by item,
>> property-by-property. (Oh, and hint for finding the right method -
>> grep for 'exported_as.*URL_SUFFIX' - e.g. exported_as.*participation -
>> in the interfaces files.
>
> For 2 people in one team, this API takes 31 separate round trips to
> the DB, and 8 more per person added - its pretty easy to see why it
> scales up to thousands of round trips pretty quickly.
>
> Now, I'd like to ask for some advice. There seem to be three ways of
> fixing this:
>  - (somehow - I don't know the magic) tell lazr.restful to skip some
> attributes (e.g. karma, coc_signed etc) for representations.
> https://bugs.edge.launchpad.net/launchpad-foundations/+bug/251284 -
> makes note of things being removed from representations, but I don't
> know whats involved in doing that (yet). Or I may be misunderstanding
> and actually its an all-or-nothing solution. Leonard, if you could
> comment here (or there) and help enlighten me, I'd really love that :)

I'd still love this input! However I couldn't tell what version we
started exporting the attributes in. I've filed
https://bugs.edge.launchpad.net/launchpad-foundations/+bug/615244
about that.

>  - we can build a cached decorator like code does - need to check if
> any users of this will be broken, and how does it interact with
> mutation operations

I decided not to do this, instead I used cachedproperty and made a
common code path that can determine many properties from a single
query. Similar to prejoin but more capable, and (conversely) more
risky, because the properties won't update in the 'natural' manner.
However, we already have many cachedproperty instances, for exactly
this reason.

I didn't do this option:
>  - we can fix the main code path to grab all the stuff needed (like
> the cached decorator option, but in the model) - this may help
> +participation too, which seems to have a similar, high, query count.

because +participation has a different (though related) problem, and
making everything that looks at allmembers bring back all the related
tables is wasteful for the database, so not a good thing to do.

> is there any advice on whats likely to work best - what has been
> working well for us so far?

Also love to hear horror stories too - things I should avoid :)

https://code.edge.launchpad.net/~lifeless/launchpad/registry/+merge/32067
has my branch which cuts the queries for this API call down to a
constant 11 (in my adhoc testing). I'm doubtful that this will be safe
to land without some focused QA, but we'll be QA'ing edge very soon,
so that should be fine - I expect we can get this out and live in a
week or so tops.

That makes a 2 week VSM for this particular fix, OTOH coding isn't my
top responsibility, and this isn't part of the main UI, so I don't
think I'm slowing things down too much.

-Rob



References