launchpad-dev team mailing list archive

Re: performance tuesday - timeout setting, to change or not, that is the question!

To: John Arbash Meinel <john@xxxxxxxxxxxxxxxxx>
From: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
Date: Thu, 3 Mar 2011 07:43:20 +1300
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4D6E1994.8070905@arbash-meinel.com>

On Wed, Mar 2, 2011 at 11:19 PM, John Arbash Meinel
<john@xxxxxxxxxxxxxxxxx> wrote:
>> this our 52 worker threads are valiantly handling 165 concurrent
>> sessions coming through haproxy. This doesn't affect our DB load
>> hugely: - things queue in the appserver until one of the 4 threads
>> (current config) is idle and only then get through to the backend.
>> What it does mean is that /most/ requests are competing for the GIL
>> for all their logic.
>
> What is the status of moving all the appservers to single threaded so
> that you don't end up in GIL contention? Would this also include
> starting up more app servers per physical box?

It's in progress - we passed a key benchmark last night when we got the
two haproxies into active-passive mode rather than active-active,
which lets us limit concurrency more accurately (two active-active
proxies don't share state, so each one enforces its limit independently).
Yes, we'll bring up more appserver processes.
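A minimal sketch of why unshared state makes limits inaccurate. It assumes a simplified model in which each proxy counts only its own in-flight requests. The `Proxy` class, the `admitted_to_backend` helper and the limit of 4 are hypothetical illustrations, not HAProxy's implementation or Launchpad's actual configuration.

```python
class Proxy:
    """Hypothetical model of a proxy that enforces a connection limit using
    only its own local counter (no state shared with other proxies)."""

    def __init__(self, maxconn: int) -> None:
        self.maxconn = maxconn
        self.in_flight = 0

    def admit(self) -> bool:
        # Admit a request only if *this* proxy is under its own limit.
        if self.in_flight < self.maxconn:
            self.in_flight += 1
            return True
        return False


def admitted_to_backend(proxies: list, requests: int) -> int:
    """Spread a burst of requests round-robin across the active proxies and
    count how many reach the backend at once (none complete mid-burst)."""
    admitted = 0
    for i in range(requests):
        if proxies[i % len(proxies)].admit():
            admitted += 1
    return admitted


# Intended backend limit: 4 concurrent requests (hypothetical figure).
active_active = admitted_to_backend([Proxy(4), Proxy(4)], requests=20)
active_passive = admitted_to_backend([Proxy(4)], requests=20)
print(active_active, active_passive)  # 8 4
```

With two active proxies, the backend can see twice the intended concurrency. With a single active proxy, the configured limit is the real limit.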

> I believe you have the general time-to-render info for all requests. As
> such, can't you mostly predict the effect of dropping the hard timeout?
> (How many queries are currently completing in 14s, but would not
> complete in 13s?) All of these seem really far away from a 9s, or the
> future 5s goal.

We can, yes, but I don't have a ready-to-roll report that says exactly
what will happen. I do look at the 99th percentile in the categories and
candidate timeout reports when doing this - they are pretty good predictors.
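A minimal sketch of the prediction John describes: given per-request render times, count the requests that finish under the current hard timeout but would exceed a lower one, and compute the 99th percentile. The render times below are made-up sample data, and the function names are hypothetical. This is not the code behind Launchpad's timeout reports.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of the samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def newly_timed_out(render_times: list, current: float, proposed: float) -> int:
    """Requests that complete within the current timeout but take longer
    than the proposed (lower) timeout, i.e. proposed < t <= current."""
    return sum(1 for t in render_times if proposed < t <= current)


# Hypothetical render times in seconds, for illustration only.
render_times = [0.2, 0.4, 0.9, 1.5, 2.0, 3.1, 9.5, 12.0, 13.4, 13.9]

print(newly_timed_out(render_times, 14.0, 13.0))  # 2
print(newly_timed_out(render_times, 14.0, 9.0))   # 4
print(percentile(render_times, 99))               # 13.9
```

Real data would have far more samples. If the 99th percentile of a page category sits well below the proposed timeout, lowering the timeout should barely affect that category.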

> Is there a better way to drive timeout fixes than forcing the hard
> timeout? Or is it that you expect dropping the hard timeout will cause
> the evil threads to die earlier, and thus actually speed up all the
> other ones...

A lower timeout:
 - helps us notice poorly performing new features earlier.
 - frees up resources that evil requests would otherwise use - not a
/huge/ effect as we have so few evil requests (0.01%)
 - gives users of requests that are in poor shape faster feedback that
it's broken. (Contrast a fast fail-whale on Twitter with a 30-second
hang followed by an error.)
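A minimal sketch of the "frees up resources" point. It assumes a request is killed at the hard timeout, so it holds a worker for at most that many seconds. The `worker_seconds` helper and the traffic mix (9,999 requests at 0.5s plus one "evil" request that would run for 60s) are hypothetical.

```python
def worker_seconds(render_times: list, timeout: float) -> float:
    """Total worker time consumed when each request is cut off at the
    hard timeout: every request holds a worker for min(t, timeout)."""
    return sum(min(t, timeout) for t in render_times)


# Hypothetical traffic: 9,999 healthy requests plus one evil request (0.01%).
render_times = [0.5] * 9999 + [60.0]

at_14s = worker_seconds(render_times, 14.0)
at_9s = worker_seconds(render_times, 9.0)
print(at_14s, at_9s, at_14s - at_9s)  # 5013.5 5008.5 5.0
```

With evil requests this rare, the saving is tiny (about 0.1% of worker time here). The same cap, though, also bounds how long that one user waits before seeing an error.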

-Rob

References

performance tuesday - timeout setting, to change or not, that is the question!
From: Robert Collins, 2011-03-02
Re: performance tuesday - timeout setting, to change or not, that is the question!
From: John Arbash Meinel, 2011-03-02