
launchpad-dev team mailing list archive

Re: performance tuesday - timeout setting, to change or not, that is the question!


On Wed, Mar 2, 2011 at 11:19 PM, John Arbash Meinel
<john@xxxxxxxxxxxxxxxxx> wrote:
>> this our 52 worker threads are valiantly handling 165 concurrent
>> sessions coming through haproxy. This doesn't affect our DB load
>> hugely: things queue in the appserver until one of the 4 threads
>> (current config) is idle and only then get through to the backend.
>> What it does mean is that /most/ requests are competing for the GIL
>> for all their logic.
>
> What is the status of moving all the appservers to single threaded so
> that you don't end up in GIL contention? Would this also include
> starting up more app servers per physical box?

It's in progress: we passed a key benchmark last night when we got the
two haproxies into active-passive mode rather than active-active,
which lets us limit load more accurately (because active-active doesn't
share state). Yes, we'll bring up more appserver processes.
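The point about active-active not sharing state can be sketched in a few lines. This is a minimal model, not haproxy's implementation: the `Proxy` class, `peak_load` helper and backend name `app1` are hypothetical, and the per-backend cap is only loosely modelled on a per-server connection limit such as haproxy's `maxconn`. Each proxy enforces the cap using only its own counters, so with two active proxies a backend meant to take one request at a time can be handed two.

```python
class Proxy:
    """Hypothetical load balancer that caps in-flight requests per backend
    using only its own local counters (no state shared with other proxies)."""

    def __init__(self, backends, maxconn):
        self.maxconn = maxconn
        self.in_flight = {b: 0 for b in backends}

    def try_dispatch(self, backend):
        # Refuse (i.e. the request would queue) once this proxy's own count
        # hits the cap; requests sent via other proxies are invisible here.
        if self.in_flight[backend] >= self.maxconn:
            return False
        self.in_flight[backend] += 1
        return True


def peak_load(proxies, backend, attempts):
    """Spread `attempts` requests round-robin over the proxies and return how
    many reach the backend concurrently (none complete in this sketch)."""
    admitted = 0
    for i in range(attempts):
        if proxies[i % len(proxies)].try_dispatch(backend):
            admitted += 1
    return admitted


# Active-passive: one proxy sees all traffic, so the cap holds.
print(peak_load([Proxy(["app1"], 1)], "app1", 10))  # -> 1
# Active-active: each proxy admits up to its own cap, so the backend sees 2.
print(peak_load([Proxy(["app1"], 1), Proxy(["app1"], 1)], "app1", 10))  # -> 2
```

With a single-threaded appserver, that second case is exactly the overcommit a per-server limit is meant to prevent.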



> I believe you have the general time-to-render info for all requests. As
> such, can't you mostly predict the effect of dropping the hard timeout?
> (How many queries are currently completing in 14s, but would not
> complete in 13s?) All of these seem really far away from a 9s, or the
> future 5s goal.

We can, yes, but I don't have a ready-to-roll report that says exactly
what will happen. I do look at the 99th percentile in the
categories/candidate timeout reports when doing this; they are pretty
good predictors.
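As a rough sketch of the prediction John describes, assuming you have a list of per-request render times in seconds (the sample data and the helper names below are made up for illustration, not taken from the actual timeout reports), you can count how many requests currently finish under the old hard timeout but would exceed the new one, and check a high percentile as a sanity check:

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def newly_failing(samples, old_timeout, new_timeout):
    """Requests that complete under old_timeout but would exceed new_timeout."""
    return sum(1 for t in samples if new_timeout < t <= old_timeout)


# Hypothetical render times (seconds) for one page category.
render_times = [0.2, 0.5, 1.1, 2.0, 3.5, 8.0, 12.5, 13.4, 13.9, 20.0]

print(newly_failing(render_times, 14.0, 13.0))     # -> 2 (13.4s and 13.9s)
print(nearest_rank_percentile(render_times, 99))   # -> 20.0
```

The 20.0s request is not counted because it already fails under the 14s timeout; only the requests in the band between the two limits change outcome.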

> Is there a better way to drive timeout fixes than forcing the hard
> timeout? Or is it that you expect dropping the hard timeout will cause
> the evil threads to die earlier, and thus actually speed up all the
> other ones...

A lower timeout:
 - helps us notice poorly performing new features earlier.
 - frees up resources that evil requests would otherwise use; not a
/huge/ effect, as so few of our requests are evil (0.01%).
 - gives users of requests that are in poor shape faster feedback that
something is broken. (Contrast a fast fail-whale on Twitter with a
30-second hang followed by an error.)
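To make the "frees up resources" and "faster feedback" points concrete, here is a minimal sketch of a cooperative per-request hard timeout. It assumes the request code calls `check()` at natural points (for example before each database query); `RequestDeadline` and `RequestTimeoutError` are hypothetical names, not Launchpad's actual timeout machinery, and the 9s value is only an example. A lower `hard_timeout` means an evil request is cut off, and its worker freed, that much sooner.

```python
import time


class RequestTimeoutError(Exception):
    """Raised when a request runs past its hard timeout."""


class RequestDeadline:
    """Hypothetical per-request deadline, checked cooperatively."""

    def __init__(self, hard_timeout, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour is easy to test
        self._deadline = clock() + hard_timeout

    def remaining(self):
        # Seconds left before the hard timeout (negative once exceeded).
        return self._deadline - self._clock()

    def check(self):
        # Call before expensive work; abort the request once time is up.
        if self.remaining() <= 0:
            raise RequestTimeoutError("request exceeded its hard timeout")


deadline = RequestDeadline(hard_timeout=9.0)
deadline.check()  # passes immediately after the request starts
```

Injecting the clock keeps the sketch deterministic in tests; in production it would simply use `time.monotonic`.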

-Rob


