launchpad-dev team mailing list archive

9 second hard timeout (no new timeout bugs)

 

Of course, we have some really hard pages to fix, and many more that
time out only a few times a day, but we've fixed enough pages to be
manageably down to a 9 second cap.

This doesn't mean we can stop working on timeouts :) - but it does
mean, at least for a while, that I won't be moving the hard timeout
value down if doing so would add new timeout bugs. It is time to
consolidate and focus on the second half of the stretch goal Francis
set: 9 second timeout + no critical bugs. (1/3 of the critical bugs
are timeouts.)

The longer term goal is still a 5 second timeout with 1 second 99th
percentile... and we had a discussion a few weeks back about setting
the timeout for *new* pages to 5 seconds straight away. That's still
not totally settled, but I think it's time we looked into how to make
that work. In the meantime the hard timeout default value can sit at
9 seconds. If we get to the point where it could be dropped another
second without adding critical bugs, I'll definitely do that - but
only if it won't be adding bugs :). (Dropping it provides a backstop
against misbehaving pages, so getting it low matters for the site as
a whole.)

The following pages have timeout exceptions at the moment:
hard_timeout	default	0	9000
hard_timeout	pageid:BugTask:+create-question	12	20000
hard_timeout	pageid:Distribution:+bugs	4	10000
hard_timeout	pageid:Distribution:+bugtarget-portlet-tags-content	3	10000
hard_timeout	pageid:Distribution:EntryResource:searchTasks	5	10000
hard_timeout	pageid:Question:+index	18	11000
hard_timeout	pageid:RootObject:+login	1	20000
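
For anyone wanting to check which timeout a given page ends up with, here
is a rough sketch of how the rules above could be resolved. It assumes the
columns are flag, scope, priority and value (in milliseconds), that a
pageid scope matches only that exact page id, and that the highest
priority matching rule wins. The function names are made up for
illustration; this is not the actual Launchpad code.

```python
# Rules copied from the list above; columns assumed to be
# flag, scope, priority, value (milliseconds).
RULES = """\
hard_timeout default 0 9000
hard_timeout pageid:BugTask:+create-question 12 20000
hard_timeout pageid:Distribution:+bugs 4 10000
hard_timeout pageid:Distribution:+bugtarget-portlet-tags-content 3 10000
hard_timeout pageid:Distribution:EntryResource:searchTasks 5 10000
hard_timeout pageid:Question:+index 18 11000
hard_timeout pageid:RootObject:+login 1 20000
"""


def parse_rules(text):
    """Turn whitespace-separated rule lines into tuples (hypothetical helper)."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        flag, scope, priority, value = line.split()
        rules.append((flag, scope, int(priority), int(value)))
    return rules


def hard_timeout_ms(rules, pageid):
    """Return the timeout for pageid, assuming the highest priority match wins."""
    matching = [
        (priority, value)
        for flag, scope, priority, value in rules
        if flag == "hard_timeout" and scope in ("default", "pageid:" + pageid)
    ]
    # Raises ValueError if nothing matches, i.e. if there is no default rule.
    return max(matching)[1]


rules = parse_rules(RULES)
print(hard_timeout_ms(rules, "Question:+index"))
print(hard_timeout_ms(rules, "Person:+index"))
```

Under those assumptions any page without its own rule (Person:+index is
just an arbitrary example) falls through to the 9000 ms default.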

Question:+index is exempted because it takes a very long time before it
does its commit - even without mail spooling it's a slow page that
doesn't improve with retries.
Ditto BugTask:+create-question.
Distribution:+bugs is exempted because we have some difficult
performance work to do on search, and it's not inside the time frames
suitable for maintenance squads - at least, as assessed so far.
The tags portlet exemption should be temporary until we deploy the
bugsummary table.
pageid:Distribution:EntryResource:searchTasks seems to be driven by a
script - perhaps arsenal - but it's getting into offsets of *thousands*
in the DB: we really need to address the batching logic.
Finally, RootObject:+login is exempted because we're running into SSO
backend delays which we have little visibility into as a team. There
is a bug open on canonical-identity-provider about performance, and
I've volunteered our collective knowledge if the ISD team have any
trouble analysing how or why the thing is slow. We've all learnt a
lot about addressing performance in the last 10 months, so please feel
free to share :)

-Rob