launchpad-dev team mailing list archive

mass bug filing from timeout-candidates report

 

In case anyone has just had a heart attack from all the timeout bugs
I've filed ... read on :)

We get a top timeouts report daily. Sadly, it has a few problems: it's
noisy - very, very noisy. Things are grouped badly, so we see unrelated
issues clumped together, and the same issue shows up many different
times. There are bugs open to fix that, but the underlying theory is a
bit suspect for the use case of 'find things that timeout to fix'.

Stuart's new page performance report provides a different set of data,
one that is very useful for identifying both things that are timing out
and things likely to start timing out - but without the detail needed
to fix things (which the daily oops reports give).

So, I've run through the top of the page performance report (I
attached a copy of one the other day) and ensured there are bugs for
all of the high-probability pages in it. That is, if a page regularly
(>1%) exceeds our timeout threshold on edge or production, I've made a
bug for it.
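
For anyone wanting to reproduce that selection by hand, here is a rough
sketch of the rule. It is not the code behind the page performance
report: the input format (pairs of page id and request duration in
seconds), the function name, and reading ">1%" as "more than 1% of
requests to that page" are all assumptions made for illustration.

```python
from collections import defaultdict


def timeout_candidates(requests, timeout_seconds, min_fraction=0.01):
    """Return {page_id: fraction} for pages that exceed the timeout often.

    `requests` is an iterable of (page_id, duration_seconds) pairs. This
    input shape is hypothetical; the real report is built from other data.
    A page qualifies when the share of its requests running longer than
    `timeout_seconds` is strictly greater than `min_fraction`.
    """
    totals = defaultdict(int)  # requests seen per page
    slow = defaultdict(int)    # requests over the timeout per page
    for page_id, duration in requests:
        totals[page_id] += 1
        if duration > timeout_seconds:
            slow[page_id] += 1

    candidates = {}
    for page_id, total in totals.items():
        fraction = slow[page_id] / total
        if fraction > min_fraction:
            candidates[page_id] = fraction
    return candidates
```

Feeding it the requests for one day, with whatever the current timeout
threshold is, would give the set of pages that deserve a bug under this
reading of the rule.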

My goal in doing this was to provide a complete view of the danger
pages: rather than facing a never-ending pile of work that bubbles up
as we fix things in the daily oops report, capturing the full set lets
us sensibly burn down all the timeouts affecting us today and get into
a solid position for lowering our back-stop timeout configuration. It
also gives folk like me, who like to pick a timeout to fix, a somewhat
pre-massaged dataset to work from.

And the good news is that the total count is *only* 66 bugs -
https://bugs.edge.launchpad.net/launchpad-project/+bugs?field.tag=timeout.

I honestly expect *nearly all* timeout reports we'll get until we
change the timeout again will be either one of these bugs or a
transient condition (like running on reduced DB server capacity during
node upgrades to lucid).

-Rob