launchpad-dev team mailing list archive

Re: fallout from bugsummary: ubuntu bug status changes failing

 

On Thu, Jun 9, 2011 at 10:44 PM, Robert Collins
<robertc@xxxxxxxxxxxxxxxxx> wrote:
> See bug 794802 for some of the gory details; in short bugsummary
> managed to spring a nasty surprise on us.
...
> At this point I've handed over the incident to Stuart, and am halting() myself.
>
> I hope to awaken to great news :)

So this is basically behind us: a cowboyed schema change that preserves
performance is live, and the branch that makes it official will land on
devel soon. Until that lands, the tags portlets for projects won't
update (specifically, closing all the bugs for a tag won't remove that
tag).

Some observations:
 - We had a spike of 900 timeouts over the period of degraded service.
That's not too bad - it's just over one every 2 minutes.
 - It took us 2.5 hours to realise there was a system problem;
lower-latency OOPS reports are really important to reduce this.
 - We should have rolled back (by neutering the trigger functions), and
I would do that next time. (It was unclear whether that was safe until
we got stub up, and by then we had a code change to substantially
improve things...)
 - The new tags portlet runs -fast-. Almost scary fast.
 - In a month, more or less, we'll have that speed for the numbers
portlet too. \o/

-Rob

