
launchpad-dev team mailing list archive

Updated policy around cronspam, oops reporting etc

 

We recently had a situation where we got overwhelmed with 'noise' on
our cron output: things that are meant to be 'silent unless things go
wrong' started outputting significant amounts of email. This
overwhelmed the folk that track the cron output and a real issue was
missed (the librarian-gc script was in crisis).

During the team lead meeting we discussed this, and I've clarified our
policies to reflect the outcome: things that *support* our
identification of production issues are essential to our day-to-day
operations. Any [significant] disruption to them is now an immediate
operational incident.

I don't see this as an actual change, rather a formalisation of the
prioritisation many folk have applied in the past, but formalising it
gives *explicit* support to anyone who notices such an issue and needs
to get the ball rolling.

I've updated the various docs I could see that were relevant:
https://dev.launchpad.net/BugTriage
https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy
https://wiki.canonical.com/Launchpad/PolicyandProcess/DefinitionofCriticalPolicy
(sorry, internal only)

I'd love any feedback on clarity - or whether this is a crazy thing to
do :P - that you might have.

-Rob

