launchpad-dev team mailing list archive
Message #07907
Updated policy around cronspam, oops reporting etc
We recently had a situation where we got overwhelmed with 'noise' in
our cron output: things that are meant to be 'silent unless things go
wrong' started outputting significant amounts of email. This
overwhelmed the folk who track the cron output, and a real issue was
missed (the librarian-gc script was in crisis).
We discussed this during the team lead meeting, and I've clarified our
policies as a result: things that *support* our identification of
production issues are essential to our day-to-day operations. Any
[significant] disruption to them is now an immediate operational
incident.
I don't see this as an actual change, but rather as a formalisation of
the prioritisation many folk have applied in the past. Formalising it
gives *explicit* support to anyone who notices such an issue and needs
to get the ball rolling.
I've updated the relevant docs I could find:
https://dev.launchpad.net/BugTriage
https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy
https://wiki.canonical.com/Launchpad/PolicyandProcess/DefinitionofCriticalPolicy
(sorry, internal only)
I'd love any feedback you might have on clarity, or on whether this is
a crazy thing to do :P
-Rob