launchpad-dev team mailing list archive

Re: Updated policy around cronspam, oops reporting etc

 

On Thu, Sep 15, 2011 at 5:50 AM, Martin Pool <mbp@xxxxxxxxxxxxx> wrote:
> On 15 September 2011 06:49, Robert Collins <robertc@xxxxxxxxxxxxxxxxx> wrote:
>> We recently had a situation where we got overwhelmed with 'noise' on
>> our cron output: things that are meant to be 'silent unless things go
>> wrong' started outputting significant amounts of email. This
>> overwhelmed the folk that track the cron output and a real issue was
>> missed (the librarian-gc script was in crisis).
>
> One thing that trapped me before is: it's not utterly obvious what
> code changes will cause this noise in production, because it depends
> on configuration rules that developers can't see.  I naively added
> "warning" level logging about slightly-bad situations to some mail
> code and it turns out that sets off a big red light, whereas "debug"
> and "info" are fine (iirc).

I go by the rule of thumb that WARNING and above need a human to do
something, even if that is to massage data or fix a script to stop
generating false positives. WARNING is 'This is something to fix',
ERROR is 'I must fix this. Something is failing.', CRITICAL/FATAL is
'I must fix this because the script is completely failing'.

I think a lot of the noise comes from emitting WARNING and ERROR
messages in situations where the script is perfectly capable of
continuing ('WARNING: Optional header missing, but I don't really care
as we expect that and defined procedure is to use this default',
'ERROR: Can't connect to the remote 3rd party service, but this is
normal because they don't guarantee 4 nines uptime so we don't really
care').
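
To make the rule of thumb concrete, here is a rough sketch using only
the standard Python logging module. It is not Launchpad code: the
names make_cron_logger and parse_content_type are made up for
illustration, and it assumes cron mails whatever the script writes to
the handler's stream. The expected case is logged at INFO and stays
silent; only WARNING and above reach the mail.

```python
import io
import logging


def make_cron_logger(name, stream):
    """Return a logger that only writes WARNING and above to `stream`.

    Assumption: whatever lands on `stream` gets mailed by cron, so
    anything emitted here means 'a human needs to do something'.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the root logger out of it
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.WARNING)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def parse_content_type(headers, logger, default="text/plain"):
    """Hypothetical helper: a missing optional header is expected."""
    value = headers.get("Content-Type")
    if value is None:
        # Defined procedure is to use the default, so this is INFO,
        # not WARNING: nobody has to act on it.
        logger.info("Optional header missing, using %r", default)
        return default
    return value


def demo():
    """Run the example and return what cron would have mailed."""
    captured = io.StringIO()
    logger = make_cron_logger("cron-demo", captured)
    parse_content_type({}, logger)  # expected case, stays silent
    logger.warning("Mailbox quota nearly full")  # needs a human
    return captured.getvalue()
```

In a real script the stream would be sys.stderr; the StringIO is only
there so the demo can show what would end up in the mail.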

We might need some sort of persistent state, as a remote 3rd party
service being down is normal (INFO or DEBUG), but if it has been
failing consistently for 30 days it is a WARNING, since someone should
probably garden the old data. But I guess in most cases it is better
to be self-gardening.
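
The persistent state could be as simple as the sketch below. Again
this is an assumption on my part, not existing code: failure_level,
the JSON state file and the service name are all hypothetical. It
remembers when a service first started failing and only escalates to
WARNING once the failures have run for 30 days without a success in
between.

```python
import json
import logging
import time

ESCALATE_AFTER = 30 * 24 * 60 * 60  # 30 days, in seconds


def failure_level(state_path, service, ok, now=None):
    """Return the level to log this run's result at, updating state.

    `state_path` is a hypothetical JSON file mapping each service name
    to the timestamp of the first failure in its current losing streak.
    """
    now = time.time() if now is None else now
    try:
        with open(state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {}

    if ok:
        # A success ends the streak, so forget the first failure.
        state.pop(service, None)
        level = logging.DEBUG
    else:
        first_failure = state.setdefault(service, now)
        if now - first_failure >= ESCALATE_AFTER:
            level = logging.WARNING  # someone should garden this
        else:
            level = logging.INFO  # down, but that is normal

    with open(state_path, "w") as f:
        json.dump(state, f)
    return level
```

The caller would then do something like
logger.log(failure_level(path, "remote-service", ok), "...") so the
same message is quiet for the first 30 days and noisy afterwards.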


/me hopes there isn't a policy lurking somewhere that contradicts his
rule of thumb

-- 
Stuart Bishop <stuart@xxxxxxxxxxxxxxxx>
http://www.stuartbishop.net/

