
nagios-charmers team mailing list archive

[Bug 1906321] Re: We should allow tuning of host/service notification_interval

 

This charm is no longer actively maintained. Please consider using the new Canonical Observability Stack instead:
https://charmhub.io/topics/canonical-observability-stack

I am closing this feature request.

** Changed in: charm-nagios
       Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Nagios
Charm developers, which is subscribed to Nagios Charm.
https://bugs.launchpad.net/bugs/1906321

Title:
  We should allow tuning of host/service notification_interval

Status in Nagios Charm:
  Won't Fix

Bug description:
  Testing suggests that the current Nagios on Bionic does not re-notify
  for alerts after a downtime ends if they had already alerted before
  the downtime began.  Nagios does send a "DOWNTIMEEND" notification
  when a downtime completes, but that is not what we need: we need
  alerts that are still in an error state to re-alert.

  The simplest way I can see to do this is to change
  notification_interval in the base host/service configs
  (/etc/nagios3/conf.d/generic-{host,service}_nagios2.cfg) from 0 to
  some other value, e.g. 10 or 20.  Assuming the Nagios default
  interval_length of 60 seconds, those values would mean re-notification
  every 10 or 20 minutes.
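
  As a rough illustration of the arithmetic above, the sketch below
  converts a notification_interval value into minutes between repeat
  notifications.  The function name renotify_minutes is hypothetical and
  is not part of Nagios or the charm; the only facts assumed are that
  the interval is counted in units of interval_length seconds and that
  0 disables re-notification.

```python
def renotify_minutes(notification_interval, interval_length=60):
    """Return minutes between repeat notifications, or None if disabled.

    notification_interval is counted in units of interval_length seconds.
    A value of 0 means Nagios sends a single problem notification only.
    """
    if notification_interval == 0:
        return None  # no re-notification at all
    return notification_interval * interval_length / 60


# Hypothetical usage: compare the current setting (0) with the proposed ones.
for value in (0, 10, 20):
    print(value, renotify_minutes(value))
```

  With the default interval_length, the proposed values of 10 and 20
  map directly to 10 and 20 minutes; a non-default interval_length
  would scale them accordingly.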

  Doing this in a sane way may take some nuance.

  The main use case for this change is when PagerDuty integration is
  in use.  Per testing, repeat notifications from Nagios to PagerDuty
  do not appear to create additional PagerDuty events when one already
  exists for the host/service in question.  This is true even when
  events are snoozed in PagerDuty.  Notifications also aren't sent
  during downtimes or while a Nagios-side "ack" is in place.  The key
  change would be that when a downtime expires or a Nagios-side ack is
  removed, and the event in PagerDuty was marked as resolved during
  that downtime/ack, re-notification would create a new event in
  PagerDuty.  This would largely mitigate "leakage", where something
  continues to be a problem in Nagios but does not reach PagerDuty
  because it was marked "resolved" at some point in the past.
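
  The sequence described above can be walked through with a toy model.
  This is only a sketch of the behaviour observed in testing, not the
  PagerDuty API: the IncidentTracker class, its methods, and the
  "web01/HTTP" key are all hypothetical names invented for the example.

```python
class IncidentTracker:
    """Toy stand-in for the paging side: at most one open event per key."""

    def __init__(self):
        self.open_keys = set()  # host/service keys with an open event
        self.created = 0        # total number of events ever created

    def notify(self, key):
        """Handle a notification; return True if a new event was created."""
        if key in self.open_keys:
            return False  # duplicate of an existing open event, ignored
        self.open_keys.add(key)
        self.created += 1
        return True

    def resolve(self, key):
        """Mark the event for this key as resolved."""
        self.open_keys.discard(key)


tracker = IncidentTracker()
key = "web01/HTTP"  # hypothetical host/service pair

tracker.notify(key)   # first alert creates an event
tracker.notify(key)   # repeat notification while the event is open: ignored
tracker.resolve(key)  # event marked resolved during a downtime or ack

# With notification_interval = 0 nothing further is sent, so the problem
# never reaches the pager again.  With a non-zero interval, the next
# repeat notification after the downtime ends creates a fresh event:
tracker.notify(key)
print(tracker.created)
```

  In this model the final notification is what closes the gap: without
  it, the problem persists in Nagios while no open event exists on the
  paging side.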

  The key weakness I can see with this approach is that email
  notification doesn't ignore the "duplicate" alerts.  The repeat
  notifications would result in extra emails, which may be alarming to
  whoever is receiving the alerts.  "Snoozing" the PagerDuty events
  wouldn't prevent email notifications from being sent; those would
  persist until the alert is properly resolved (or downtimed/acked) in
  Nagios.  So it would also help if we could apply a slightly different
  policy to email and PagerDuty alerts.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nagios/+bug/1906321/+subscriptions


