nagios-charmers team mailing list archive

[Bug 1877400] Re: Need ability to tune service checks to non-default notification profiles

 

Related bug: https://bugs.launchpad.net/charm-hw-health/+bug/1876931

-- 
You received this bug notification because you are a member of Nagios
Charm developers, which is subscribed to Nagios Charm.
https://bugs.launchpad.net/bugs/1877400

Title:
  Need ability to tune service checks to non-default notification
  profiles

Status in Nagios Charm:
  Triaged
Status in NRPE Charm:
  Triaged

Bug description:
  Currently, when using
  charmhelpers.contrib.charmsupport.nrpe.add_check(), service checks are
  defined with max_check_attempts = 4 and retry_check_interval = 1.
  This means that when a service fault is detected, 4 checks of that
  service must return a non-OK result before it becomes a HARD fault that
  requires notification through alerting (PagerDuty, email, etc.).
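
  As a rough illustration of the timing these defaults imply, the sketch
  below models how long a fault stays in the SOFT state before turning
  HARD.  It is not charmhelpers code: the function name is hypothetical,
  and it assumes Nagios' default interval_length of 60 seconds (so one
  interval unit is one minute) and ignores check execution time.

```python
# Sketch only: models Nagios soft-to-hard timing, not charmhelpers code.
def minutes_until_hard(max_check_attempts: int, retry_interval: float) -> float:
    """Minutes from the first non-OK result to the HARD state.

    The first non-OK result counts as attempt 1; each remaining attempt
    is assumed to run retry_interval minutes after the previous one.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval


# Defaults quoted in this bug: 4 attempts, retried every 1 minute.
print(minutes_until_hard(4, 1))  # 3
```

  Under those assumptions, the current defaults give a fault roughly
  three minutes to clear before a notification is sent.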

  Some checks defined in NRPE and by other charms have a known ebb and
  flow of threshold crossings that results in self-resolving alerts.  One
  such example is rabbitmq-server's unconsumed messages threshold: we
  know that when a nova/neutron node restarts, unconsumed fanout queues
  swell for up to 30 minutes until nova or neutron reaps them.  It would
  be very useful to offer different max_check_attempts options to charm
  developers and nrpe check developers, so they can identify which
  checks should alert immediately and which checks should, potentially,
  not alert unless they've been in a non-OK state for 2 hours.
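
  To make the request concrete, here is a minimal sketch of how a check
  author might pick max_check_attempts for a desired notification delay.
  The helper name is hypothetical and not part of charmhelpers or the
  nrpe charm; it assumes retry intervals are expressed in minutes and
  that the first non-OK result counts as attempt 1.

```python
import math


def attempts_for_delay(delay_minutes: float, retry_interval: float = 1) -> int:
    """Smallest max_check_attempts that holds off the HARD state for at
    least delay_minutes after the first non-OK result."""
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    # Attempt 1 is the first failure; each further attempt adds one interval.
    return math.ceil(delay_minutes / retry_interval) + 1


print(attempts_for_delay(30))       # 31: 30-minute queue swell, 1-minute retries
print(attempts_for_delay(120))      # 121: two hours at 1-minute retries
print(attempts_for_delay(120, 10))  # 13: two hours at 10-minute retries
```

  A longer retry interval keeps the attempt count small, at the cost of
  coarser timing, which suggests both values may need to be tunable per
  check.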

  See https://bugs.launchpad.net/charm-hw-health/+bug/1876931 for an
  example where the ability to ignore IPMI hardware timeouts for a
  couple of hours would reduce operational overhead.  Such timeouts
  usually self-resolve under normal circumstances, and would continue
  well past the check attempt window if there were an actual issue.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nagios/+bug/1877400/+subscriptions

