nagios-charmers team mailing list archive
Message #00926
[Bug 1877400] Re: Need ability to tune service checks to non-default notification profiles
Related bug: https://bugs.launchpad.net/charm-hw-health/+bug/1876931
--
You received this bug notification because you are a member of Nagios
Charm developers, which is subscribed to Nagios Charm.
https://bugs.launchpad.net/bugs/1877400
Title:
Need ability to tune service checks to non-default notification
profiles
Status in Nagios Charm:
Triaged
Status in NRPE Charm:
Triaged
Bug description:
Currently, when using
charmhelpers.contrib.charmsupport.nrpe.add_check(), service checks are
defined with max_check_attempts = 4 and retry_check_interval = 1.
This means that when a service fault is detected, 4 consecutive checks
of that service must return a non-OK result before it becomes a HARD
fault that requires notification through alerting (PagerDuty, email, etc.).
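As a rough illustration of the timing this implies, the sketch below computes how long a fault has to persist before it is promoted to a HARD state. It assumes retry_check_interval is counted in Nagios time units of 60 seconds (the default interval_length) and that the first non-OK result counts as attempt 1. The function name is made up for this example and is not part of charmhelpers.

```python
def seconds_until_hard(max_check_attempts, retry_check_interval, interval_length=60):
    """Seconds between the first non-OK result and the HARD state.

    Assumption: the first non-OK result counts as attempt 1, so only
    (max_check_attempts - 1) retries are still needed afterwards.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    retries = max_check_attempts - 1
    return retries * retry_check_interval * interval_length


# Values from the bug description: 4 attempts, retried 1 time unit apart.
default_delay = seconds_until_hard(4, 1)
print(default_delay)  # 180
```

Under those assumptions the current settings notify roughly 3 minutes after the first failed check, which is much shorter than the self-resolving windows discussed in this bug.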
Some checks defined in NRPE and by other charms have a known ebb and
flow of threshold crossings that results in self-resolving alerts. One
such example might be rabbitmq-server's unconsumed messages threshold:
we know that when a nova/neutron node restarts, unconsumed fanout
queues swell for up to 30 minutes until nova or neutron reaps them
after an amount of time has passed. It would be very useful to offer
different max_check_attempts options to charm developers and NRPE check
developers so they can identify which checks should alert
immediately and which checks should, potentially, not alert unless
they have been failing for 2 hours.
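To make the request concrete, here is a small sketch of how a desired notification delay could be translated into a max_check_attempts value. The helper and the profile names are hypothetical, not an existing charmhelpers or Nagios API, and the arithmetic assumes retry_check_interval is expressed in minutes and that the first non-OK result counts as attempt 1.

```python
import math


def attempts_for_delay(delay_minutes, retry_check_interval=1):
    """max_check_attempts needed so a fault goes HARD after delay_minutes.

    Assumption: retry_check_interval is in minutes and the first non-OK
    result is attempt 1, so one extra attempt is added to the retries.
    """
    if delay_minutes < 0 or retry_check_interval <= 0:
        raise ValueError("delay must be >= 0 and retry interval must be > 0")
    retries = math.ceil(delay_minutes / retry_check_interval)
    return retries + 1


# Hypothetical notification profiles a check author might choose from.
profiles = {
    "immediate": attempts_for_delay(0),    # alert on the first failure
    "default": attempts_for_delay(3),      # matches today's value of 4
    "two_hours": attempts_for_delay(120),  # e.g. known self-resolving faults
}
print(profiles)  # {'immediate': 1, 'default': 4, 'two_hours': 121}
```

A longer retry_check_interval keeps the attempt count small for long delays: attempts_for_delay(120, 5) gives 25 attempts instead of 121.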
See https://bugs.launchpad.net/charm-hw-health/+bug/1876931 for an
example where the ability to ignore IPMI hardware timeouts for
a couple of hours would reduce operational overhead. The affected
services are known to have issues that self-resolve in normal
circumstances, while an actual issue would continue well past the
check attempt window.
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nagios/+bug/1877400/+subscriptions