
nagios-charmers team mailing list archive

[Bug 1902142] Re: Nagios check for unreachable pagerduty


Perhaps the nagios charm could implement a form of "dead man's switch" in
a cron job or something? I looked online but found nothing, so perhaps
the following could work.

You'd set up pagerduty with a new service whose first escalation level
is "off" (nobody is paged) and whose next levels are the normal
rotation. Alerts escalate after X minutes (say, 15).

A cron job, every fifteen minutes, looks for an alert named "NAGIOS
PAGERDUTY E2E CHECK" or similar and, if it's not there, creates it.

Another cron job, every 5 minutes, acks the alert.
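The two cron jobs could be sketched against PagerDuty's Events API v2
roughly as follows. This is a minimal sketch, assuming the new service
has an Events API v2 integration whose routing key is available on the
nagios unit; the dedup key and function names are hypothetical. Instead
of looking the alert up first, it swaps in the Events API's dedup_key:
while an alert with that key is open, a new trigger is folded into it,
so the 15-minute job can simply re-trigger.

```python
import json
import urllib.request

# PagerDuty Events API v2 endpoint.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
# Hypothetical fixed dedup key identifying the heartbeat alert. While an
# alert with this key is open, PagerDuty folds new events into it instead
# of opening a second alert.
DEDUP_KEY = "nagios-pagerduty-e2e-check"


def build_event(routing_key, action, source):
    """Build an Events API v2 body for the E2E heartbeat alert.

    action is "trigger" (15-minute job) or "acknowledge" (5-minute job).
    """
    event = {
        "routing_key": routing_key,
        "event_action": action,
        "dedup_key": DEDUP_KEY,
    }
    if action == "trigger":
        # Trigger events must carry a payload describing the alert.
        event["payload"] = {
            "summary": "NAGIOS PAGERDUTY E2E CHECK",
            "source": source,
            "severity": "critical",
        }
    return event


def send_event(event, timeout=10):
    """POST the event and return the HTTP status (202 when accepted).

    urlopen's default opener picks up https_proxy/HTTPS_PROXY from the
    environment, so the cron job should run with the same proxy settings
    as nagios. Network failures raise an OSError subclass such as
    urllib.error.URLError.
    """
    req = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

From cron, the 15-minute job would call
`send_event(build_event(key, "trigger", socket.gethostname()))` and the
5-minute job the same with `"acknowledge"`.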

Then, if pagerduty isn't reachable from the nagios unit, the acks stop
getting through, so the E2E check alert escalates and pages the first
person in the rotation.

This may be doable with rulesets
(https://support.pagerduty.com/docs/rulesets), but I haven't yet found
a way to do so.

-- 
You received this bug notification because you are a member of Nagios
Charm developers, which is subscribed to Nagios Charm.
https://bugs.launchpad.net/bugs/1902142

Title:
  Nagios check for unreachable pagerduty

Status in Nagios Charm:
  New

Bug description:
  If enable_pagerduty=True but nagios cannot reach pagerduty, there
  should be a new CRITICAL alert saying that pagerduty isn't reachable.

  Be sure to attempt to reach pagerduty through whatever proxies
  nagios+pagerduty services are configured with.
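For the check the bug description asks for, a minimal sketch of a
Nagios-style plugin might look like this. It assumes that any HTTP
response from events.pagerduty.com counts as "reachable" and that the
proxy URL is handed to it explicitly (or taken from the usual *_proxy
environment variables); the function names are hypothetical and not
part of the charm.

```python
import urllib.error
import urllib.request

# Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Assumption: getting any HTTP answer from the Events API host is a good
# enough signal that PagerDuty is reachable from this unit.
DEFAULT_URL = "https://events.pagerduty.com/v2/enqueue"


def make_opener(proxy_url=None):
    """Build an opener that goes through proxy_url.

    With proxy_url=None, build_opener's default ProxyHandler falls back to
    the *_proxy environment variables.
    """
    if proxy_url is None:
        return urllib.request.build_opener()
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def check_pagerduty(opener, url=DEFAULT_URL, timeout=10):
    """Return (exit_code, message) in Nagios plugin style."""
    try:
        with opener.open(url, timeout=timeout):
            pass
    except urllib.error.HTTPError as exc:
        # PagerDuty answered, just not with 2xx (e.g. a GET on a POST-only
        # endpoint), so the network path through the proxy works.
        return OK, f"OK: {url} answered HTTP {exc.code}"
    except OSError as exc:
        # URLError (DNS failure, refused connection, failed proxy CONNECT)
        # and timeouts are all OSError subclasses.
        return CRITICAL, f"CRITICAL: cannot reach {url}: {exc}"
    return OK, f"OK: {url} reachable"
```

As a plugin it would finish with
`code, msg = check_pagerduty(make_opener(proxy)); print(msg); sys.exit(code)`,
where `proxy` comes from however nagios's proxy is configured on the
unit (not shown here).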

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nagios/+bug/1902142/+subscriptions

