launchpad-reviewers team mailing list archive
Message #29386
[Merge] ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master
Ioana Lasc has proposed merging ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master.
Commit message:
Add alert for PPA publisher
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~ilasc/canonical-is-prometheus/+git/canonical-is-prometheus/+merge/433207
From what I can decipher in the logs, this is a cron job set to run every minute unless there is a lock on /var/lock/launchpad-publisher.lock (which means there is an active run); see line 7 in https://bazaar.launchpad.net/~launchpad-pqm/lp-production-crontabs/trunk/view/head:/ppa.lp.internal-lp_publish
Inspecting lp_publish/cron.ppa.log, the longest run took 17 minutes, so I would say that if the publisher hasn't run in 30 minutes we probably need to alert and look at the log.
The metric already exists: it is emitted by `emit_script_activity_metric` on successful completion of any LaunchpadCronScript. I tested the expression on Prometheus and it looks right when compared to the outages over the last few days.
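To make the timing concrete, here is a rough Python sketch of how the 30m window and the `for: 5m` clause combine. It is a simplified model, not how Prometheus actually evaluates rules: it assumes one sample per successful completion, a 1-minute evaluation interval, and a left-open lookback window, and the helper names are made up for illustration.

```python
from datetime import timedelta

WINDOW = timedelta(minutes=30)        # range in absent_over_time(...[30m])
FOR = timedelta(minutes=5)            # the rule's `for:` clause
EVAL_INTERVAL = timedelta(minutes=1)  # assumed rule evaluation interval


def absent_over_time(samples, now, window=WINDOW):
    """True when no sample falls inside (now - window, now]."""
    return not any(now - window < t <= now for t in samples)


def first_firing_time(samples, horizon):
    """Return the first evaluation time at which the alert fires, or None.

    `samples` are completion times of the script (hypothetical input);
    the alert must stay pending for FOR before it fires.
    """
    pending_since = None
    now = timedelta(0)
    while now <= horizon:
        if absent_over_time(samples, now):
            if pending_since is None:
                pending_since = now
            if now - pending_since >= FOR:
                return now
        else:
            # A fresh sample inside the window resets the pending state.
            pending_since = None
        now += EVAL_INTERVAL
    return None


if __name__ == "__main__":
    horizon = timedelta(minutes=120)
    # Slow but healthy: a completion every 17 minutes never trips the alert.
    healthy = [timedelta(minutes=m) for m in range(0, 121, 17)]
    print(first_firing_time(healthy, horizon))
    # Stuck: last completion at minute 10, nothing afterwards.
    stuck = [timedelta(0), timedelta(minutes=10)]
    print(first_firing_time(stuck, horizon))
```

Under these assumptions a publisher that completes every 17 minutes never alerts, while one that stops after minute 10 fires at minute 45, i.e. roughly 35 minutes after its last successful run (30m window plus 5m `for`).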
--
Your team Launchpad code reviewers is requested to review the proposed merge of ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master.
diff --git a/ols/launchpad.rules b/ols/launchpad.rules
index 2db3f47..9a404ba 100644
--- a/ols/launchpad.rules
+++ b/ols/launchpad.rules
@@ -23,6 +23,17 @@ groups:
     labels:
       severity: warning
+  - alert: LaunchpadPPAPublisherStuck
+    expr: absent_over_time(lp_script_activity_count{env='production', name='publish-distro'}[30m])
+    for: 5m
+    annotations:
+      summary: ppa publisher has not run for 30m
+      dashboard_url: https://grafana.admin.canonical.com/d/000000044/telegraf-host?orgId=1&from=now-2h&to=now&var-juju_controller=prodstack5-prodstack5-prodstack-is&var-juju_model=All&var-service=launchpad-ppa&var-juju_unit=launchpad-ppa%2F2
+      description: Launchpad Script {{ $labels.name }} is not running as expected.
+      playbook_url: https://wiki.canonical.com/InformationInfrastructure/IS/LaunchpadScripts#LaunchpadPPAPublisherStuck
+    labels:
+      severity: warning
+
   - alert: LaunchpadFlagExpiredMembershipsStuck
     expr: absent_over_time(lp_script_activity_count{env='production',host='loganberry',name='flag-expired-memberships'}[24h])
     for: 1h