launchpad-reviewers team mailing list archive

[Merge] ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master

 

Ioana Lasc has proposed merging ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master.

Commit message:
Add alert for PPA publisher

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~ilasc/canonical-is-prometheus/+git/canonical-is-prometheus/+merge/433207

From what I can decipher in the logs, this is a cron job set to run every minute unless there is a lock on /var/lock/launchpad-publisher.lock (which means a run is already active); see line 7 in https://bazaar.launchpad.net/~launchpad-pqm/lp-production-crontabs/trunk/view/head:/ppa.lp.internal-lp_publish

Inspecting lp_publish/cron.ppa.log, the longest run appears to have taken 17 minutes, so I would say that if it hasn't run in 30 minutes we probably need to alert and look at the log.

It looks like the metric is already there: it is emitted by `emit_script_activity_metric` on successful completion of any LaunchpadCronScript. I tested the expression on Prometheus and it looks right when compared to the outages over the last few days.
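
For reviewers who want to reason about the timing, here is a rough Python model of how `absent_over_time(...[30m])` combined with `for: 5m` behaves. It is only a sketch under assumptions not stated in the proposal: each successful run produces one sample of the metric, the rule is evaluated once a minute, and the range window is left-open (boundary handling differs between Prometheus versions). The helper names are hypothetical and scrape timing and staleness are ignored.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # range in absent_over_time(...[30m])
FOR = timedelta(minutes=5)      # the rule's `for:` clause
STEP = timedelta(minutes=1)     # assumed rule evaluation interval


def absent_over_time(samples, now, window=WINDOW):
    """True when no sample timestamp falls in (now - window, now]."""
    return not any(now - window < ts <= now for ts in samples)


def first_firing(samples, start, end, step=STEP):
    """Return the first evaluation time at which the alert would fire, or None.

    The alert becomes pending when the expression first holds and fires
    once it has held continuously for at least FOR.
    """
    pending_since = None
    now = start
    while now <= end:
        if absent_over_time(samples, now):
            if pending_since is None:
                pending_since = now
            if now - pending_since >= FOR:
                return now
        else:
            # A sample reappeared in the window, so the pending state resets.
            pending_since = None
        now += step
    return None
```

Under this model, a single successful run at 12:00 followed by nothing gives a first firing evaluation at 12:35: 30 minutes for the window to empty, plus 5 minutes of `for:`. Runs completing every 17 minutes (the longest run seen in the log) never trigger it.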
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into canonical-is-prometheus:master.
diff --git a/ols/launchpad.rules b/ols/launchpad.rules
index 2db3f47..9a404ba 100644
--- a/ols/launchpad.rules
+++ b/ols/launchpad.rules
@@ -23,6 +23,17 @@ groups:
         labels:
           severity: warning
 
+      - alert: LaunchpadPPAPublisherStuck
+        expr: absent_over_time(lp_script_activity_count{env='production', name='publish-distro'}[30m])
+        for: 5m
+        annotations:
+          summary: ppa publisher has not run for 30m
+          dashboard_url: https://grafana.admin.canonical.com/d/000000044/telegraf-host?orgId=1&from=now-2h&to=now&var-juju_controller=prodstack5-prodstack5-prodstack-is&var-juju_model=All&var-service=launchpad-ppa&var-juju_unit=launchpad-ppa%2F2
+          description: Launchpad Script {{ $labels.name }} is not running as expected.
+          playbook_url: https://wiki.canonical.com/InformationInfrastructure/IS/LaunchpadScripts#LaunchpadPPAPublisherStuck
+        labels:
+          severity: warning
+
       - alert: LaunchpadFlagExpiredMembershipsStuck
         expr: absent_over_time(lp_script_activity_count{env='production',host='loganberry',name='flag-expired-memberships'}[24h])
         for: 1h
