canonical-ubuntu-qa team mailing list archive

[Merge] ~andersson123/autopkgtest-cloud:cloud-worker-active-error-percentage into autopkgtest-cloud:master

 

Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:cloud-worker-active-error-percentage into autopkgtest-cloud:master.

Commit message:
feat: cloud: add new metric for percentage of active cloud worker units

This commit introduces new functionality to the `metrics` script in the
autopkgtest-cloud charm.

It adds a new metric, just for the cloud worker units:
`autopkgtest_unit_status_active_percentage`

We will use this metric in Grafana to alert the team when the
percentage of active cloud worker units stays below 50% for a
specified period of time.
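The alert condition itself is evaluated by Grafana, not by the metrics script. Purely as an illustration of "below 50% for a specified period", here is a minimal Python sketch over a list of timestamped samples. The `should_alert` helper and the 30-minute pending period are hypothetical; the real threshold and pending period live in the Grafana alert rule.

```python
from datetime import datetime, timedelta

# Hypothetical values for illustration; the real threshold and pending
# period are configured in the Grafana alert rule.
THRESHOLD = 50
PENDING = timedelta(minutes=30)


def should_alert(samples, threshold=THRESHOLD, pending=PENDING):
    """Return True if the percentage stayed below `threshold` for the
    whole trailing `pending` window.

    `samples` is a list of (timestamp, percentage) tuples sorted by time.
    """
    if not samples:
        return False
    window_start = samples[-1][0] - pending
    # Don't fire until the data covers the full pending window.
    if samples[0][0] > window_start:
        return False
    in_window = [pct for (ts, pct) in samples if ts >= window_start]
    return all(pct < threshold for pct in in_window)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Four samples, ten minutes apart, all at 40% active.
    low = [(start + timedelta(minutes=10 * i), 40) for i in range(4)]
    print(should_alert(low))  # True
```

Requiring the data to span the whole window keeps a single low reading from firing the alert; Grafana's own evaluation semantics may differ in detail.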

This could not be done in Grafana alone, due to limitations in its
alerting and transformations.

This has already been tested: this version of the metrics script has
been running in a tmux session for some time, and the panel is already
live in Grafana with the alert set up.

Requested reviews:
  Canonical's Ubuntu QA (canonical-ubuntu-qa)

For more details, see:
https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/469122

This commit adds a metric reporting, for each architecture, the percentage of cloud worker units that are active. It feeds a Grafana alert, which has already been set up.
-- 
Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:cloud-worker-active-error-percentage into autopkgtest-cloud:master.
diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/metrics b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/metrics
index cfbf6c4..8f2a80e 100755
--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/metrics
+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/metrics
@@ -23,7 +23,7 @@ logger = logging.getLogger(__name__)
 logging.basicConfig(level=logging.INFO)
 
 
-def make_submission(counts, measurement):
+def make_submission(counts, measurement, send_percentage=False):
     out = []
     for arch in counts:
         (active, error) = counts[arch]
@@ -47,6 +47,17 @@ def make_submission(counts, measurement):
             },
         }
         out.append(m)
+        if send_percentage:
+            active_percentage = int(100 * active / max(active + error, 1))
+            m = {
+                "measurement": f"{measurement}_active_percentage",
+                "fields": {"percentage": active_percentage},
+                "tags": {
+                    "arch": arch,
+                    "instance": INFLUXDB_CONTEXT,
+                },
+            }
+            out.append(m)
     logger.debug("submission: %s", out)
     return out
 
@@ -104,7 +115,7 @@ def get_units():
             error += 1
         counts[arch] = (active, error)
 
-    return make_submission(counts, "autopkgtest_unit_status")
+    return make_submission(counts, "autopkgtest_unit_status", True)
 
 
 def get_list_of_intended_remote_ips(arch):
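For reference, a minimal, self-contained sketch of the per-architecture percentage: active units as a share of active plus error units, guarded against the case where both counts are zero. It assumes `counts` maps each architecture to an `(active, error)` tuple of disjoint counts, as `get_units()` appears to build. `active_percentage`, `make_percentage_points` and the `INFLUXDB_CONTEXT` value are hypothetical stand-ins, not the script's actual API.

```python
# Hypothetical stand-in for the INFLUXDB_CONTEXT value that the real
# metrics script defines elsewhere.
INFLUXDB_CONTEXT = "example-context"


def active_percentage(active, error):
    """Share of units that are active, as an integer percentage of
    active + error units.

    Assumes `active` and `error` are disjoint counts. Returns 0 when both
    are zero instead of raising ZeroDivisionError.
    """
    total = active + error
    if total == 0:
        return 0
    return int(100 * active / total)


def make_percentage_points(counts, measurement):
    """Build one InfluxDB-style point per architecture from a mapping of
    arch -> (active, error)."""
    out = []
    for arch, (active, error) in counts.items():
        out.append(
            {
                "measurement": f"{measurement}_active_percentage",
                "fields": {"percentage": active_percentage(active, error)},
                "tags": {"arch": arch, "instance": INFLUXDB_CONTEXT},
            }
        )
    return out


if __name__ == "__main__":
    points = make_percentage_points(
        {"amd64": (3, 1), "s390x": (0, 0)}, "autopkgtest_unit_status"
    )
    for point in points:
        print(point["tags"]["arch"], point["fields"]["percentage"])
    # amd64 75
    # s390x 0
```

Reporting 0 for an architecture with no active or error units makes the alert treat it as fully degraded, which seems the safer default. Note that `int()` truncates, so 2 active units out of 3 report 66 rather than 67; `round()` would be the alternative if that matters.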
