canonical-ubuntu-qa team mailing list archive
Message #05799
[Merge] ~andersson123/autopkgtest-cloud:missing-tests into autopkgtest-cloud:master
Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:missing-tests into autopkgtest-cloud:master.
Requested reviews:
Canonical's Ubuntu QA (canonical-ubuntu-qa)
For more details, see:
https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/477440
Hopefully this fixes the test requests we've been losing recently :/
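For reviewers, here is a rough sketch of the branch the diff below changes, reduced to a pure function so it can be read on its own. It is not the worker code: `FakeChannel`, `handle_abnormal_exit`, `requested_via_systemd` and `MAX_RETRIES` are hypothetical names made up for illustration, and it assumes the first branch corresponds to a kill requested via systemd, as the old log message suggests.

```python
from dataclasses import dataclass, field

# Assumption: mirrors the `retry = 3` value set in the diff.
MAX_RETRIES = 3


@dataclass
class FakeChannel:
    """Hypothetical stand-in for the AMQP channel; it only records calls."""

    calls: list = field(default_factory=list)

    def basic_reject(self, delivery_tag, requeue):
        self.calls.append(("reject", delivery_tag, requeue))

    def basic_ack(self, delivery_tag):
        self.calls.append(("ack", delivery_tag))


def handle_abnormal_exit(channel, delivery_tag, requested_via_systemd, retry):
    """Return (retry, stop_now) after an abnormal autopkgtest exit.

    stop_now=True models the early `return` in the worker, where the
    request goes back into the queue and the worker resumes listening.
    """
    if requested_via_systemd:
        # Intentional kill: requeue the request so it is not lost.
        channel.basic_reject(delivery_tag, requeue=True)
        return retry, True
    # Unknown exit code: take the message off the queue and exhaust the
    # retry counter so the run is recorded as a real failure.
    channel.basic_ack(delivery_tag)
    return MAX_RETRIES, False


requeued = FakeChannel()
print(handle_abnormal_exit(requeued, 1, True, 0), requeued.calls)
recorded = FakeChannel()
print(handle_abnormal_exit(recorded, 2, False, 0), recorded.calls)
```

The point of the second branch is that the message is acknowledged only once the run is guaranteed to end up as a recorded failure, so nothing disappears silently.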
--
Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:missing-tests into autopkgtest-cloud:master.
diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
index c281b0f..9483f8a 100755
--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
@@ -1401,16 +1401,28 @@ def request(msg):
msg.channel.basic_reject(
msg.delivery_tag, requeue=True
)
+ kill_openstack_server(test_uuid)
+ # return here so the worker can go back to listening for
+ # test requests
+ return
else:
+ # Tim Andersson:
+ # As of 2024-11-28 we have been losing test requests, and this block
+ # was the cause: when autopkgtest exited with an abnormal exit code,
+ # this block assumed an admin had intentionally killed the test and
+ # removed the message from the queue. Before that behaviour was added,
+ # the request went back into the queue and the test looped forever.
+ # I believe the best option is to count this as a "real" failure, so
+ # a.u.c admins can investigate more easily: the result goes into the
+ # database and the log is available in swift storage.
+ # Setting retry to 3 stops this whole convoluted block from executing again.
logging.warning(
- "autopkgtest failure not requested via systemd, removing message %s from queue",
+ "autopkgtest has failed with an unknown code (%i), removing message %s from queue and counting as a real failure so admins can more easily debug the issue.",
+ code,
body.encode(),
)
msg.channel.basic_ack(msg.delivery_tag)
- kill_openstack_server(test_uuid)
- # return here so the worker can go back to listening for
- # test requests
- return
+ retry = 3
else:
if num_failures >= 3:
logging.warning(