canonical-ubuntu-qa team mailing list archive
-
canonical-ubuntu-qa team
-
Mailing list archive
-
Message #04213
[Merge] ~andersson123/autopkgtest-cloud:fix-systemd-stop-restart-losing-jobs into autopkgtest-cloud:master
Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:fix-systemd-stop-restart-losing-jobs into autopkgtest-cloud:master.
Commit message:
fix: worker: actually save messages when service stopped by systemd
Previously, we had an iteration of this functionality which instead
utilised checking the exit code of the autopkgtest subprocess
for a -15 code, utilising the mechanism detailed in [1].
This was brittle - in the case the restart was executed at the time
before the VM for the test was in BUILD state, the autopkgtest
subprocess would exit with code 1 and the test request would be
lost.
This commit amends the issue by instead utilising the signal
handlers in the worker code.
This commit also re-introduces the documentation lost by the
revert commit prior to this one.
Fixes LP: #2067714
[1] https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.returncode
Requested reviews:
Canonical's Ubuntu QA (canonical-ubuntu-qa)
Related bugs:
Bug #2067714 in Auto Package Testing: "systemctl restart autopkgtest@*.service still looses jobs"
https://bugs.launchpad.net/auto-package-testing/+bug/2067714
For more details, see:
https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/466850
The latest change I made for this functionality was unfortunately, bunk. It didn't work as intended in the case that the test was systemctl restart'ed before the VM was in at least BUILD state.
I've tested these new changes under the following scenarios:
- Before the VM is even in BUILD state
- When the VM is in BUILD state
- When the VM is in ACTIVE state
I believe this covers all the possible states we can encounter - aside from after ACTIVE state and the VM is already killed - though this is the same as prior to the VM being spawned.
--
Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:fix-systemd-stop-restart-losing-jobs into autopkgtest-cloud:master.
diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
index 2467a8b..32c8661 100755
--- a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
+++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker
@@ -1367,33 +1367,26 @@ def request(msg):
# tests can be requested to be killed with:
# kill -15 $pid
logging.warning(
- "autopkgtest exited with code: %i",
+ "autopkgtest exited with code: %i\nack-ing message: %s",
code,
+ body.encode(),
)
running_test = False
# we ack the message so it doesn't go back in the queue
- if code == -15:
+ if exit_requested is not None:
logging.info(
- "autopkgtest has been killed with SIGTERM, indicating a restart or stop, requeue-ing message %s"
- % body
+ "An exit has been requested via systemd, placing message %s back in queue",
+ body.encode(),
)
msg.channel.basic_reject(
msg.delivery_tag, requeue=True
)
- elif code == -9:
- logging.info(
- "autopkgtest has been killed intentionally, removing message %s from the queue"
- % body
- )
- msg.channel.basic_ack(msg.delivery_tag)
else:
- logging.info(
- "autopkgtest has hit an obscure failure mode, removing message %s from the queue"
- % body
- )
- msg.channel.basic_reject(
- msg.delivery_tag, requeue=True
+ logging.warning(
+ "autopkgtest failure not requested via systemd, removing message %s from queue",
+ body.encode(),
)
+ msg.channel.basic_ack(msg.delivery_tag)
logging.info(
"Killing openstack server with uuid %s",
test_uuid,
diff --git a/docs/administration.rst b/docs/administration.rst
index 2171554..5abbbe4 100644
--- a/docs/administration.rst
+++ b/docs/administration.rst
@@ -486,8 +486,10 @@ In order to kill a currently running test, grab the test uuid. This can be seen
ps aux | grep runner | grep $uuid
# grab the PID from the process
# If you want to kill the test and remove the test request from the queue:
+ # kill -9 sends SIGKILL, which isn't caught by signal handlers in the worker code.
kill -9 $pid
# If you want to kill the test and preserve the test request in the queue:
+ # kill -15 sends SIGTERM, which is caught by the signal handlers in the worker code.
kill -15 $pid
This will kill the autopkgtest process, and then the worker will `ack` the test request