launchpad-reviewers team mailing list archive
Message #26591
[Merge] ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main
Colin Watson has proposed merging ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main.
Commit message:
lp-setup-lxd-test: Add some crude backoff-and-retry logic
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~cjwatson/lpbuildbot-worker/+git/lpbuildbot-worker/+merge/399358
We sometimes see odd failures when starting lots of containers at once. Try some simple backoff-and-retry logic; it may slow things down, but is likely to be less annoying than breaking the whole test run.
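The backoff-and-retry idea described above could be sketched as a small standalone helper. This is only an illustration, not the code in this branch: `retry_once`, `should_retry`, and the injectable `sleep` parameter are hypothetical names, and the 30 to 60 second window simply mirrors the values used in the diff. The random delay presumably keeps containers that failed together from all retrying at the same moment.

```python
import random
import time


def retry_once(action, should_retry, min_delay=30, max_delay=60,
               sleep=time.sleep):
    """Run action(); on a retryable failure, wait a random delay and
    run it exactly one more time.

    Hypothetical helper for illustration only.
    """
    try:
        return action()
    except Exception as exc:
        if not should_retry(exc):
            raise
        # Back off for a random number of seconds, then try again.
        # A second failure propagates to the caller unchanged.
        sleep(random.randint(min_delay, max_delay))
        return action()


# Usage: an action that fails on its first call and succeeds on the second.
calls = []


def flaky():
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("transient failure")
    return "started"


delays = []  # record the requested delays instead of actually sleeping
result = retry_once(
    flaky,
    lambda exc: isinstance(exc, RuntimeError),
    sleep=delays.append,
)
```

Passing `sleep` as a parameter is a design choice made here so the helper can be exercised without waiting; the diff itself calls `time.sleep` directly.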
--
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main.
diff --git a/lp-setup-lxd-test b/lp-setup-lxd-test
index 29d5374..360969a 100755
--- a/lp-setup-lxd-test
+++ b/lp-setup-lxd-test
@@ -9,8 +9,10 @@ import shlex
import string
import subprocess
import sys
+import time
from pylxd import Client
+from pylxd.exceptions import LXDAPIException
def _exec(
@@ -59,9 +61,22 @@ def create_ephemeral_container(
     }
     test_container = client.containers.create(instance_config, wait=True)
-    test_container.start(wait=True)
+    try:
+        test_container.start(wait=True)
+    except LXDAPIException:
+        if test_container.status == "Stopped":
+            # Back off and try again in a short while.
+            time.sleep(random.randint(30, 60))
+            test_container.start(wait=True)
+        else:
+            raise
     print("Waiting for successful cloud-init")
-    _exec(test_container, ["cloud-init", "status", "--wait"])
+    try:
+        _exec(test_container, ["cloud-init", "status", "--wait"])
+    except subprocess.CalledProcessError:
+        # Back off and try again in a short while.
+        time.sleep(random.randint(30, 60))
+        _exec(test_container, ["cloud-init", "status", "--wait"])
     command = [
         "./utilities/run-as",