launchpad-reviewers team mailing list archive

[Merge] ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main

 

Colin Watson has proposed merging ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main.

Commit message:
lp-setup-lxd-test: Add some crude backoff-and-retry logic

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~cjwatson/lpbuildbot-worker/+git/lpbuildbot-worker/+merge/399358

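We sometimes see odd failures when starting lots of containers at once.  This adds some simple backoff-and-retry logic: if starting the container fails and leaves it in the "Stopped" state, or if waiting for cloud-init fails, sleep for a random 30 to 60 seconds and try once more.  It may slow things down, but is likely to be less annoying than breaking the whole test run.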
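
For reviewers who want to see the pattern in isolation, here is a minimal sketch of randomised backoff-and-retry as a reusable helper. The name retry_with_backoff and its parameters are hypothetical and not part of this branch, which performs a single retry inline at each call site; the 30 to 60 second window is taken from the diff.

```python
import random
import time


def retry_with_backoff(
    action, retriable, attempts=2, min_delay=30, max_delay=60, sleep=time.sleep
):
    """Call action(), retrying after a random delay on a retriable error.

    Hypothetical helper for illustration only.  `retriable` is an
    exception class (or tuple of classes); `sleep` is injectable so that
    the delay can be stubbed out in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retriable:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Randomise the delay so that containers started together
            # do not all retry at the same moment.
            sleep(random.randint(min_delay, max_delay))
```

With the default attempts=2 this gives one retry, which is the behaviour of the proposed change; something like retry_with_backoff(lambda: test_container.start(wait=True), LXDAPIException) would be the rough equivalent for the start call, minus the check on the container status.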
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/lpbuildbot-worker:container-startup-retry into lpbuildbot-worker:main.
diff --git a/lp-setup-lxd-test b/lp-setup-lxd-test
index 29d5374..360969a 100755
--- a/lp-setup-lxd-test
+++ b/lp-setup-lxd-test
@@ -9,8 +9,10 @@ import shlex
 import string
 import subprocess
 import sys
+import time
 
 from pylxd import Client
+from pylxd.exceptions import LXDAPIException
 
 
 def _exec(
@@ -59,9 +61,22 @@ def create_ephemeral_container(
     }
 
     test_container = client.containers.create(instance_config, wait=True)
-    test_container.start(wait=True)
+    try:
+        test_container.start(wait=True)
+    except LXDAPIException:
+        if test_container.status == "Stopped":
+            # Back off and try again in a short while.
+            time.sleep(random.randint(30, 60))
+            test_container.start(wait=True)
+        else:
+            raise
     print("Waiting for successful cloud-init")
-    _exec(test_container, ["cloud-init", "status", "--wait"])
+    try:
+        _exec(test_container, ["cloud-init", "status", "--wait"])
+    except subprocess.CalledProcessError:
+        # Back off and try again in a short while.
+        time.sleep(random.randint(30, 60))
+        _exec(test_container, ["cloud-init", "status", "--wait"])
 
     command = [
         "./utilities/run-as",