launchpad-reviewers team mailing list archive
Message #28356
[Merge] ~cjwatson/launchpad:librarian-layer-retry into launchpad:master
Colin Watson has proposed merging ~cjwatson/launchpad:librarian-layer-retry into launchpad:master.
Commit message:
Retry LibrarianLayer._check_and_reset a few times
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/419974
We relatively often see mysterious `process-returncode` errors in buildbot runs that don't seem to correspond to a failed test on the same worker. On closer inspection of the subunit stream, these seem to be due to `LibrarianLayer._check_and_reset` getting `ECONNRESET` when trying to check whether the librarian is still up. I'm not sure exactly why this might be happening, but it seems reasonable to retry the are-you-still-there request a few times on general principles to see if that makes things more resilient.
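The retry idea described above can be sketched without any HTTP machinery. This is only an illustration of the general technique, not the mechanism the patch uses (the patch delegates retries to `requests` via `HTTPAdapter(max_retries=3)`). The names `call_with_retries` and `FlakyProbe` are hypothetical and do not exist in Launchpad; `FlakyProbe` merely simulates a librarian that resets the connection a couple of times before answering.

```python
import time


def call_with_retries(probe, attempts=3, delay=0.0):
    """Call probe() until it succeeds or `attempts` tries are used up.

    Only ConnectionResetError (ECONNRESET) is retried; any other
    exception propagates immediately.  Hypothetical helper, not part
    of Launchpad or requests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except ConnectionResetError:
            if attempt == attempts:
                # Out of tries: let the caller see the failure.
                raise
            time.sleep(delay)


class FlakyProbe:
    """Hypothetical stand-in for the are-you-still-there request.

    Raises ConnectionResetError for the first `failures` calls, then
    returns a fixed payload.
    """

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionResetError("simulated ECONNRESET")
        return b"ok"


probe = FlakyProbe(failures=2)
result = call_with_retries(probe, attempts=3)
```

Here the first two simulated resets are absorbed and the third call succeeds. A probe that keeps failing still raises once the attempts run out, so a librarian that really is dead would still be reported.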
--
Your team Launchpad code reviewers is requested to review the proposed merge of ~cjwatson/launchpad:librarian-layer-retry into launchpad:master.
diff --git a/lib/lp/testing/layers.py b/lib/lp/testing/layers.py
index 1b6fcb0..94c5dcd 100644
--- a/lib/lp/testing/layers.py
+++ b/lib/lp/testing/layers.py
@@ -71,6 +71,8 @@ from fixtures import (
     MonkeyPatch,
     )
 import psycopg2
+from requests import Session
+from requests.adapters import HTTPAdapter
 from six.moves.urllib.error import (
     HTTPError,
     URLError,
@@ -822,8 +824,11 @@ class LibrarianLayer(DatabaseLayer):
     def _check_and_reset(cls):
         """Raise an exception if the Librarian has been killed, else reset."""
         try:
-            f = urlopen(config.librarian.download_url)
-            f.read()
+            session = Session()
+            session.mount(
+                config.librarian.download_url,
+                HTTPAdapter(max_retries=3))
+            session.get(config.librarian.download_url).content
         except Exception as e:
             raise LayerIsolationError(
                 "Librarian has been killed or has hung."