ubuntu-docker-images team mailing list archive

Re: OCI failures on jenkins.u.c

 

Hi Sergio!

Sergio Durigan Junior wrote on 19/11/2021:

3) squid OCI

This is the most serious failure.  All architectures are failing, on all
namespaces.  These are the error messages:

   06:44:00 test_start_and_connect
   06:44:11 Waiting for container to be ready done
   06:44:11 ASSERT:Could not access proxy
   06:44:14 ASSERT:'TCP_MISS/200' not available in '213d5900011213b6fe83b14231045e9f9be66c217c21eaac35e41190d698b0ef's logs
   06:44:16 ASSERT:'"GET / HTTP/1.1" 200' not available in '655890c06a6af11157f77fcd82011127a353272eba8d61b9867a4622c3b7b7ef's logs

I cannot reproduce the failures locally.

I spent more time debugging this than I'm happy to admit, given the fix that I'm going to propose.

TLDR: I think the test is racy because wait_container_ready() can't really detect when the container is ready.

We wait for "socket opened." to appear in the container logs, but apparently at that point the socket is not really open yet, or whatever is behind the socket is not ready.

I don't know how to work around this other than adding an arbitrary sleep, which I think will help us in other cases too, so I added it to the common helper function:

https://github.com/canonical/server-test-scripts/pull/139
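
For illustration only, the general shape of such a workaround might look like the sketch below. This is not the code from the PR: the function name wait_container_ready_sketch, the LOG_CMD and SETTLE_SECONDS variables, and the default values are all hypothetical, and it assumes the container logs can be read with "docker logs". It polls the logs for the readiness string and then sleeps for an arbitrary extra interval before declaring the container ready.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the helper from the PR above.
# LOG_CMD: command that prints a container's logs (assumed: "docker logs").
# SETTLE_SECONDS: arbitrary extra delay once the readiness string shows up.
LOG_CMD="${LOG_CMD:-docker logs}"
SETTLE_SECONDS="${SETTLE_SECONDS:-5}"

# Usage: wait_container_ready_sketch <container> <ready string> [timeout]
# Returns 0 once the string is seen in the logs (after the extra sleep),
# or 1 if it did not appear after roughly <timeout> one-second polls.
wait_container_ready_sketch() {
    container="$1"
    ready_string="$2"
    timeout="${3:-60}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -F: match the string literally, so the "." is not a regex wildcard.
        if $LOG_CMD "$container" 2>&1 | grep -qF -- "$ready_string"; then
            # The log line alone does not guarantee the service is usable,
            # so give it some extra time to settle.
            sleep "$SETTLE_SECONDS"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

A test would then call something like: wait_container_ready_sketch "$container" "socket opened." before trying to access the proxy. The sleep only makes losing the race less likely; it does not remove it.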

This is a test build done using my branch from the PR above:

https://jenkins.ubuntu.com/server/view/oci/job/oci-unit-squid/4/

As for why the race is always "won" on our devel machine and always "lost" on the Jenkins nodes, that remains a mystery.

Paride

