
ubuntu-docker-images team mailing list archive

OCI failures on jenkins.u.c

 

Hi Paride,

I spent some time today analyzing and trying to understand the failures
listed in our Jenkins instance.

1) MySQL OCI

There's one failure that happened with the MySQL OCI image on ppc64el
when the image from the lts namespace was tested.

  23:12:01 docker: Error response from daemon: Conflict. The container
  name "/mysql_test_106973" is already in use by container
  "bedd941f649952e625d87ead2903960d34017dad01be0ec67bd40451ce298a4e". You
  have to remove (or rename) that container to be able to reuse that name.

This looks like a race in our testing code, but I could not understand
how it manifests.  We synchronously wait until the container is
stopped/deleted before creating another one, and it doesn't seem like
the 10-second timeout was reached (which could have explained the
problem).  I took the liberty of retriggering the last test; let's see
if it succeeds.  (Note added after writing this email: the retriggered
run succeeded.)
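For reference, the synchronous wait I'd expect our code to implement
looks roughly like this (a sketch only; the function names, the polling
interval, and the `docker ps` check are my assumptions, not the actual
test code):

```shell
#!/bin/sh
# Sketch (assumed, not the actual test code): poll a condition once per
# second until it holds or the timeout (in seconds) expires.
wait_until() {
    timeout="$1"; shift
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Hypothetical predicate: the container name is free again.  Note that
# `docker ps -a` also lists stopped containers, which still hold their
# names until they are removed.
container_gone() {
    ! docker ps -a --format '{{.Names}}' | grep -qx "$1"
}

# Usage (commented out here; needs a running Docker daemon):
# wait_until 10 container_gone mysql_test_106973 || echo "timed out waiting"
```

If the real code waits on "stopped" rather than "removed", that alone
could explain the name conflict, since a stopped container still owns
its name.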

2) memcached OCI

The memcached OCI image is failing on s390x when testing the image from
the ubuntu namespace.  I retriggered the test, and it's still failing.
The problem is:

  20:46:08 Creating all-defaults memcached container
  20:46:11 Couldn't connect to 127.0.0.1:22122

I couldn't reproduce it locally.  This may be a problem with the image
itself; I will try to run the tests on an s390x machine tomorrow to see
if I can reproduce.

3) squid OCI

This is the most serious failure.  All architectures are failing, in
all namespaces.  These are the error messages:

  06:44:00 test_start_and_connect
  06:44:11 Waiting for container to be ready done
  06:44:11 ASSERT:Could not access proxy
  06:44:14 ASSERT:'TCP_MISS/200' not available in '213d5900011213b6fe83b14231045e9f9be66c217c21eaac35e41190d698b0ef's logs
  06:44:16 ASSERT:'"GET / HTTP/1.1" 200' not available in '655890c06a6af11157f77fcd82011127a353272eba8d61b9867a4622c3b7b7ef's logs

I cannot reproduce the failures locally.  At least we know the
container is up (because of the second message), but I think we may
have to tweak http_proxy et al. a bit more in order to reach the local
squid.  Maybe it's getting confused because https_proxy/no_proxy are
also set...
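Concretely, what I have in mind is something like the following (a
sketch of my assumption about what the harness may need, not verified
on the Jenkins nodes; 3128 is squid's default port, the test's
published port may differ):

```shell
# Clear inherited proxy settings that could divert or bypass the squid
# under test, then point the client explicitly at it.
unset https_proxy no_proxy
export http_proxy=http://127.0.0.1:3128

# A TCP_MISS/200 in squid's access log would then correspond to e.g.:
# curl --proxy "$http_proxy" -sS -o /dev/null http://example.com/
```

In particular, if the nodes export a no_proxy that covers 127.0.0.1 or
localhost, curl would bypass the proxy entirely, which would match the
"Could not access proxy" assertion.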

I have a setup here that mimics Launchpad's squid.internal proxy; I
tried reproducing the problem there, to no avail.  I also tried
connecting to the VPN, exporting https_proxy/no_proxy accordingly inside
a VM, and running the test.  Everything passed.

There must be something specific to the Jenkins nodes that I'm not aware
of.  It'd be great to run these tests inside the nodes; I will talk to
you tomorrow about this.

Thanks,

-- 
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0  EB2F 106D A1C8 C3CB BF14


