ubuntu-docker-images team mailing list archive

Re: OCI failures on jenkins.u.c

 

On Monday, November 22 2021, Paride Legovini wrote:

> Hi Sergio!

Hey Paride :-),

> Sergio Durigan Junior wrote on 19/11/2021:
>
>> 3) squid OCI
>>
>> This is the most serious failure.  All architectures are failing, on all
>> namespaces.  These are the error messages:
>>
>>    06:44:00 test_start_and_connect
>>    06:44:11 Waiting for container to be ready done
>>    06:44:11 ASSERT:Could not access proxy
>>    06:44:14 ASSERT:'TCP_MISS/200' not available in '213d5900011213b6fe83b14231045e9f9be66c217c21eaac35e41190d698b0ef's logs
>>    06:44:16 ASSERT:'"GET / HTTP/1.1" 200' not available in '655890c06a6af11157f77fcd82011127a353272eba8d61b9867a4622c3b7b7ef's logs
>>
>> I cannot reproduce the failures locally.
>
> I spent more time debugging this than I'm happy to admit, given the
> fix that I'm going to propose.

Heh...  Been there, done that :-D.

> TLDR: I think the test is racy because wait_container_ready() can't
> really detect when the container is ready.
>
> We wait for "socket opened." to appear in the container logs, but
> apparently the socket is not really open yet at that point, or
> whatever is behind the socket is not ready.

That's a reasonable explanation.

> I don't know how to work around this other than adding an arbitrary
> sleep, which I think will help us in other cases, so I added it to the
> common helper function:
>
> https://github.com/canonical/server-test-scripts/pull/139
>
> This is a test build done using my branch from the PR above:
>
> https://jenkins.ubuntu.com/server/view/oci/job/oci-unit-squid/4/
>
> Now, as for why the race is always "won" on our devel machine and always
> "lost" on the Jenkins nodes, that remains a mystery.

Thanks for investigating this problem.  I noticed that Athos reviewed
your PR and you've already merged it, which is great.  Let's hope we see
more stability from now on.
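
For illustration only, here is a minimal sketch of the kind of readiness
probe that could complement a fixed sleep: poll the published port until a
TCP connection succeeds, and give up after a timeout.  This is not what the
PR does; wait_for_port() and its parameters are hypothetical names, and the
real helper in server-test-scripts may be structured quite differently.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Hypothetical helper: poll until a TCP connect to (host, port) succeeds.

    Returns True as soon as one connection attempt succeeds, or False if
    no attempt succeeds before `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is accepting connections on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A test would call something like wait_for_port("127.0.0.1", 3128) before
sending the first request, where the host and port are placeholders for
wherever the container's port is published.  Note that a successful connect
only shows that something accepts connections; it does not prove that the
service behind the socket is ready, so this narrows the race rather than
eliminating it.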

Thanks,

-- 
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0  EB2F 106D A1C8 C3CB BF14

