Re: Auto-trigger frequency for OCI tests
On Tuesday, June 15 2021, Paride Legovini wrote:
> Bryce Harrington wrote on 14/06/2021:
>> On Mon, Jun 14, 2021 at 04:16:14PM -0400, Sergio Durigan Junior wrote:
>>> Hi,
>>>
>>> Now that we have an all-green OCI test matrix, it's time to discuss the
>>> frequency with which we want to auto-trigger the tests.
>>>
>>> A few things worth considering:
>>>
>>> 1) We don't have an automatic way to know when an image is updated. We
>>> could keep track of the digests that were previously tested and compare
>>> them with the digests currently available in a registry, and then skip
>>> the test if they are the same. I'm sure there are implications to this
>>> that I'm not foreseeing right now.
>>>
>>> 2) We don't have unlimited resources, of course. I *think* that running
>>> all tests once a day would be fine, but I don't know for sure. We don't
>>> want our tests to affect other CI projects. Also, see (4).
>>>
>>> 3) Dockerhub/AWS have rate-limiting in place even when we authenticate
>>> before running the tests. I *think* that running everything daily
>>> should be fine, but I don't know for sure. Also, see (4).
>>>
>>> 4) We also have to take into account the fact that we plan to expand
>>> the number of images during the upcoming cycles, which means that we
>>> will eventually have dozens of images to test. Nowadays, an LTS image
>>> requires 8 unit test runs (one per arch * namespace combination) + 1
>>> standalone run, whereas a non-LTS image requires 4 unit test runs (one
>>> per arch) + 1 standalone run. If the long term plan is to reach 60
>>> images, that's somewhere between 300 (all non-LTS, 5 runs each) and
>>> 540 (all LTS, 9 runs each) runs per trigger, so you can see how
>>> problematic this will get.
>>>
>>>
>>> Maybe the best mid/long-term plan would be to work on tooling that would
>>> allow us to detect when an image is updated and decide whether or not we
>>> should run the tests for it. I mean, we already have this knowledge
>>> coded into some of our scripts; it's just a matter of putting something
>>> together now.
>>>
>>> For now, given that we won't be working on these images every day, we
>>> can have the tests running once or twice a week in order to save
>>> resources. If we see that it's not possible to schedule all of them at
>>> once (due to rate-limiting, for example), we can schedule the unit tests
>>> one day and the standalone tests the next.
>>>
>>> WDYT?
>>
>> Given that the software in the images should in theory change super
>> infrequently, a longer interval between test auto-runs makes sense.
>> If a given image really changes no more than once a month, then
>> having 30 days of test runs isn't going to add a lot of value. So I'd
>> look at maybe even just once a week.
>>
>> As long as there's a manual way to trigger the tests, e.g. at upload
>> time or after a new base image change has been released, then we can
>> just include a retrigger step as part of the upload+review process.
>> That'd be the ideal context to deal with test failures, and my guess
>> is that would detect 95% of the actual problems the OCI images would
>> hit in practice.
>
> I agree it's reasonable to auto-trigger the jobs weekly for now. The
> jobs can always be triggered manually if needed.
>
> This PR adds the @weekly timed trigger:
>
> https://github.com/canonical/server-jenkins-jobs/pull/196
Thanks, Paride. PR accepted and merged.
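For future reference, assuming the jobs in that repo are defined with
Jenkins Job Builder, the change boils down to a few lines like these
(the job name here is made up):

    - job:
        name: oci-image-tests   # hypothetical job name
        triggers:
          - timed: '@weekly'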
> In general Jenkins is not meant to take actions based on the results
> of previous job runs, but with some tricks it should be feasible.
> (It's not as easy as dumping the hash in some file, as the job may
> execute on a different machine, workspaces are cleaned up, and so on.)
I see. If it's too hard/cumbersome/unstable, then I think we can try to
come up with something else.
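Just to make the idea concrete: the digest check itself is cheap, and
the hard part is where to persist the state, as you said. Here's a
rough sketch of the check; the image list and state file location are
made up for illustration:

    #!/usr/bin/env python3
    # Sketch: skip tests for images whose registry digest hasn't changed.
    import json
    import subprocess
    from pathlib import Path

    STATE_FILE = Path("last-tested-digests.json")  # hypothetical location
    IMAGES = ["docker.io/ubuntu/nginx:latest"]     # hypothetical list

    def registry_digest(image):
        # "skopeo inspect" prints a JSON document with a "Digest" field.
        out = subprocess.check_output(["skopeo", "inspect", f"docker://{image}"])
        return json.loads(out)["Digest"]

    def images_to_test():
        seen = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
        stale = []
        for image in IMAGES:
            digest = registry_digest(image)
            if seen.get(image) != digest:
                stale.append(image)
                seen[image] = digest
        # In a real job we'd record the digest only after the tests pass,
        # so a failed run gets retried on the next trigger.
        STATE_FILE.write_text(json.dumps(seen, indent=2))
        return stale

    if __name__ == "__main__":
        print("\n".join(images_to_test()))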
> Thinking about it, we could use the image tagging to make this more of
> a CI system than just testing. Something along these lines:
>
> 1. Launchpad publishes the images with tag "edge".
FWIW, and this is just a small detail, but LP publishes the M.N-X.Y_edge
tag.
> 2. Jenkins tests those, perhaps daily. If all the tests pass, the
> images are tagged as "candidate". However if the same image is already
> tagged "candidate" the test is skipped.
We have to discuss whether it makes sense to tag the image as
"candidate" or "beta", but I agree.
> 3. The jobs can have a "Tag image as latest" checkbox or something
> that makes Jenkins do the tagging if all the tests pass. It's
> semi-automatic, so we keep some control there.
+1.
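Putting 2 and 3 together, I imagine the promotion step looking roughly
like this (names are placeholders, and the digest lookup is the same
skopeo trick as in the sketch above):

    #!/usr/bin/env python3
    # Sketch of the edge -> candidate promotion; repo name and the
    # run_tests hook are placeholders.
    import json
    import subprocess

    REPO = "docker.io/ubuntu/nginx"  # hypothetical image

    def digest(ref):
        out = subprocess.check_output(["skopeo", "inspect", f"docker://{ref}"])
        return json.loads(out)["Digest"]

    def promote(run_tests):
        try:
            candidate = digest(f"{REPO}:candidate")
        except subprocess.CalledProcessError:
            candidate = None  # candidate tag doesn't exist yet
        if digest(f"{REPO}:edge") == candidate:
            return  # already promoted: skip the tests entirely
        if run_tests():
            # Same-repo retag: the layer blobs are already in the
            # registry, so this effectively just copies the manifest.
            subprocess.check_call(["skopeo", "copy",
                                   f"docker://{REPO}:edge",
                                   f"docker://{REPO}:candidate"])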
> Caveats:
>
> - Security. It would be nice to have "tag-only" credentials for
> Jenkins, so should the creds leak, no malicious images can be
> uploaded. I don't know if this is feasible.
I don't know either. I know that AWS offers some granularity when
choosing the permissions attached to a credential, but I think
Dockerhub doesn't. I know that they've announced read-only credentials
very recently.
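On the AWS side at least, a retag only needs two ECR calls
(ecr:BatchGetImage and ecr:PutImage), so we could scope a credential to
just those actions; the caveat is that PutImage can still re-point a
tag at any manifest already present in the repo. A sketch with boto3
(repository name made up):

    #!/usr/bin/env python3
    # Sketch: re-point an ECR tag without pulling or pushing layers.
    import boto3

    ecr = boto3.client("ecr")
    REPO = "ubuntu/nginx"  # hypothetical repository name

    def retag(src_tag, dst_tag):
        # Fetch the manifest the source tag currently points at...
        resp = ecr.batch_get_image(repositoryName=REPO,
                                   imageIds=[{"imageTag": src_tag}])
        manifest = resp["images"][0]["imageManifest"]
        # ...and register the same manifest under the destination tag.
        ecr.put_image(repositoryName=REPO,
                      imageManifest=manifest,
                      imageTag=dst_tag)

    retag("edge", "candidate")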
> - Handling of multiple registries. The unit tests are currently
> running only on a single registry (aws), but tagging should happen on
> all of them, and be integrated with the "standalone" checks for image
> consistency.
If we're going to use the existing multi-arch tagging scripts, then this
shouldn't be a problem.
> I hope this makes sense; I'm not fully familiar with the publishing
> and tagging process.
TBH this whole process is still being refined, and there are some
corner cases that are not clear (for example, the promotion of tags is
only theoretical for now, and hasn't been implemented anywhere). But
what you wrote above makes sense to me, and is actually the direction I
think we should be heading: LP creates the authoritative _edge tag, and
the CI is responsible for the promotion.
I know Athos has a few ideas he would like to discuss which might change
this a little bit, but we can always adapt the process later if needed.
So, my next question for you would be: how can we help you implement
this idea?
Thanks,
--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14