On Wed, Apr 14, 2021 at 08:22:37PM -0400, Sergio Durigan Junior wrote:
Hi,
Today on MM Bryce and I discussed the possibility of having a new script
to help us build/rebuild/mass-rebuild OCI images. The need for this is
becoming more and more apparent with every mass-rebuild we have to do,
and with the prospect of having more images and series supported. The
Launchpad "OCI recipe" page, while informative, requires a lot of clicks
in order to get the job done.
So anyway, Bryce asked me to come up with a list of requirements and
things that I want this script to do. Here's what comes to mind.
First, a fictitious usage for the script:
  image-builder.py -- Nice text here

  Usage:

    image-builder.py [-h] [--series MM.YY] [--arch ARCH] [--wait]
                     [--auto-retry-uploads] [-- IMAGE_1 IMAGE_2 ... IMAGE_N]

  Where:

    --series MM.YY          Build only images from the MM.YY series.
                            Optional, can be passed multiple times.
                            If not provided, all series will be built.
                            Supported series: 20.04, 21.04

    --arch ARCH             Build images only for architecture ARCH.
                            Optional, can be passed multiple times.
                            If not provided, all architectures will be built.
                            Supported architectures: amd64, arm64, ppc64el,
                            s390x

    --wait                  Wait until all builds have finished, and print
                            their statuses. This can take a long time.

    --auto-retry-uploads    Auto-retry any failed uploads to the
                            registries.
                            Optional. Implies "--wait".

    -- IMAGE_N...           Image(s) to build.
                            Optional.
                            If not provided, all images will be built.
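As a rough illustration of the usage above, here is a minimal argparse sketch. The help strings, the SUPPORTED_* constants and the parse_args() helper are assumptions made for illustration only; an empty image list is taken to mean "all images", and the image names in the usage note are made up.

```python
import argparse

# Values taken from the usage text above.
SUPPORTED_SERIES = ["20.04", "21.04"]
SUPPORTED_ARCHES = ["amd64", "arm64", "ppc64el", "s390x"]


def build_parser():
    """Build a parser mirroring the proposed command line."""
    parser = argparse.ArgumentParser(
        prog="image-builder.py",
        description="Request builds of OCI images.",
    )
    # action="append" lets the option be passed multiple times.
    parser.add_argument("--series", action="append", metavar="MM.YY",
                        choices=SUPPORTED_SERIES,
                        help="Build only images from this series.")
    parser.add_argument("--arch", action="append", metavar="ARCH",
                        choices=SUPPORTED_ARCHES,
                        help="Build images only for this architecture.")
    parser.add_argument("--wait", action="store_true",
                        help="Wait until all builds have finished.")
    parser.add_argument("--auto-retry-uploads", action="store_true",
                        help="Retry failed registry uploads; implies --wait.")
    # Everything after "--" ends up here.
    parser.add_argument("images", nargs="*", metavar="IMAGE",
                        help="Image(s) to build; default is all images.")
    return parser


def parse_args(argv=None):
    """Parse argv and apply the defaults described in the usage text."""
    args = build_parser().parse_args(argv)
    # No --series / --arch given means "build everything".
    args.series = args.series or list(SUPPORTED_SERIES)
    args.arch = args.arch or list(SUPPORTED_ARCHES)
    # --auto-retry-uploads implies --wait.
    if args.auto_retry_uploads:
        args.wait = True
    return args
```

For example, parse_args(["--series", "20.04", "--auto-retry-uploads", "--", "mysql"]) would select only the 20.04 series, all four architectures, turn on waiting, and restrict the build to a (hypothetical) "mysql" image.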
A few comments:
- I don't know how feasible it is to implement the auto-retry-uploads
option; not sure whether Launchpad offers such granularity in their
API. I also don't know if it makes sense to embed it into this
script, or create a separate script just for that (which may make more
sense). However, given the number and frequency of failed uploads we
are having, and especially considering the fact that they are
currently not being reported anywhere in the LP recipe page (one has
to go to the specific build page in order to check the upload status;
check LP#1918908), this IMO is a must-have.
Yes, lp oopses and failures are an unfortunately common occurrence, and
so a lot of lp scripts include some sort of retry functionality. A
common pattern is a 3x retry with a progressively longer delay
(i.e. immediate -> 1 min -> 5 min). The reason for the immediate retry
is that lp can fail if a request is pulling uncached information and
times out; the 2nd call will benefit from the pre-filled cache and
succeed. The other wait periods are useful if the problem is just
network glitches or service loads. If it still fails after 3 tries over
5 minutes, then something bigger may be at issue such as a legit bug or
a service outage.
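The retry pattern described above could look roughly like the sketch below. It assumes one initial attempt followed by three retries, waiting 0, 60 and 300 seconds before each; the call_with_retries() name and its parameters are made up for illustration, and a real script would probably narrow retry_on to the errors the Launchpad client actually raises.

```python
import time

# Wait before each retry, in seconds: immediate -> 1 min -> 5 min.
RETRY_DELAYS = (0, 60, 300)


def call_with_retries(operation, delays=RETRY_DELAYS,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call operation(); on failure, retry once per entry in delays.

    Returns the first successful result, or re-raises the last error
    once every retry has been used up.
    """
    try:
        return operation()
    except retry_on as error:
        last_error = error

    for delay in delays:
        # The first delay is 0, so the first retry is immediate and can
        # benefit from whatever the failed request managed to cache.
        sleep(delay)
        try:
            return operation()
        except retry_on as error:
            last_error = error

    # Still failing after all retries: likely a real bug or an outage.
    raise last_error
```

The sleep parameter is only there so the delays can be replaced in tests; in normal use the default time.sleep is fine.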
- I considered whether to add the "--wait" option or not. I decided to
do it, but I understand that it might be a bit tricky to implement.
It might be analogous to (and hopefully simpler than) the ppa wait case
we did before. In any case, it sounds useful.
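A bare-bones polling loop for "--wait" might look like the sketch below. Everything in it is an assumption: get_state stands in for whatever call fetches a build's current state from Launchpad, builds is any list of hashable build identifiers, and the strings in PENDING_STATES are guesses at the states that mean "not finished yet" and would need checking against what the API really reports.

```python
import time

# Assumed names for states in which a build is still in progress.
PENDING_STATES = frozenset({
    "Needs building",
    "Currently building",
    "Uploading build",
})


def wait_for_builds(builds, get_state, poll_interval=60, sleep=time.sleep):
    """Poll until no build is in a pending state.

    Returns a dict mapping each build to its last observed state.
    """
    states = {}
    pending = list(builds)
    while pending:
        still_pending = []
        for build in pending:
            state = get_state(build)
            states[build] = state
            if state in PENDING_STATES:
                still_pending.append(build)
        pending = still_pending
        # Only sleep if there is something left to wait for.
        if pending:
            sleep(poll_interval)
    return states
```

This sketch has no overall timeout, so a stuck build would make it loop forever; a real implementation would likely want an upper bound on the total wait.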
- I also thought about having a "--retag" option that would invoke the
tag-images.sh script automatically after everything is done, but I'm
not entirely sure this is something fit for this script. I like
separating things into logical blocks, so perhaps after this script is
done we can have a "build-and-retag.{sh,py}" script.
With ppa-dev-tools and other launchpad object scripts, the tool
interfaces generally group into four parts:
a) creation/initialization
b) write operations - requesting builds, modifying params, etc.
c) read-only operations - checking status, parsing build logs, etc.
d) destruction
The operations you've spec'd so far fit into part (b). I'm guessing (a)
and (d) are going to be out of scope or at least low priority in our
case but perhaps we may eventually want this tool to help with creating
recipes. You might think about (c) though - when we run into build
issues, are there scriptable things we could do to help make debugging
easier?
Bryce, if you want to talk more about this tomorrow (possibly involving
Athos, since he will be part of the OCI effort very soon), we can then
come up with a nice way to split the work.
Why don't we plan on meeting up on Friday? That'll give me time on
Thursday to pull up some of the code I was showing you, so we're not
starting entirely from scratch.