ubuntu-phone team mailing list archive

Re: A new Image release Proposal

 

hi,
On Fri, 2013-11-29 at 11:32 +0100, Alexander Sack wrote:
> Hi,
> 
> it seems you put a few changes up for discussion in one shot.
> 
> Let's keep those separate and look at them one by one:
> 
> From what I see you basically propose three main things:
> 
>  1. let's increase velocity of image production so we get 2-3 images
> produced in devel-proposed per day
>  2. make cron the technology we use to schedule and kick those images
> 2-3 times a day
>  3. increase manual testing done before "releasing" images; create a
> broader touch-release team that will include avengers and manual
> testers and community etc.
> 
> Let me look at them one by one and then give a bullet summary of what
> I believe we should indeed tweak for now...
> 
> On 1.
> ======
> 
> I think 1. is and was the goal. So I think no one disagrees with the
> benefits of having 2-3 checkpoints a day and we should just do it.
> Note: it actually always was that way when I ran the landing team and
> during release time. I believe we still do it, but if we don't we
> should certainly ensure that we get back to doing this.
on the majority of days in the past we only had one image build per day
simply because there were too many landings to wait for, and in the end
we had huge change sets that burned a lot of manpower when searching for
where a regression came from.
 
> 
> On 2.
> ======
> 
> You are suggesting a technical solution to the problem "how and when
> do we cut images".
> 
> I don't see why we would go for cron if we have something that is
> smarter - e.g. our landing process. It would be a big step back to do
> that. Let's be smarter :)...
> 
> What we did during the final weeks of release and what we should
> continue to do (until we have trigger based image production) was to
> cut images based on a smart, individual landing plan that doesn't use
> a strict time approach, but rather a hybrid approach that also takes
> landing goals into account.
> 
> For instance, every morning, landing team looks at the work to do and
> decides what chunks of work we would like to have in image 1,2,3...
> then they set themselves a hard end time to avoid that we drag on
> without images forever. This worked pretty well.
> 
> On top we should ensure that we continue producing images also during
> times where landing team does not operate. That's mostly on weekends,
> but also might be during European/US nights. For those times we can use
> cron to compensate for the lack of available brains :0
> 

we should have a fixed cron schedule even if the landing team is around;
it is a huge pain if the change sets get bigger. how about we have one
or two fixed cron builds per day and still the opportunity to trigger a
third manual build at will? (the testing infrastructure is still highly
unstable and unreliable, and tests need to be re-run on nearly every
image build. we have two people doing this in two time zones and just
started to discuss a cron schedule on IRC that makes sure the images are
built at a time most convenient for them, so we can have images ready
during their working hours with enough wiggle room for manually
restarting the individual tests that failed or were flaky)
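
to make the "fixed schedule plus manual trigger" idea concrete, here is a
rough python sketch. the build hours and the function names are invented
for illustration only; the real times were still being discussed on IRC
and this is not how cdimage is actually driven:

```python
from datetime import datetime, time, timedelta, timezone

# hypothetical fixed build slots in UTC (assumption, not the agreed schedule)
FIXED_BUILD_TIMES = [time(2, 0), time(14, 0)]


def next_fixed_build(now, slots=FIXED_BUILD_TIMES):
    """Return the first fixed slot strictly after `now` (timezone-aware, UTC)."""
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for slot in sorted(slots):
            candidate = datetime.combine(day, slot, tzinfo=timezone.utc)
            if candidate > now:
                return candidate
    raise ValueError("no build slots configured")


def should_build(now, last_build, manual_trigger=False, slots=FIXED_BUILD_TIMES):
    """Build on request, or when a fixed slot has passed since the last build."""
    if manual_trigger:
        return True
    return next_fixed_build(last_build, slots) <= now
```

the fixed slots guarantee images even when nobody is around, and the
manual_trigger flag keeps the option of an extra build at will.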


> On 3.
> ======
> 
> Your proposal means very different things based on what you call
> "image release". So far we have used the word "promotion" to describe
> the act of moving a "blessed" image from a -proposed channel to a
> non-proposed channel. I am not sure if that's what you call "release"
> in your mail, but I assume so...
yes, i mean promoting images from -proposed to devel/trusty

> 
> Let's look at the channels and its purposes again:
> 
>  - devel-proposed -> here all images get spit out. they are completely
> untested and haven't even run through automation (read: why do you
> want to bother big dogfooders and avengers by telling them to test
> this stuff)
because lots of regressions go out unnoticed. these images see automated
tests in a system that isn't very reliable yet; beyond that they get a
minimal smoke test (usually done by popey and me) that only covers as
much as we invest time ...
that method does not cover regressions that only show up after a while
or that a manual smoke test simply didn't catch.

we have a big community of people out there running the -proposed image
(I would say even more than people that actually use the devel channel),
we should give them a platform to be able to give us feedback and
participate in testing and bug triage for better regression detection.
locking them out by having team-only hangout meetings can't be the
solution to open development IMHO, let's open up to the community again
please.

>  - devel -> here we put images that have gone through automation and
> that are ready for dogfooders to pick up
>  - stable -> here is where we have end users and deliver updates to
> end users through it.
> 
> Now the consensus on target frequency of those is:
> 
>  - devel-proposed == 2-3 times a day (automated testing only)
>  - devel == 1+ times a day (dogfooders and avengers testing with goal
> to drive us to next stable update)
>  - stable == 1-6 monthly (stable users will give even more "testing")
> 
> I think that all makes sense, and doesn't really need changing?
given that our automated tests can't even catch any GSM and SMS issues I
don't see how this all "makes sense". we have people out there using
these images, let's get their feedback, have them help and
participate ...

> 
> What needs better organization is the testing of dogfooders and
> avengers of "already blessed" devel images. Here your idea about a
> touch-release team makes sense. So far we had delegated that to jfunk.
> You could help him organize a more effective avengers effort that also
> includes the community, so maybe talk to him.

how does that help at all to prevent us from promoting images
with regressions?
having the avengers test the images is nice and all and will give us a
good set of high level bugs, but it does not at all help with the issue
that we need to improve the promotion process ... automation can only
cover a small part here, let's involve our community in our processes
when we can ...

getting bugs only from the avengers for slipped regressions also means
there is quite a delay sometimes so the respective package/code base has
evolved a lot already and when we try to nail down the issue we need to
do archeology. I was hoping that we could win some agility back with my
proposal of a public facing touch-release team, our current processes
are very slow and add a lot of delay everywhere while not really
improving the quality IMHO.

> 
> -----
> 
> OK, let's summarize what we got so far and let's do the following
> tweaks for now...
> 
> 
> Summary
> =========
> 
>  1. we start producing 2 images a day until end of year at a
> predictable schedule (didrocks will announce that schedule after
> discussing internally)
> 
>  2. we don't enable cron during business days. Instead we hook image
> kicks up to our landing process so that we get a smart, but predictable
> schedule
>     - for instance, the image builds will always happen around the
> same hours (e.g. image 1: 1200-1400, image 2: 1800-2000), but also
> will be smart about considering the landing payload so we can ensure
> that the critical pieces really landed etc.

why does that matter at all?

if the landing is ready it enters the archive and will automatically be
in the next build, no matter when that build is done. the proposed
migration of the archive makes sure the set of packages will land
together (if you have packages slipping through then there is a
packaging bug that needs fixing).
I never saw (and still don't see) why image production should be tied
into landings at all, landings are held back in the infrastructure
automatically until they are complete.
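
as a grossly simplified illustration of that gating (python, all names
made up; the real proposed-migration checks far more than this toy
dependency test), an image build at any point in time only ever sees
landings that are complete:

```python
def migratable(landing, available):
    """True if every dependency in the landing is satisfied by what is
    already available plus the landing itself.

    `landing` maps package name -> list of dependency names (toy model).
    """
    candidate = set(available) | set(landing)
    return all(dep in candidate for deps in landing.values() for dep in deps)


def image_contents(release, landings):
    """Packages a build would pick up: the release pocket plus complete landings."""
    contents = set(release)
    for landing in landings:
        if migratable(landing, contents):
            contents |= set(landing)
    return contents
```

an incomplete landing simply stays out of `image_contents` until its
missing pieces arrive, regardless of when the build is kicked.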

> 
>  3. to keep the image frequency acceptable at all times, we enable
> cron builds during weekend and days where landing team is not
> operational.
...
> 
>  4. ogra and team to help jfunk to organize a more vibrant avengers
> community around testing of images after devel promotion; this team
> has the goal to identify issues that would block a stable promotion
> and will be fed back into the landing team so they can prioritize
> landings with the goal to clear a new stable promotion.

you missed my most important point of involving the community by having
open and regular IRC meetings. 
testing *after* promotion ... while it is nice and generates good
bugs ... doesn't help at all with preventing regressions from slipping
into the promoted images.

note that all images we promoted since r10 had and still have
regressions (not to mention that the automated test pass rate is far
below r10 too)

so to summarize from my side:

1+2+3) let's go with a semi automatic schedule then, so we even get
images if all ubuntu-cdimage members that can trigger them are run down
by simultaneous buses on different continents on the same day ;)
(I still disagree that the landing team is the right team to drive image
stuff and I also still think it puts extra load on them that should
better be invested into processing more landings per day. our
infrastructure is designed in a way that landings are held back until
they are installable, there is no need to bind image builds to landings.
we add an artificial blocker to the process where we have had a reliable
and automatic one in place for years)

4) let's still have a more open regression testing process that involves
the community more to make sure regressions do not slip into promoted
images.

ciao
	oli


