launchpad-dev team mailing list archive

Re: BuildEngineer and ReleaseManagerRotation

 

On Fri, Sep 18, 2009 at 11:06:48AM +1200, Michael Hudson wrote:
> Michael Hudson wrote:
> > Christian Robottom Reis wrote:
> >> I finally got around to reviewing
> >>
> >>     https://dev.launchpad.net/PolicyAndProcess/ReleaseManagerRotation
> >>
> >> and
> >>
> >>     https://dev.launchpad.net/BuildEngineer
> >>
> >> They both look good (after some minor edits by me <wink>). I just wanted
> >> to ask two things:
> >>
> >>     a) Why does one page end in Rotation and fall under
> >>        PolicyAndProcess, and the other not?
> > 
> > Dunno, it's a wiki, so it would be unnatural if it wasn't random and
> > inconsistent?
> > 
> >>     b) How is the BuildEngineer rotation going this cycle?
> > 
> > I think it's going pretty well.  The role is quite stressful in some
> > ways, with lots of things that take far longer than you'd think and
> > fighting with systems that no-one really understands any more.
> 
> Oh, I guess I should say that I find this part of the role:
> 
>  * Monitor the buildbot and ensure smooth operation of builds.
>    o This means monitoring the builders and making sure that somebody is
>      assigned to fixing any build failures or errors as they arise.
> 
> to be perhaps not the greatest idea, mostly because build failures
> need to be fixed *now*, not when the build engineer wakes up,

Reading the point above, I don't think it says that the build engineer
should fix build failures himself. It's more about monitoring the build
failures and making sure that every issue we run into is picked up by
someone.

For example, what's up with the error about the slaves running out of
disk space? Has it been reported? Is someone looking into it?



> and also
> because I think part of the point of being the BE is that it allows you
> to get away from being distracted by the 1001 things we all have
> pressing at us constantly.

This might be true. However, I do think that build errors that aren't
caused by real test failures (like the example above) should be given
really high priority.


> 
> Systemic issues that affect the reliable functioning of buildbot should
> of course be very high priority build engineer issues -- but that's not
> the same thing.

... right, you agree with me, I guess :) So we need to define what
the responsibilities really are. I would say it's the build engineer's
responsibility to go through the build failures, and pick up any failure
that isn't a real test failure (e.g. intermittently failing tests don't
fall into this category). He doesn't necessarily have to fix it himself,
but he should make sure that a bug is filed about it, and that someone
is looking into it.


-- 
Björn Tillenius | https://launchpad.net/~bjornt


