openstack team mailing list archive

Re: Parallel execution of Jenkins gate jobs

 

Johannes Erdfelt <johannes@xxxxxxxxxxx> writes:

> On Tue, Jun 05, 2012, James E. Blair <corvus@xxxxxxxxxxxx> wrote:
>> One important difference is that the new system does not have
>> "retrigger" buttons in Jenkins.  If the gate tests fail with a false
>> negative, you'll need to leave another "Approved" vote in Gerrit.
>
> I have to say that with how unreliable some of the jobs are (which are
> usually problems fetching packages), this change makes approvals a bit
> more annoying.

I wholeheartedly agree.  That's why Monty has been working on setting up
a pypi mirror so we can be responsible for ensuring that all of the pip
dependencies are always available to Jenkins.  You can see the result of
his work here:

  http://pypi.openstack.org/

It's built from the pip dependencies of all the projects, so it should
be exactly what we need to run tests.  The program that generates it is
here:

  https://github.com/openstack/openstack-ci-puppet/tree/master/modules/pypimirror
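
To illustrate the idea of building the mirror's package list from the
pip dependencies of all the projects, here is a minimal sketch.  It is
not the actual pypimirror code; the function names, the project names
and the requirement lines in the example are all hypothetical, and the
parsing is deliberately simplistic.

```python
import re


def requirement_name(line):
    """Return the bare package name from one requirements line, or None."""
    # Drop trailing comments and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    # Skip blank lines and pip options such as "-e ..." or "-r ...".
    if not line or line.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower() if match else None


def union_of_requirements(files):
    """Collect every package named by any project.

    `files` maps a (hypothetical) project name to the text of its
    requirements file.
    """
    names = set()
    for text in files.values():
        for line in text.splitlines():
            name = requirement_name(line)
            if name:
                names.add(name)
    return sorted(names)


# Made-up requirement files, for illustration only.
example = {
    "nova": "SQLAlchemy<=0.7.9\neventlet\n# a comment\n",
    "glance": "eventlet>=0.9.12\n-e git+https://example.invalid/repo#egg=thing\nWebOb==1.0.8\n",
}

print(union_of_requirements(example))
```

A mirror built from that union holds every package any project's tests
would ask pip for, which is the property the tests rely on.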

> For instance:
>
> https://review.openstack.org/#/c/8133/
>
> It failed with a transient failure in gate-nova-python27. No retrigger
> function anymore, so reapprove. It now passes in gate-nova-python27 but
> fails in gate-nova-python26 (which previously passed).
>
> I understand the problem is with upstream usually, but combined with the
> unreliability of upstream and the need to rerun *all* of the tests, it
> increases the amount of baby sitting required.

Indeed, that problem is due to a failure to download a pip requirement
from sourceforge.  It _should_ be cached in the mirror, but wasn't due
to a bug in the mirroring code.  Monty fixed that yesterday, so I'll go
see why that hasn't propagated to the mirror.

It also looks like the devstack hosts may not be using the mirror; I'm
going to look into that as well.

> That means more work for us core members. I'd really like to figure out
> a way to reduce the amount of unnecessary work for us.
>
> Possibly find out a way to cache packages to reduce the number of
> failures we see, provide a way to retrigger individual jobs again or
> perhaps something else.

Unfortunately, retriggering individual jobs was never technically
correct with the way our trunk gating works, and even less so now that
changes are being stacked on top of each other when testing (the state
of the repository and set of changes that should be tested can change
dramatically between test runs).  So even if we did implement a retrigger
function, it would still have to run all of the tests again.
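
A toy model may make the stacking point clearer.  This is only a
sketch under simplifying assumptions, not the gating code itself; the
function name, the change labels and the "trunk@1" tip are
hypothetical.

```python
def speculative_states(tip, queue):
    """Map each queued change to the state it is tested on top of.

    Each change is tested against the current tip plus every change
    queued ahead of it.
    """
    states = {}
    ahead = []
    for change in queue:
        states[change] = [tip] + list(ahead)
        ahead.append(change)
    return states


# Three approved changes are queued and tested stacked on each other.
first_run = speculative_states("trunk@1", ["A", "B", "C"])

# Suppose A fails and drops out of the queue.  B and C now sit on a
# different base, so their earlier results no longer describe what
# would actually be merged.
second_run = speculative_states("trunk@1", ["B", "C"])

for change in ("B", "C"):
    print(change, first_run[change], "->", second_run[change])
```

In this model, rerunning a single job for C would test a state that no
longer matches what would be merged, which is why a retrigger would
have to rerun everything.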

I think we should concentrate our energy on reducing the transient
failures.  I think we have a good approach here, with the local pypi
mirror as a method of caching packages.

In short, I completely agree with your concerns.  I want to make it so
that the tests are extremely reliable and there is no need to retrigger,
which will ultimately make less work for the core developers.

I'm sorry if it's rough over the next few days as we work the bugs out.
For my part, I will re-approve jobs that I see need it while I'm chasing
down problems.

-Jim

