David Kranz <david.kranz@xxxxxxxxxx> writes:

> I just submitted https://review.openstack.org/#/c/11755/ to skip the
> tests that are failing. I filed three bug tickets too. I can only
> reproduce one of them in my environment (the one where the error code
> seems to have changed). BTW, I looked at the nova logs for a tempest
> build that succeeded and there were a bunch of error backtraces. I am
> going to file bugs about that as well with nova. I agree with the
> comments of both Dan and Daryl but right now it seems to me more
> dangerous to block active work. I think we need to get the tempest
> gate passing asap. I don't think any of the things we are seeing are
> bugs in tempest.

You seem to have the immediate problem in hand; I just have a few
high-level thoughts:
1) We designed the devstack-gate with a facility to install the SSH key
of the developer whose change failed onto the VM and give it to them for
debugging purposes -- however, we've yet to have a devstack-gate node
provider give us permission to hand out VMs like that. I'm sorry that
hasn't happened, though in the meantime, if there's any useful
information (logs, output of ps or ip commands, etc.) that you'd like
copied off the machines and that we aren't already collecting, please
let me know (or submit a patch to openstack-ci/devstack-gate).
2) The unbalanced relationship tempest has with the rest of the gate is
certainly unusual. Ideally, I think all the projects should run the
same set of (fairly complete) tests. Then no project can commit a
breaking change (and impede the work of another project). Jay asked us
to set things up this way because he has higher confidence in the smoke
tests than the rest of the suite. It seems like a reasonable way to
start using tempest for gating the projects, as well as keeping the goal
of improving the wider test suite in view, perhaps at the cost of extra
work for tempest developers.
But as we get to the point where the non-smoke tests are failing due to
real problems with the core projects rather than tempest itself, we
should look at making those tests part of the wider gate (either by
making them smoke tests, or expanding the gate to run more than just the
smoke tests for all projects).
Of course, run time is a consideration, but we wrote Zuul largely to
deal with this problem -- Zuul performs gate tests in parallel (but
still tests each change in the state in which it will be merged). So
while we definitely would like to keep run time as short as possible,
running the
Jenkins jobs in parallel means we don't have to wait for each change to
be tested in series. So in short, do please make tempest run as fast as
possible, but we want to run useful tests, which takes time, and that's
something we're prepared to deal with.
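
To make the parallel-gating idea concrete, here is a rough Python sketch.
It is only an illustration of the principle described above, not Zuul's
actual code: the function names (speculative_states, run_gate) and the
passes() callback are hypothetical, and the model assumes that a failing
change is simply dropped and everything queued behind it is retested.

```python
from typing import Callable, List, Sequence, Tuple


def speculative_states(queue: Sequence[str]) -> List[Tuple[str, ...]]:
    """For each queued change, return the series of changes it is tested with.

    Change i is tested on top of every change ahead of it in the queue,
    i.e. in the state the tree would have if everything ahead of it merged.
    """
    return [tuple(queue[: i + 1]) for i in range(len(queue))]


def run_gate(
    queue: Sequence[str],
    passes: Callable[[Tuple[str, ...]], bool],
) -> Tuple[List[str], int]:
    """Simulate gating a queue of changes; return (merged changes, rounds).

    `passes` is a hypothetical stand-in for a full test run against a tree
    containing the given changes. Within one round, all of the runs are
    independent and could be executed in parallel.
    """
    merged: List[str] = []
    pending: List[str] = list(queue)
    rounds = 0
    while pending:
        rounds += 1
        # One (conceptually parallel) test run per pending change.
        results = [
            passes(tuple(merged) + state)
            for state in speculative_states(pending)
        ]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            # Changes ahead of the failure were tested in a valid state.
            merged.extend(pending[:first_bad])
            # Drop the failing change; results behind it assumed it would
            # merge, so those changes must be tested again without it.
            pending = pending[first_bad + 1:]
    return merged, rounds


# Example: change "B" breaks any tree that contains it.
merged, rounds = run_gate(["A", "B", "C", "D"], lambda tree: "B" not in tree)
```

In this toy run, four changes are settled in two rounds of parallel
tests, and each change that merges was tested against exactly the tree
it landed on. When every change passes, the whole queue takes about as
long as a single test run.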
-Jim