Re: Intermittent devstack-gate failures


 

In recent devstack builds, I've seen g-api fail intermittently, too, and couldn't figure out why it failed (nothing in the logs).

This might have been the reason.

-jay

On 06/14/2012 05:48 PM, Dan Prince wrote:
Hey Jim,

Any updates or new ideas on the cause of the intermittent hangs?

I mentioned these on IRC with regard to one of the Essex failures. One thing I've seen with Glance recently is that both glance-api and glance-registry now use (and try to auto-create) the database by default. I've had issues when I start glance-api and glance-registry in quick succession because both of them try to init the DB.

So in devstack we could run 'glance-manage db_sync' manually:

   https://review.openstack.org/#/c/8495/

And in Glance we could then default auto_db_create to False:

   https://review.openstack.org/#/c/8496/
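
Roughly, the combination would look like this (a sketch only; the exact
files and option spelling are in the two reviews above):

   # devstack side (idea behind the first review): create the schema once,
   # up front, so glance-api and glance-registry no longer race to init it:
   glance-manage db_sync
   # ...then start glance-api and glance-registry as usual.

   # Glance side (idea behind the second review): with auto-creation off
   # by default, startup no longer touches the schema.  In glance-api.conf
   # and glance-registry.conf (option name as used above):
   #   auto_db_create = False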

Any chance this is the cause of the intermittent failures?

Dan

----- Original Message -----
From: "James E. Blair" <corvus@xxxxxxxxxxxx>
To: "OpenStack Mailing List" <openstack@xxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, June 12, 2012 7:25:01 PM
Subject: [Openstack] Intermittent devstack-gate failures

Hi,

It looks like there are intermittent, but frequent, failures in the
devstack-gate.  This suggests a non-deterministic bug has crept into
some piece of OpenStack software.

In this kind of situation, we certainly could keep re-approving changes
in the hope that they will pass the test and merge, but it would be
better to fix the underlying problem.  Simply re-approving is mostly
just going to make the queue longer.

Note that the output from Jenkins has changed recently.  I've seen some
people misconstrue some normal parts of the test process as errors.
In particular, this message from Jenkins is not an error:

   Looks like the node went offline during the build. Check the slave
   log for the details.

That's a normal part of the way the devstack-gate tests run, where we
add a machine to Jenkins as a slave, run the tests, and remove it from
the list of slaves before it's done.  This is to accommodate the
one-shot nature of devstack-based testing.  It doesn't interfere with
the results.
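
As a rough outline of that lifecycle (hypothetical helper names, not the
actual devstack-gate scripts):

   node=$(boot_node_from_provider)     # hypothetical: launch a fresh VM
   add_jenkins_slave "$node"           # register it as a single-use slave
   run_devstack_and_exercises "$node"  # stack.sh, then ./exercises/*.sh
   collect_logs "$node"
   remove_jenkins_slave "$node"        # this removal is what makes Jenkins
                                       # print the "node went offline" message
   destroy_node "$node"                # one-shot: the VM is never reused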

To find out why a test failed, you should scroll up a bit to the
devstack exercise output, which normally looks like this:

*********************************************************************
SUCCESS: End DevStack Exercise: ./exercises/volumes.sh
*********************************************************************
=====================================================================
SKIP boot_from_volume
SKIP client-env
SKIP quantum
SKIP swift
PASS aggregates
PASS bundle
PASS client-args
PASS euca
PASS floating_ips
PASS sec_groups
PASS volumes
=====================================================================

Everything after that point is test-running boilerplate.  I'll add some
echo statements to that effect in the future.
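
For example, something along these lines (hypothetical wording, not the
eventual change):

   echo "*****************************************************************"
   echo "Devstack exercises finished; everything below this point is"
   echo "test-runner boilerplate and can be ignored."
   echo "*****************************************************************"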

Finally, it may be a little difficult to pinpoint when this started.
A number of devstack-gate tests have passed recently without actually
running any tests, due to an issue with one of our OpenStack-based node
providers.  We are eating our own dogfood.

-Jim

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

