yahoo-eng-team team mailing list archive

[Bug 1398128] Re: ironic tempest tests periodically failing: No valid host was found

 

OK, so we did some more digging here.  Devananda caught that the host's
SSH credentials for accessing the local libvirt are created after the
nodes are enrolled.  Ironic can't validate the power state of the nodes
until it can connect to libvirt, and nova won't take a node's resources
into account until its power state has been validated.  The result is a
delay before the nodes become schedulable.

** Changed in: devstack
       Status: Fix Released => Confirmed

** Changed in: ironic
       Status: New => Invalid

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1398128

Title:
  ironic tempest tests periodically failing: No valid host was found

Status in devstack - openstack dev environments:
  In Progress
Status in OpenStack Bare Metal Provisioning Service (Ironic):
  Invalid
Status in OpenStack Compute (Nova):
  Invalid

Bug description:
  This was noticed on the stable/juno ironic sideways grenade jobs, but
  is also confirmed to be happening on the
  check-tempest-dsvm-ironic-parallel-nv job, which runs a similarly
  configured tempest run against Ironic:

  http://logs.openstack.org/84/137684/1/check/check-grenade-dsvm-ironic-sideways/6d118bc/

  A number of the early compute tests will fail to spawn an instance,
  getting a scheduling error on the client side:

  BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
  Details: Server eb81ee40-ceba-484d-b665-92ec3bf4fedd failed to build and is in ERROR status
  Details: {u'message': u'No valid host was found. ', u'created': u'2014-11-27T17:44:05Z', u'code': 500}

  Looking through the nova logs, the request never even makes it to the
  nova-scheduler.  The last error is reported in conductor:

  2014-11-27 17:44:01.005 WARNING nova.scheduler.driver [req-a3c046e5-66db-4bca-a6f8-2263763e49a6 SecurityGroupsTestJSON-2119055496 SecurityGroupsTestJSON-1381566740] [instance: 9008811a-f400-42ae-98d5-caf828fa34dc] NoValidHost exception with message: 'No valid host was found.'

  Looking at the timestamps of the requests, the first instance is
  requested at 17:44:00:

  2014-11-27 17:44:00.944 24730 DEBUG tempest.common.rest_client [req-a3c046e5-66db-4bca-a6f8-2263763e49a6 None] Request (SecurityGroupsTestJSON:test_server_security_groups): 202 POST http://127.0.0.1:8774/v2/adf4838f0d15462da4601a5d853eafbf/servers 0.515s

  However, on the nova-compute side, the resource tracker is not
  updated to include the enlisted Ironic nodes until later.  The first
  time the tracker contains any of the ironic resources is at
  17:44:06:

  2014-11-27 17:44:06.224 21645 AUDIT nova.compute.resource_tracker [-]
  Total physical ram (MB): 512, total allocated virtual ram (MB): 0
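
As a rough illustration of the size of that window, the gap between the two log timestamps quoted above can be computed as follows. This is only a sketch: the helper name `seconds_between` is hypothetical, and the format string is an assumption based on the log lines shown in this report.

```python
from datetime import datetime

# Timestamp layout as it appears in the log excerpts quoted in this report.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(earlier, later):
    """Return the number of seconds from one log timestamp to another."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# First server POST seen by tempest vs. first resource tracker report
# that includes the ironic node's RAM.
race_window = seconds_between("2014-11-27 17:44:00.944",
                              "2014-11-27 17:44:06.224")
print(f"{race_window:.3f}")  # prints 5.280
```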

  So there's a race between the resource tracker's initial inclusion of
  available resources and Tempest running the first set of tests that
  require an instance.  This can be worked around in a few ways:

  * Adjust the periodic task interval on nova-compute to update much more frequently, though this will only narrow the window.
  * Have tempest run an admin 'nova hypervisor-stats' call on the client side and wait for resources before running any instances (in the baremetal case only)
  * Adjust devstack's nova cpu deployment to spin until hypervisor-stats reflects the ironic node parameters
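
The second and third workarounds both amount to polling hypervisor stats until the ironic resources show up. A minimal sketch of that wait loop follows; it is not the fix that was actually merged. The function name `wait_for_hypervisor_resources` is hypothetical, `fetch_stats` is an assumed caller-supplied callable that returns the hypervisor stats as a dict, and the `count` and `memory_mb` keys are assumed to match what the stats call reports.

```python
import time


def wait_for_hypervisor_resources(fetch_stats, min_count=1, min_memory_mb=1,
                                  timeout=120.0, interval=2.0,
                                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_stats() until enough resources are reported.

    fetch_stats is a hypothetical callable returning a dict such as
    {"count": 1, "memory_mb": 512}; how it obtains the data (CLI, REST
    client, ...) is left to the caller.
    Raises TimeoutError if the resources do not appear in time.
    """
    deadline = clock() + timeout
    while True:
        stats = fetch_stats()
        # Assumed keys: number of hypervisors and total RAM in MB.
        if (stats.get("count", 0) >= min_count
                and stats.get("memory_mb", 0) >= min_memory_mb):
            return stats
        if clock() >= deadline:
            raise TimeoutError(
                "hypervisor stats never reached count>=%d, memory_mb>=%d"
                % (min_count, min_memory_mb))
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults and pass a `fetch_stats` that wraps whatever admin client is available.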

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1398128/+subscriptions