yahoo-eng-team team mailing list archive

[Bug 1579213] Re: ComputeFilter fails because compute node has not been heard from in a while

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Mathieu Gagné <1579213@xxxxxxxxxxxxxxxxxx>
Date: Tue, 10 May 2016 23:07:13 -0000
Reply-to: Bug 1579213 <1579213@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Melanie Witt explained to me on IRC that in environments with a large
number of nodes (Ironic deployments tend to be among them), ComputeFilter
should be listed first in the scheduler filters to avoid these errors.

The scheduling process could take more than 60s (which happens to be the
default value of the service_down_time config) and when ComputeFilter is
finally invoked, the servicegroup API will think the compute service is
down and therefore reject the node.
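
A minimal sketch of the timing effect described above. This is not
Nova's actual code: the names is_up and SERVICE_DOWN_TIME and all
timestamps are assumptions made for illustration. It models a heartbeat
captured when scheduling starts and a liveness check that compares it
against the clock at the moment the filter finally runs.

```python
from datetime import datetime, timedelta

# Assumed to mirror the default service_down_time of 60 seconds.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Hypothetical liveness check: up if the heartbeat is recent enough."""
    return abs(now - last_heartbeat) <= down_time


# Heartbeat recorded 5 seconds before scheduling starts.
scheduling_start = datetime(2016, 5, 10, 23, 0, 0)
last_heartbeat = scheduling_start - timedelta(seconds=5)

# ComputeFilter listed first: the check runs almost immediately.
checked_first = is_up(last_heartbeat, scheduling_start + timedelta(seconds=1))

# ComputeFilter listed last: the earlier filters took 70 seconds.
checked_last = is_up(last_heartbeat, scheduling_start + timedelta(seconds=70))

print(checked_first, checked_last)
```

With the same heartbeat, the node passes when the check runs early and
is rejected when the check runs after the other filters have used up
more than service_down_time.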

It is unclear why forcing systematic scheduler state updates from
resource tracker improves the situation. I will assume this is caused by
some unknown side effects.

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1579213

Title:
  ComputeFilter fails because compute node has not been heard from in a
  while

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===========

  When scheduling an instance with Nova and Ironic, some hypervisors are
  ignored by ComputeFilter because each of them "has not been heard from
  in a while".

  Expected result
  ===============

  I expect all hypervisors to be available to nova-scheduler.

  Actual result
  =============

  Some hypervisors are ignored due to the service being "down".

  I found that:
  * ComputeFilter is ignoring hypervisors if the "nova.compute_nodes.updated_at" field is outdated according to the "service_down_time" config.
  * When starting nova-compute service, the field is updated correctly.
  * Subsequent resource usage updates do not update the field until the service is restarted.
  * Resource tracker does not update scheduler state (and field) if no change is found for the hypervisor. [1] Commenting out those lines makes nova-compute update the updated_at field correctly and nova-scheduler is happy.

  This makes nova-scheduler sad and not all hypervisors are available
  during scheduling.
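
  The behaviour listed in the findings above can be modelled with a
  small sketch. It is only an illustration of "skip the write when
  nothing changed", not the resource tracker code referenced in [1];
  ComputeNodeRecord, update_usage and skip_if_unchanged are hypothetical
  names, and the resource values are made up.

```python
from datetime import datetime


class ComputeNodeRecord:
    """Hypothetical stand-in for a row in the compute_nodes table."""

    def __init__(self, resources, now):
        self.resources = dict(resources)
        self.updated_at = now

    def save(self, resources, now):
        self.resources = dict(resources)
        self.updated_at = now  # the timestamp advances only on a write


def update_usage(record, resources, now, skip_if_unchanged=True):
    """Write the record, optionally skipping the write when nothing changed."""
    if skip_if_unchanged and resources == record.resources:
        return False  # no write, so updated_at keeps its old value
    record.save(resources, now)
    return True


start = datetime(2016, 5, 6, 12, 0, 0)
later = datetime(2016, 5, 6, 12, 5, 0)
resources = {"vcpus": 8, "memory_mb": 16384}

record = ComputeNodeRecord(resources, start)
wrote = update_usage(record, resources, later)
print(wrote, record.updated_at == start)
```

  In this model an idle node never advances updated_at, which matches
  the observation that the field only changes when the service starts.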

  Environment
  ===========

  Nova 2015.1.2

  [1]
  https://github.com/openstack/nova/blob/d619ad6ba15df1cf7dc92ddf84d1c65af018682f/nova/compute/resource_tracker.py#L632-L633

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1579213/+subscriptions

References

[Bug 1579213] [NEW] ComputeFilter fails because compute node has not been heard from in a while
From: Mathieu Gagné, 2016-05-06