yahoo-eng-team team mailing list archive

[Bug 1742827] [NEW] nova-scheduler reports dead compute nodes but nova-compute is enabled and up

 

Public bug reported:

(originally reported by David Manchado in
https://bugzilla.redhat.com/show_bug.cgi?id=1533196 )

Description of problem:
We are seeing nova-scheduler remove compute nodes because it considers them dead, even though openstack compute service list reports nova-compute as up and running.
The nova-scheduler log shows entries with the following pattern:
- Removing dead compute node XXX from scheduler
- Filter ComputeFilter returned 0 hosts
- Filtering removed all hosts for the request with instance ID '11feeba9-f46c-416d-a97e-7c0c9d565b5a'. Filter results: ['AggregateInstanceExtraSpecsFilter: (start: 19, end: 2)', 'AggregateCoreFilter: (start: 2, end: 2)', 'AggregateDiskFilter: (start: 2, end: 2)', 'AggregateRamFilter: (start: 2, end: 2)', 'RetryFilter: (start: 2, end: 2)', 'AvailabilityZoneFilter: (start: 2, end: 2)', 'ComputeFilter: (start: 2, end: 0)']
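To make it easier to see which filter is discarding hosts, the "Filter results" line above can be parsed mechanically. The following is only a minimal sketch in Python: the function names parse_filter_results and hosts_dropped are hypothetical helpers written for this illustration (they are not part of nova), and the regular expression assumes the log line keeps exactly the format quoted in this report.

```python
import re

# Matches entries such as 'ComputeFilter: (start: 2, end: 0)'.
# Assumption: the format is exactly the one quoted in the bug report.
ENTRY_RE = re.compile(r"'(\w+): \(start: (\d+), end: (\d+)\)'")


def parse_filter_results(line):
    """Return [(filter_name, start, end), ...] in the order logged."""
    return [(name, int(start), int(end))
            for name, start, end in ENTRY_RE.findall(line)]


def hosts_dropped(results):
    """Return [(filter_name, dropped_count)] for filters that removed hosts."""
    return [(name, start - end)
            for name, start, end in results if end < start]


# The "Filter results" portion of the log line quoted in this report.
LOG_LINE = (
    "Filter results: ["
    "'AggregateInstanceExtraSpecsFilter: (start: 19, end: 2)', "
    "'AggregateCoreFilter: (start: 2, end: 2)', "
    "'AggregateDiskFilter: (start: 2, end: 2)', "
    "'AggregateRamFilter: (start: 2, end: 2)', "
    "'RetryFilter: (start: 2, end: 2)', "
    "'AvailabilityZoneFilter: (start: 2, end: 2)', "
    "'ComputeFilter: (start: 2, end: 0)']"
)

results = parse_filter_results(LOG_LINE)
dropped = hosts_dropped(results)

if __name__ == "__main__":
    for name, count in dropped:
        print("%s removed %d host(s)" % (name, count))
```

For the line in this report, only two filters change the host count: AggregateInstanceExtraSpecsFilter narrows 19 hosts to 2, and ComputeFilter then rejects the remaining 2.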

Version-Release number of selected component (if applicable):
Ocata

How reproducible:
N/A

Actual results:
Instances are not being spawned reporting 'no valid host found' because of 

Additional info:
This has been happening for a week.
We did an upgrade from Newton three weeks ago.
We have also done a minor update and the issue still persists.

Nova related RPMs
openstack-nova-scheduler-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
python2-novaclient-7.1.2-1.el7.noarch
openstack-nova-novncproxy-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-cert-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-console-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-conductor-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-common-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-compute-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
openstack-nova-placement-api-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
puppet-nova-10.4.2-0.20180102233330.f4bc1f0.el7.centos.noarch
openstack-nova-api-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch
python-nova-15.1.1-0.20180103153502.ff2231f.el7.centos.noarch

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742827

Title:
  nova-scheduler reports dead compute nodes but nova-compute is enabled
  and up

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1742827/+subscriptions

