[Bug 1767139] [NEW] TypeError in _get_inventory_and_update_provider_generation

Public bug reported:

Description
===========

While bringing up a new cluster in our CI after switching from 16.1.0 to
16.1.1 on CentOS, I'm seeing this error on some computes:

2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager Traceback (most recent call last):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6752, in update_available_resource_for_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 704, in update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_available_resource(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 728, in _update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._init_compute_node(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 585, in _init_compute_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update(context, cn)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 886, in _update
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 64, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 789, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_inventory(rp_uuid, inv_data)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 56, in wrapper
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(self, *a, **k)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 675, in _update_inventory
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if self._update_inventory_attempt(rp_uuid, inv_data):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 562, in _update_inventory_attempt
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     curr = self._get_inventory_and_update_provider_generation(rp_uuid)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 546, in _get_inventory_and_update_provider_generation
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if server_gen != my_rp['generation']:
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager TypeError: 'NoneType' object has no attribute '__getitem__'

The error appears to persist for the whole of a single nova-compute run.
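
The TypeError in the last frame is just Python 2's way of reporting an
index into None: my_rp is None at the point the generations are compared.
A minimal standalone sketch of that failure mode (not nova code; the names
are reused from the traceback, and the None value is an assumption about
the cached provider state):

# Standalone Python 2 sketch of the last traceback frame.
# Assumption: my_rp stands in for the report client's cached resource
# provider record and has come back as None (e.g. a cache miss for this
# compute node); server_gen is the generation reported by placement.
my_rp = None
server_gen = 3

# Evaluating the condition below raises, on Python 2:
#   TypeError: 'NoneType' object has no attribute '__getitem__'
# (Python 3 would instead say "'NoneType' object is not subscriptable".)
if server_gen != my_rp['generation']:
    pass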

Steps to reproduce
==================

Nodes are started by our CI infrastructure: 3 computes and a single
control node.  In roughly 50% of runs, one of the computes comes up in
this bad state.

Expected result
===============

Working cluster.

Actual result
=============

At least one of the 3 compute nodes fails to join the cluster: it's not
picked up by discover_hosts, and I see the above stack trace repeated in
the nova-compute logs.
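
For diagnosis, here is a rough sketch (assuming admin credentials in the
usual OS_* environment variables, and keystoneauth1 for auth and endpoint
discovery) that queries the Placement API directly, to check whether a
resource provider record and generation exist for the affected compute
host:

# Diagnostic sketch: list resource providers and their generations straight
# from the Placement API. Assumes OS_* admin credentials are exported and
# that auth/endpoint discovery can go through keystoneauth1.
import os

from keystoneauth1 import session
from keystoneauth1.identity import v3

auth = v3.Password(
    auth_url=os.environ['OS_AUTH_URL'],
    username=os.environ['OS_USERNAME'],
    password=os.environ['OS_PASSWORD'],
    project_name=os.environ['OS_PROJECT_NAME'],
    user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
    project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'),
)
sess = session.Session(auth=auth)

# GET /resource_providers is available from the base placement microversion,
# and each entry carries 'uuid', 'name' and 'generation'.
resp = sess.get('/resource_providers',
                endpoint_filter={'service_type': 'placement'})
for rp in resp.json()['resource_providers']:
    print('%s %s generation=%s' % (rp['name'], rp['uuid'], rp['generation']))

If the affected compute's hostname is missing from that list, that would
fit the picture of the report client comparing against a provider record
it never managed to create or cache.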

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

$ rpm -qa | grep nova
python-nova-16.1.1-1.el7.noarch
openstack-nova-common-16.1.1-1.el7.noarch
python2-novaclient-9.1.1-1.el7.noarch
openstack-nova-api-16.1.1-1.el7.noarch
openstack-nova-compute-16.1.1-1.el7.noarch


2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

$ rpm -qa | grep kvm
libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
qemu-kvm-common-ev-2.9.0-16.el7_4.14.1.x86_64
qemu-kvm-ev-2.9.0-16.el7_4.14.1.x86_64

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

Not sure

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

Neutron with Calico (I work on Calico, this is our CI system)

** Affects: nova
     Importance: Undecided
         Status: New
