[Bug 1767139] Re: TypeError in _get_inventory_and_update_provider_generation
Reviewed: https://review.openstack.org/566096
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=80a001989351d3d427c204c8c06cfacc964f2a35
Submitter: Zuul
Branch: master
commit 80a001989351d3d427c204c8c06cfacc964f2a35
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu May 3 11:21:47 2018 -0400
Handle @safe_connect returns None side effect in _ensure_resource_provider
Change I0c4ca6a81f213277fe7219cb905a805712f81e36 added more error
handling to the _ensure_resource_provider flow but didn't account
for @safe_connect returning None when calling _create_resource_provider
in the case that nova-compute is started before placement is running.
If that happens, we fail with a TypeError during the nova-compute
startup because we put None in the resource provider cache and then
later blindly try to use it because the compute node resource provider
uuid is in the cache, but mapped to None.
This adds the None check back in _ensure_resource_provider and if
None is returned from _create_resource_provider we raise the same
exception that _create_resource_provider would raise if it couldn't
create the provider.
Change-Id: If9e1581db9c1ae14340b787d03c815d243d5a50c
Closes-Bug: #1767139
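For reference, the shape of the fix is roughly the following. This is a simplified sketch based on the commit message above, not the exact nova code; the real _ensure_resource_provider in nova/scheduler/client/report.py takes more arguments, and the provider cache attribute is named differently.

    def _ensure_resource_provider(self, uuid, name=None):
        # Return the cached provider if we already know about it.
        rp = self._provider_cache.get(uuid)
        if rp is not None:
            return rp
        rp = self._create_resource_provider(uuid, name or uuid)
        # @safe_connect makes _create_resource_provider return None when the
        # placement service cannot be reached (e.g. nova-compute started
        # before placement), so never put that None into the cache.
        if rp is None:
            raise exception.ResourceProviderCreationFailed(name=name or uuid)
        self._provider_cache[uuid] = rp
        return rp

Raising ResourceProviderCreationFailed here makes startup fail loudly instead of caching None and hitting the TypeError shown below during the next resource update.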
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1767139
Title:
TypeError in _get_inventory_and_update_provider_generation
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
Description
===========
Bringing up a new cluster as part of our CI after switching from 16.1.0
to 16.1.1 on CentOS, I'm seeing this error on some computes:
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager Traceback (most recent call last):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6752, in update_available_resource_for_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager rt.update_available_resource(context, nodename)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 704, in update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager self._update_available_resource(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager return f(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 728, in _update_available_resource
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager self._init_compute_node(context, resources)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 585, in _init_compute_node
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager self._update(context, cn)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 886, in _update
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 64, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager inv_data,
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager return getattr(self.instance, __name)(*args, **kwargs)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 789, in set_inventory_for_provider
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager self._update_inventory(rp_uuid, inv_data)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 56, in wrapper
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager return f(self, *a, **k)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 675, in _update_inventory
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager if self._update_inventory_attempt(rp_uuid, inv_data):
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 562, in _update_inventory_attempt
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager curr = self._get_inventory_and_update_provider_generation(rp_uuid)
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 546, in _get_inventory_and_update_provider_generation
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager if server_gen != my_rp['generation']:
2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager TypeError: 'NoneType' object has no attribute '__getitem__'
The error appears to persist for the lifetime of a single nova-compute run.
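The mechanics of the failure can be reproduced outside nova with a short, self-contained toy example (illustrative only; safe_connect and create_resource_provider below are stand-ins, not the real nova implementations):

    import functools

    def safe_connect(f):
        # Toy stand-in for nova's @safe_connect decorator: swallow
        # connection failures and return None instead of raising.
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except ConnectionError:
                return None
        return wrapper

    @safe_connect
    def create_resource_provider(uuid):
        # Simulate placement not running yet.
        raise ConnectionError("placement unreachable")

    cache = {}
    rp_uuid = 'compute-node-uuid'
    cache[rp_uuid] = create_resource_provider(rp_uuid)  # silently caches None

    my_rp = cache[rp_uuid]
    try:
        print(my_rp['generation'])
    except TypeError as exc:
        print(exc)

On Python 2 subscripting None is reported as "'NoneType' object has no attribute '__getitem__'" (as in the traceback above); on Python 3 the wording is "'NoneType' object is not subscriptable". Once None is cached, every later periodic update hits the same line, which is why the error persists for the rest of the nova-compute run.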
Steps to reproduce
==================
Nodes are started by our CI infrastructure: 3 computes and a single
control node. In roughly 50% of cases, one of the computes comes up
in this bad state.
Expected result
===============
Working cluster.
Actual result
=============
At least one of the 3 compute nodes fails to join the cluster: it is not
picked up by discover_hosts, and the above stack trace is repeated in the
nova-compute logs.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
$ rpm -qa | grep nova
python-nova-16.1.1-1.el7.noarch
openstack-nova-common-16.1.1-1.el7.noarch
python2-novaclient-9.1.1-1.el7.noarch
openstack-nova-api-16.1.1-1.el7.noarch
openstack-nova-compute-16.1.1-1.el7.noarch
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
$ rpm -qa | grep kvm
libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
qemu-kvm-common-ev-2.9.0-16.el7_4.14.1.x86_64
qemu-kvm-ev-2.9.0-16.el7_4.14.1.x86_64
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
Not sure
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
Neutron with Calico (I work on Calico; this is our CI system)
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767139/+subscriptions