
[Bug 1767139] Re: TypeError in _get_inventory_and_update_provider_generation


I know what the problem is:

(9:59:34 AM) mriedem: set_inventory_for_provider -> _ensure_resource_provider -> _create_resource_provider -> safe_connect returns None because it can't talk to placement yet
(9:59:41 AM) mriedem: https://review.openstack.org/#/c/524618/2/nova/scheduler/client/report.py@516
(9:59:44 AM) mriedem: so we put None in the cache
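
To make the chain concrete, here is a minimal sketch of how the None
gets cached (the structure follows nova/scheduler/client/report.py, but
the method bodies, the placement URL, and the ConnectionError handling
are simplified illustrations, not the exact upstream code):

    import requests

    def safe_connect(f):
        # Mimics nova's @safe_connect: swallow connection failures
        # and return None instead of raising.
        def wrapper(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except requests.ConnectionError:
                return None  # placement unreachable -> caller sees None
        return wrapper

    class SchedulerReportClient(object):
        def __init__(self):
            # rp_uuid -> provider dict; nothing stops None landing here
            self._resource_providers = {}

        @safe_connect
        def _create_resource_provider(self, uuid, name):
            # In nova this POSTs to placement; the URL is a placeholder.
            resp = requests.post('http://placement/resource_providers',
                                 json={'uuid': uuid, 'name': name})
            resp.raise_for_status()
            return {'uuid': uuid, 'generation': 0}

        def _ensure_resource_provider(self, uuid, name=None):
            if uuid not in self._resource_providers:
                # If placement is down during startup, the decorated
                # call returns None, and that None is cached under the
                # provider uuid for the life of the process.
                self._resource_providers[uuid] = \
                    self._create_resource_provider(uuid, name or uuid)
            return self._resource_providers[uuid]

Every later periodic task then reads that cached None back and
subscripts it, which is the TypeError in the traceback below.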

** Changed in: nova
       Status: New => Triaged

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova
     Assignee: (unassigned) => Matt Riedemann (mriedem)

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova
   Importance: High => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1767139

Title:
  TypeError in _get_inventory_and_update_provider_generation

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged

Bug description:
  Description
  ===========

  Bringing up a new cluster as part of our CI after switching from
  16.1.0 to 16.1.1 on CentOS, I'm seeing this error on some computes:

  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager Traceback (most recent call last):
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6752, in update_available_resource_for_node
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 704, in update_available_resource
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_available_resource(context, resources)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(*args, **kwargs)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 728, in _update_available_resource
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._init_compute_node(context, resources)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 585, in _init_compute_node
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update(context, cn)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 886, in _update
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 64, in set_inventory_for_provider
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     inv_data,
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return getattr(self.instance, __name)(*args, **kwargs)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 789, in set_inventory_for_provider
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     self._update_inventory(rp_uuid, inv_data)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 56, in wrapper
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     return f(self, *a, **k)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 675, in _update_inventory
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if self._update_inventory_attempt(rp_uuid, inv_data):
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 562, in _update_inventory_attempt
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     curr = self._get_inventory_and_update_provider_generation(rp_uuid)
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/report.py", line 546, in _get_inventory_and_update_provider_generation
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager     if server_gen != my_rp['generation']:
  2018-04-26 13:36:26.580 14536 ERROR nova.compute.manager TypeError: 'NoneType' object has no attribute '__getitem__'

  The error persists for the remainder of a nova-compute run once it
  appears.
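
  That persistence matches the cache poisoning described in the comment
  above: once None is stored under the compute node's provider uuid, it
  stays there for the life of the process, so every periodic
  update_available_resource run hits the same subscript. Below is a
  simplified, self-contained sketch of the failing read, with a
  hypothetical None guard added for illustration (not necessarily the
  actual upstream fix):

    class SchedulerReportClient(object):
        def __init__(self):
            # May hold a cached None, as sketched in the comment above.
            self._resource_providers = {}

        def _get_inventory(self, rp_uuid):
            # Stub standing in for the GET of the provider's inventory.
            return {'resource_provider_generation': 1, 'inventories': {}}

        def _get_inventory_and_update_provider_generation(self, rp_uuid):
            curr = self._get_inventory(rp_uuid)
            server_gen = curr['resource_provider_generation']
            my_rp = self._resource_providers.get(rp_uuid)
            if my_rp is None:
                # Hypothetical guard: evict the poisoned entry so a
                # later periodic task can retry
                # _ensure_resource_provider instead of subscripting
                # None here.
                self._resource_providers.pop(rp_uuid, None)
                return curr
            if server_gen != my_rp['generation']:  # TypeError when None
                my_rp['generation'] = server_gen
            return curr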

  Steps to reproduce
  ==================

  Nodes are started by our CI infrastructure: three computes and a
  single control node. In roughly 50% of runs, one of the computes
  comes up in this bad state.

  Expected result
  ===============

  Working cluster.

  Actual result
  =============

  At least one of the three computes fails to join the cluster: it is
  not picked up by discover_hosts, and the stack trace above repeats in
  the nova-compute logs.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

  $ rpm -qa | grep nova
  python-nova-16.1.1-1.el7.noarch
  openstack-nova-common-16.1.1-1.el7.noarch
  python2-novaclient-9.1.1-1.el7.noarch
  openstack-nova-api-16.1.1-1.el7.noarch
  openstack-nova-compute-16.1.1-1.el7.noarch

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

  $ rpm -qa | grep kvm
  libvirt-daemon-kvm-3.2.0-14.el7_4.9.x86_64
  qemu-kvm-common-ev-2.9.0-16.el7_4.14.1.x86_64
  qemu-kvm-ev-2.9.0-16.el7_4.14.1.x86_64

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

  Not sure

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

  Neutron with Calico (I work on Calico; this is our CI system)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1767139/+subscriptions

