yahoo-eng-team team mailing list archive

[Bug 1839674] Re: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails

 

Reviewed:  https://review.opendev.org/675704
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f578146f372386e1889561cba33e95495e66ce97
Submitter: Zuul
Branch:    master

commit f578146f372386e1889561cba33e95495e66ce97
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri Aug 9 17:17:45 2019 -0400

    rt: only map compute node if we created it
    
    If ComputeNode.create() fails, the update_available_resource
    periodic will not try to create it again because it will be
    mapped in the compute_nodes dict and _init_compute_node will
    return early but trying to save changes to that ComputeNode
    object later will fail because there is no id on the object,
    since we failed to create it in the DB.
    
    This simply reverses the logic such that we only map the
    compute node if we successfully created it.
    
    Some existing _init_compute_node testing had to be changed
    since it relied on the order of when the ComputeNode object
    is created and put into the compute_nodes dict in order
    to pass the object along to some much lower-level PCI
    tracker code, which was arguably mocking too deep for a unit
    test. That is changed to avoid the low-level mocking and
    assertions and just assert that _setup_pci_tracker is called
    as expected.
    
    Change-Id: I9fa1d509a3de405d6246fb8670612c65c10cc93b
    Closes-Bug: #1839674
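
The testing change described in the commit message can be illustrated with a small sketch. This is not the nova unit test itself: SketchTracker, _create_compute_node and check_pci_tracker_called are hypothetical names, and the argument list of _setup_pci_tracker is assumed for illustration. It only shows the idea of mocking _setup_pci_tracker directly and asserting that it is called, rather than mocking the lower-level PCI tracker code.

```python
from unittest import mock


class SketchTracker:
    """Hypothetical, heavily simplified stand-in for ResourceTracker."""

    def __init__(self):
        self.compute_nodes = {}

    def _create_compute_node(self, nodename):
        # Stand-in for building a ComputeNode and calling create() on it.
        return {"id": 1, "hypervisor_hostname": nodename}

    def _setup_pci_tracker(self, context, compute_node, resources):
        # Assumed signature; the real method sets up PCI device tracking.
        raise NotImplementedError

    def _init_compute_node(self, context, resources):
        nodename = resources["hypervisor_hostname"]
        if nodename in self.compute_nodes:
            return False
        cn = self._create_compute_node(nodename)
        # Map the node only after it was created successfully.
        self.compute_nodes[nodename] = cn
        self._setup_pci_tracker(context, cn, resources)
        return True


def check_pci_tracker_called():
    """Assert on the _setup_pci_tracker call instead of PCI internals."""
    tracker = SketchTracker()
    resources = {"hypervisor_hostname": "node2"}
    with mock.patch.object(SketchTracker, "_setup_pci_tracker") as setup:
        created = tracker._init_compute_node(mock.sentinel.ctx, resources)
    setup.assert_called_once_with(
        mock.sentinel.ctx, tracker.compute_nodes["node2"], resources)
    return created
```

Because the assertion is made at the _setup_pci_tracker boundary, the sketch does not depend on when the node is placed into compute_nodes relative to the lower-level PCI code.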


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839674

Title:
  ResourceTracker.compute_nodes won't try to create a ComputeNode a
  second time if the first create() fails

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress

Bug description:
  I found this while writing a functional recreate test for bug 1839560.

  As of this change in Ocata:

  https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537

  The ResourceTracker.compute_nodes dict will store the ComputeNode
  object *before* trying to create it:

  https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571

  The problem is if ComputeNode.create() fails for whatever reason, the
  next run through update_available_resource won't try to create the
  ComputeNode again because of this:

  https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546

  And eventually you get errors like this:

      b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.'
      b'Traceback (most recent call last):'
      b'  File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node'
      b'    startup=startup)'
      b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource'
      b'    self._update_available_resource(context, resources, startup=startup)'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner'
      b'    return f(*args, **kwargs)'
      b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 796, in _update_available_resource'
      b'    self._update(context, cn, startup=startup)'
      b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1052, in _update'
      b'    self.old_resources[nodename] = old_compute'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__'
      b'    self.force_reraise()'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise'
      b'    six.reraise(self.type_, self.value, self.tb)'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py", line 693, in reraise'
      b'    raise value'
      b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1046, in _update'
      b'    compute_node.save()'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper'
      b'    return fn(self, *args, **kwargs)'
      b'  File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, in save'
      b'    db_compute = db.compute_node_update(self._context, self.id, updates)'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter'
      b'    self.obj_load_attr(name)'
      b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr'
      b'    _("Cannot load \'%s\' in the base class") % attrname)'
      b"NotImplementedError: Cannot load 'id' in the base class"

  We should only map the ComputeNode when we've successfully created it.
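
The ordering problem described above can be reproduced with a minimal sketch. This is not the nova code: FlakyDB, FakeComputeNode, BuggyTracker, FixedTracker and run_periodic are hypothetical names, and the single simulated DB failure is an assumption made to keep the example short.

```python
class FlakyDB:
    """Hypothetical DB stand-in that fails the first N inserts."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = {}

    def insert(self, nodename):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("simulated DB failure")
        self.rows[nodename] = len(self.rows) + 1
        return self.rows[nodename]


class FakeComputeNode:
    """Hypothetical stand-in for the ComputeNode object."""

    def __init__(self, nodename, db):
        self.nodename = nodename
        self.id = None  # only set once create() succeeds
        self._db = db

    def create(self):
        self.id = self._db.insert(self.nodename)


class BuggyTracker:
    """Maps the node before create(), like the behavior in the report."""

    def __init__(self, db):
        self.db = db
        self.compute_nodes = {}

    def init_compute_node(self, nodename):
        if nodename in self.compute_nodes:
            return False  # early return: assumes the node exists in the DB
        cn = FakeComputeNode(nodename, self.db)
        self.compute_nodes[nodename] = cn  # mapped even if create() fails
        cn.create()
        return True


class FixedTracker(BuggyTracker):
    """Maps the node only after create() succeeded."""

    def init_compute_node(self, nodename):
        if nodename in self.compute_nodes:
            return False
        cn = FakeComputeNode(nodename, self.db)
        cn.create()  # if this raises, nothing is mapped
        self.compute_nodes[nodename] = cn
        return True


def run_periodic(tracker, nodename):
    """One simulated periodic run; returns the mapped node's id, if any."""
    try:
        tracker.init_compute_node(nodename)
    except RuntimeError:
        pass  # simulate the periodic task recording the error and moving on
    cn = tracker.compute_nodes.get(nodename)
    return cn.id if cn is not None else None
```

With one simulated failure, BuggyTracker keeps returning a node without an id on every later run, which corresponds to the point where save() fails in the traceback above. FixedTracker leaves the dict empty after the failure and creates the node on the next run.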

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839674/+subscriptions

