yahoo-eng-team team mailing list archive

[Bug 1839674] [NEW] ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails

 

Public bug reported:

I found this while writing a functional recreate test for bug 1839560.

As of this change in Ocata:

https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537

The ResourceTracker.compute_nodes dict will store the ComputeNode object
*before* trying to create it:

https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571

The problem is that if ComputeNode.create() fails for any reason, the
next run through update_available_resource won't try to create the
ComputeNode again, because the node is already in the compute_nodes
dict:

https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546
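
The cache-before-create pattern described above can be sketched in
isolation. This is a simplified illustration, not nova's code:
FakeComputeNode, BuggyTracker, init_compute_node and the fail_create
flag are hypothetical names made up for the example, and the error
message is copied from the traceback only to mimic the symptom.

```python
class FakeComputeNode:
    """Hypothetical stand-in for a ComputeNode object (not nova's class)."""

    def __init__(self, fail_create):
        self.fail_create = fail_create
        self.id = None  # only set once create() succeeds

    def create(self):
        if self.fail_create:
            raise RuntimeError("simulated create() failure")
        self.id = 1

    def save(self):
        # Saving a record that was never created has no id to update.
        if self.id is None:
            raise NotImplementedError("Cannot load 'id' in the base class")


class BuggyTracker:
    """Caches the node in the dict *before* create() is known to succeed."""

    def __init__(self):
        self.compute_nodes = {}
        self.create_attempts = 0

    def init_compute_node(self, nodename, fail_create):
        if nodename in self.compute_nodes:
            # Early return: the node is assumed to exist already.
            return
        cn = FakeComputeNode(fail_create)
        self.compute_nodes[nodename] = cn  # cached too early
        self.create_attempts += 1
        cn.create()

    def update_available_resource(self, nodename, fail_create=False):
        self.init_compute_node(nodename, fail_create)
        self.compute_nodes[nodename].save()


tracker = BuggyTracker()
errors = []
# First run: create() fails. Second run: create() would succeed,
# but it is never attempted because the node is already cached.
for fail_create in (True, False):
    try:
        tracker.update_available_resource("node2", fail_create)
    except Exception as exc:
        errors.append(type(exc).__name__)

print(errors, tracker.create_attempts)
```

In this sketch the second run never retries create(), so
create_attempts stays at 1 and save() fails on the uncreated object.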

Later runs then fail like this, because save() is called on a
ComputeNode that was never created and so has no id:

    b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.'
    b'Traceback (most recent call last):'
    b'  File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node'
    b'    startup=startup)'
    b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource'
    b'    self._update_available_resource(context, resources, startup=startup)'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner'
    b'    return f(*args, **kwargs)'
    b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 796, in _update_available_resource'
    b'    self._update(context, cn, startup=startup)'
    b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1052, in _update'
    b'    self.old_resources[nodename] = old_compute'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__'
    b'    self.force_reraise()'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise'
    b'    six.reraise(self.type_, self.value, self.tb)'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py", line 693, in reraise'
    b'    raise value'
    b'  File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1046, in _update'
    b'    compute_node.save()'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper'
    b'    return fn(self, *args, **kwargs)'
    b'  File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, in save'
    b'    db_compute = db.compute_node_update(self._context, self.id, updates)'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter'
    b'    self.obj_load_attr(name)'
    b'  File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr'
    b'    _("Cannot load \'%s\' in the base class") % attrname)'
    b"NotImplementedError: Cannot load 'id' in the base class"

We should only add the ComputeNode to the compute_nodes dict after
create() has succeeded.
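
One way to read that proposal is sketched below: cache the object only
after create() returns. This is an assumption about the shape of a fix,
not the actual nova patch; FakeComputeNode, FixedTracker,
init_compute_node and fail_create are hypothetical names used only for
illustration.

```python
class FakeComputeNode:
    """Hypothetical stand-in for a ComputeNode object (not nova's class)."""

    def __init__(self, fail_create):
        self.fail_create = fail_create
        self.id = None  # only set once create() succeeds

    def create(self):
        if self.fail_create:
            raise RuntimeError("simulated create() failure")
        self.id = 1

    def save(self):
        if self.id is None:
            raise NotImplementedError("Cannot load 'id' in the base class")


class FixedTracker:
    """Caches the node only after create() has succeeded."""

    def __init__(self):
        self.compute_nodes = {}
        self.create_attempts = 0

    def init_compute_node(self, nodename, fail_create):
        if nodename in self.compute_nodes:
            return
        cn = FakeComputeNode(fail_create)
        self.create_attempts += 1
        cn.create()  # may raise; nothing has been cached yet
        self.compute_nodes[nodename] = cn  # reached only on success

    def update_available_resource(self, nodename, fail_create=False):
        self.init_compute_node(nodename, fail_create)
        self.compute_nodes[nodename].save()


tracker = FixedTracker()
errors = []
# First run: create() fails and nothing is cached.
# Second run: create() is retried, succeeds, and save() works.
for fail_create in (True, False):
    try:
        tracker.update_available_resource("node2", fail_create)
    except Exception as exc:
        errors.append(type(exc).__name__)

print(errors, tracker.create_attempts)
```

With this ordering a failed create() leaves the dict untouched, so the
next run retries the create instead of trying to save an object that
has no id.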

** Affects: nova
     Importance: Medium
     Assignee: Matt Riedemann (mriedem)
         Status: Triaged


** Tags: resource-tracker

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839674
