yahoo-eng-team team mailing list archive
Message #79615
[Bug 1839674] Re: ResourceTracker.compute_nodes won't try to create a ComputeNode a second time if the first create() fails
Reviewed: https://review.opendev.org/675704
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f578146f372386e1889561cba33e95495e66ce97
Submitter: Zuul
Branch: master
commit f578146f372386e1889561cba33e95495e66ce97
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Fri Aug 9 17:17:45 2019 -0400
rt: only map compute node if we created it
If ComputeNode.create() fails, the update_available_resource
periodic will not try to create it again, because the node is
already mapped in the compute_nodes dict and _init_compute_node
returns early. Trying to save changes to that ComputeNode
object later then fails because there is no id on the object,
since we failed to create it in the DB.
This simply reverses the logic such that we only map the
compute node if we successfully created it.
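The reordering described above can be sketched roughly as follows. This is a minimal illustration and not the actual nova code: FlakyDB, FakeComputeNode, FixedTracker and CreateFailed are hypothetical stand-ins, and the real _init_compute_node takes more arguments and does more work.

```python
class CreateFailed(Exception):
    """Stand-in for whatever error the DB layer raises on a failed create."""


class FlakyDB:
    """Hypothetical DB whose first `failures` inserts fail."""

    def __init__(self, failures):
        self.failures = failures
        self.next_id = 1

    def insert(self):
        if self.failures > 0:
            self.failures -= 1
            raise CreateFailed("simulated DB failure")
        new_id = self.next_id
        self.next_id += 1
        return new_id


class FakeComputeNode:
    """Hypothetical stand-in for the ComputeNode object."""

    def __init__(self, db):
        self._db = db
        self.id = None

    def create(self):
        self.id = self._db.insert()


class FixedTracker:
    """Heavily reduced tracker that maps a node only after create() succeeds."""

    def __init__(self, db):
        self._db = db
        self.compute_nodes = {}

    def init_compute_node(self, nodename):
        if nodename in self.compute_nodes:
            return False  # already tracked, nothing to create
        cn = FakeComputeNode(self._db)
        cn.create()  # may raise; nothing has been mapped yet
        self.compute_nodes[nodename] = cn  # map only after a successful create
        return True


def run_two_periodics():
    """Simulate one failing periodic run followed by a healthy one."""
    tracker = FixedTracker(FlakyDB(failures=1))
    try:
        tracker.init_compute_node("node2")
    except CreateFailed:
        pass
    mapped_after_failure = "node2" in tracker.compute_nodes
    created_on_retry = tracker.init_compute_node("node2")
    return mapped_after_failure, created_on_retry, tracker.compute_nodes["node2"].id
```

Because the dict entry is written only after create() returns, a failed first attempt leaves no stale entry behind and the next periodic run retries the create.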
Some existing _init_compute_node testing had to be changed
since it relied on the order of when the ComputeNode object
is created and put into the compute_nodes dict in order
to pass the object along to some much lower-level PCI
tracker code, which was arguably mocking too deep for a unit
test. That is changed to avoid the low-level mocking and
assertions and just assert that _setup_pci_tracker is called
as expected.
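The testing approach described above (assert that _setup_pci_tracker is called rather than mocking the lower-level PCI code) might look something like the sketch below. FakeTracker is a hypothetical reduction of the resource tracker, and the argument list shown for _setup_pci_tracker is an assumption made for illustration; only unittest.mock from the standard library is used.

```python
from unittest import mock


class FakeTracker:
    """Hypothetical, heavily reduced stand-in for the resource tracker."""

    def __init__(self):
        self.compute_nodes = {}

    def _setup_pci_tracker(self, context, compute_node, resources):
        # In a unit test this is patched out, so the low-level PCI
        # tracker code never runs.
        raise AssertionError("expected to be mocked in the unit test")

    def _init_compute_node(self, context, nodename, resources):
        if nodename in self.compute_nodes:
            return False
        cn = {"hypervisor_hostname": nodename}  # placeholder for ComputeNode
        self.compute_nodes[nodename] = cn
        self._setup_pci_tracker(context, cn, resources)
        return True


def check_setup_pci_tracker_called():
    tracker = FakeTracker()
    resources = {"vcpus": 4}
    with mock.patch.object(tracker, "_setup_pci_tracker") as setup_pci:
        tracker._init_compute_node("ctxt", "node2", resources)
    # Assert on the call itself instead of on anything the PCI code would do.
    setup_pci.assert_called_once_with(
        "ctxt", tracker.compute_nodes["node2"], resources)
    return setup_pci.call_count
```

Asserting at this level keeps the test independent of when exactly the object is placed into the compute_nodes dict relative to the lower-level calls.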
Change-Id: I9fa1d509a3de405d6246fb8670612c65c10cc93b
Closes-Bug: #1839674
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839674
Title:
ResourceTracker.compute_nodes won't try to create a ComputeNode a
second time if the first create() fails
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
Triaged
Status in OpenStack Compute (nova) pike series:
Triaged
Status in OpenStack Compute (nova) queens series:
Triaged
Status in OpenStack Compute (nova) rocky series:
In Progress
Status in OpenStack Compute (nova) stein series:
In Progress
Bug description:
I found this while writing a functional recreate test for bug 1839560.
As of this change in Ocata:
https://github.com/openstack/nova/commit/1c967593fbb0ab8b9dc8b0b509e388591d32f537
The ResourceTracker.compute_nodes dict will store the ComputeNode
object *before* trying to create it:
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L570-L571
The problem is that if ComputeNode.create() fails for any reason, the
next run through update_available_resource won't try to create the
ComputeNode again because of this check:
https://github.com/openstack/nova/blob/6b7d0caad86fe32ffc49a8672de1eb7258f3b919/nova/compute/resource_tracker.py#L546
And eventually you get errors like this:
b'2019-08-09 17:02:59,356 ERROR [nova.compute.manager] Error updating resources for node node2.'
b'Traceback (most recent call last):'
b' File "/home/osboxes/git/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node'
b' startup=startup)'
b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource'
b' self._update_available_resource(context, resources, startup=startup)'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 328, in inner'
b' return f(*args, **kwargs)'
b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 796, in _update_available_resource'
b' self._update(context, cn, startup=startup)'
b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1052, in _update'
b' self.old_resources[nodename] = old_compute'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__'
b' self.force_reraise()'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise'
b' six.reraise(self.type_, self.value, self.tb)'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/six.py", line 693, in reraise'
b' raise value'
b' File "/home/osboxes/git/nova/nova/compute/resource_tracker.py", line 1046, in _update'
b' compute_node.save()'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper'
b' return fn(self, *args, **kwargs)'
b' File "/home/osboxes/git/nova/nova/objects/compute_node.py", line 352, in save'
b' db_compute = db.compute_node_update(self._context, self.id, updates)'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 67, in getter'
b' self.obj_load_attr(name)'
b' File "/home/osboxes/git/nova/.tox/functional-py36/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr'
b' _("Cannot load \'%s\' in the base class") % attrname)'
b"NotImplementedError: Cannot load 'id' in the base class"
We should only map the ComputeNode when we've successfully created it.
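As a rough illustration of the failure mode described above, the following sketch maps the node before create() is called and then shows what happens on the next periodic run. It is not the nova code: FakeComputeNode, buggy_init_compute_node and the db_ok flag are hypothetical, and the NotImplementedError here only imitates the message seen in the traceback.

```python
class FakeComputeNode:
    """Hypothetical stand-in for the ComputeNode object."""

    def __init__(self):
        self._id = None

    @property
    def id(self):
        # Imitates the error seen in the traceback when 'id' was never set.
        if self._id is None:
            raise NotImplementedError("Cannot load 'id' in the base class")
        return self._id

    def create(self, db_ok):
        if not db_ok:
            raise RuntimeError("simulated DB failure during create()")
        self._id = 1

    def save(self):
        # Saving needs the id, just as the DB update in the traceback does.
        return {"updated_id": self.id}


compute_nodes = {}


def buggy_init_compute_node(nodename, db_ok):
    if nodename in compute_nodes:
        return False  # early return: create() is never retried
    cn = FakeComputeNode()
    compute_nodes[nodename] = cn  # mapped *before* create(), as in the bug
    cn.create(db_ok)
    return True


# First periodic run: create() fails, but the node is already mapped.
try:
    buggy_init_compute_node("node2", db_ok=False)
except RuntimeError:
    pass

# Second periodic run: the DB is healthy again, yet nothing is created.
created = buggy_init_compute_node("node2", db_ok=True)

try:
    compute_nodes["node2"].save()
    save_error = None
except NotImplementedError as exc:
    save_error = str(exc)
```

In this sketch the second run returns early because the name is already in the dict, so the object never gets an id and the later save() fails.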