[Bug 1791075] [NEW] update_available_resource periodic does not take into account all evacuation states
Public bug reported:
The current _update_usage_from_migrations code only takes the REBUILDING
task state into account and does not properly handle the rebuild spawning
and rebuild volume-attachment phases of an evacuation. This can cause
problems with NUMA topologies or PCI devices when several instances are
being evacuated: if some of them begin evacuation before an
update_available_resource periodic pass and others immediately after, the
latter ones can claim, for example, CPUs that are already pinned.
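The check that matters here is the resource tracker's notion of whether an
instance is mid-migration. Below is a minimal sketch of that kind of
task-state filter, assuming nova's task_states/vm_states constants and a
helper along the lines of _instance_in_resize_state in
nova/compute/resource_tracker.py (exact names and placement may differ
between releases); the last two task states in the list are the ones the
current code misses:

    from nova.compute import task_states
    from nova.compute import vm_states

    def _instance_in_resize_state(instance):
        # Sketch: treat an instance as migrating for accounting purposes if
        # it is mid-resize or mid-evacuation (rebuild). Today only REBUILDING
        # is recognised, so instances in the later rebuild phases fall
        # through and the periodic task can hand their pinned CPUs to a
        # concurrently evacuating instance.
        if instance.vm_state == vm_states.RESIZED:
            return True
        return (instance.vm_state in (vm_states.ACTIVE, vm_states.STOPPED)
                and instance.task_state in (
                    task_states.RESIZE_PREP,
                    task_states.RESIZE_MIGRATING,
                    task_states.RESIZE_MIGRATED,
                    task_states.RESIZE_FINISH,
                    task_states.REBUILDING,
                    # the states not handled today:
                    task_states.REBUILD_BLOCK_DEVICE_MAPPING,
                    task_states.REBUILD_SPAWNING))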
Here is an example traceback that appears in the nova-compute log after an
instance was evacuated:
2018-06-27T16:16:59.181573+02:00 compute-0-8.domain.tld nova-compute[19571]: 2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager [req-79bc5f9f-9d5e-4f55-ad56-8351930afcb3 - - - - -] Error updating resources for node compute-0-8.domain.tld.
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager Traceback (most recent call last):
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 6533, in update_available_resource_for_node
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager rt.update_available_resource(context, periodic=True)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 594, in update_available_resource
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager self._update_available_resource(context, resources, periodic=periodic)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager return f(*args, **kwargs)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 661, in _update_available_resource
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager self._update_usage_from_instances(context, instances)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1035, in _update_usage_from_instances
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager self._update_usage_from_instance(context, instance)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 1001, in _update_usage_from_instance
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager self._update_usage(instance, sign=sign)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 834, in _update_usage
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager self.compute_node, usage, free)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/hardware.py", line 1491, in get_host_numa_usage_from_instance
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager host_numa_topology, instance_numa_topology, free=free))
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/hardware.py", line 1356, in numa_usage_from_instances
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager newcell.pin_cpus(pinned_cpus)
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/objects/numa.py", line 85, in pin_cpus
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager pinned=list(self.pinned_cpus))
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager CPUPinningInvalid: Cannot pin/unpin cpus [10, 34] from the following pinned set [9, 10, 34, 33]
2018-06-27 16:16:59.163 19571 ERROR nova.compute.manager
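The CPUPinningInvalid at the end is raised by the overlap check in the NUMA
cell object's pin_cpus: once the same CPUs are claimed twice, the requested
set intersects the already-pinned set. A simplified, self-contained model of
that check (a hypothetical Cell class, with a plain ValueError standing in
for nova's CPUPinningInvalid; this is not nova's actual object):

    class Cell(object):
        """Toy model of a NUMA cell's CPU-pinning bookkeeping."""

        def __init__(self, cpuset):
            self.cpuset = set(cpuset)
            self.pinned_cpus = set()

        def pin_cpus(self, cpus):
            cpus = set(cpus)
            if cpus & self.pinned_cpus:
                # Mirrors the error in the traceback above: pinning CPUs
                # that are already in the pinned set is rejected.
                raise ValueError('Cannot pin/unpin cpus %s from the '
                                 'following pinned set %s'
                                 % (sorted(cpus),
                                    sorted(self.pinned_cpus)))
            self.pinned_cpus |= cpus

    cell = Cell(cpuset=range(0, 48))
    cell.pin_cpus([9, 33])   # first evacuating instance
    cell.pin_cpus([10, 34])  # second evacuating instance
    # A second attempt to pin the same CPUs (e.g. double accounting during
    # the periodic pass) trips the check:
    cell.pin_cpus([10, 34])  # raises: Cannot pin/unpin cpus [10, 34] ...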
** Affects: nova
Importance: Undecided
Assignee: Vladyslav Drok (vdrok)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Vladyslav Drok (vdrok)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1791075
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1791075/+subscriptions