yahoo-eng-team team mailing list archive
Message #86430
[Bug 1910466] Re: NUMA instance spawn fails on get_best_cpu_topology when there is no 'threads' preference
Reviewed: https://review.opendev.org/c/openstack/nova/+/769614
Committed: https://opendev.org/openstack/nova/commit/387823b36d091abbaa37efb930fc98b94a5bbb93
Submitter: "Zuul (22348)"
Branch: master
commit 387823b36d091abbaa37efb930fc98b94a5bbb93
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date: Wed Jan 6 19:49:56 2021 +0000
Fix max cpu topologies with numa affinity
Nova has never supported specifying per numa node
cpu topologies. Logically the cpu topology of a guest
is independent of its numa topology and there is no
way to model different cpu topologies per numa node
or implement that in hardware.
The code in nova that allowed the generation
of these invalid configurations has now been removed as it
broke the automatic selection of cpu topologies based
on hw:max_[cpus|sockets|threads] flavor and image properties.
This change removed the incorrect code and related unit
tests that asserted nova could generate invalid topologies.
Closes-Bug: #1910466
Change-Id: Ia81a0fdbd950b51dbcc70c65ba492549a224ce2b
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1910466
Title:
NUMA instance spawn fails on get_best_cpu_topology when there is no
'threads' preference
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Seen downstream in a customer environment where a NUMA instance fails
driver.spawn during get_best_cpu_topology when (1) there was no
preference for cpu threads in the flavor and (2) the only possible
topologies have > 1 cpu threads:
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Traceback (most recent call last):
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2273, in _build_resources
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] yield resources
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2053, in _build_and_run_instance
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] block_device_info=block_device_info)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3133, in spawn
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] mdevs=mdevs)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5447, in _get_guest_xml
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] context, mdevs)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5202, in _get_guest_config
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] instance.numa_topology)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3923, in _get_guest_cpu_config
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] flavor, image_meta, numa_topology=instance_numa_topology)
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 625, in get_best_cpu_topology
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] allow_threads, numa_topology)[0]
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] IndexError: list index out of range
Note that the IndexError above is a separate bug: it is raised because
an empty list [] is returned after filtering for NUMA threads and is
then indexed. This bug is about why the list is empty after NUMA
threads filtering.
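The failure mode described in the note can be reproduced in isolation. The following is a minimal sketch, not nova's code: `pick_best` and `NoTopologyError` are hypothetical names, and the candidate topologies are plain tuples assumed for illustration.

```python
class NoTopologyError(Exception):
    """Hypothetical error raised when no candidate topology remains."""


def pick_best(candidates):
    """Return the first candidate, or fail with a clear message if none remain."""
    if not candidates:
        # Indexing [0] on an empty list is what raises the IndexError
        # seen in the traceback; checking first gives a clearer error.
        raise NoTopologyError("no CPU topology satisfies the request")
    return candidates[0]


filtered = []  # what the NUMA threads filtering returned in this report

try:
    filtered[0]
except IndexError as exc:
    print("unguarded:", exc)

try:
    pick_best(filtered)
except NoTopologyError as exc:
    print("guarded:", exc)
```

A guard like this would only change the error that is reported; it would not address the empty list itself.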
In this example failure we have a request for vcpus=8 with limits
cores=2, sockets=2, and threads=8:
2020-12-27 23:45:12.322 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Getting desirable topologies for flavor Flavor(created_at=2020-11-23T08:41:17Z,deleted=False,deleted_at=None,description=None,disabled=False,ephemeral_gb=0,extra_specs={hw:cpu_max_cores='2',hw:cpu_max_sockets='2',hw:cpu_max_threads='8',hw:cpu_policy='dedicated',hw:mem_page_size='large',hw:numa_cpus.0='0,1,2,3',hw:numa_cpus.1='4,5,6,7',hw:numa_mem.0='16384',hw:numa_mem.1='16384',hw:numa_mempolicy='strict',hw:numa_nodes='2'},flavorid='2060ed99-654c-4309-88b8-9bceeb794ba3',id=176,is_public=True,memory_mb=32768,name='test',projects=<?>,root_gb=40,rxtx_factor=1.0,swap=0,updated_at=None,vcpu_weight=0,vcpus=8) and image_meta ImageMeta(checksum='157e26aac48c1a02c08d07f4f7a6d1b6',container_format='bare',created_at=2020-03-08T16:12:44Z,direct_url=<?>,disk_format='qcow2',id=7812d228-07c8-4a8f-9878-459a0093cc34,min_disk=0,min_ram=0,name='UAG_generic-180104',owner='7a2328acc0a6451c8b23fb8184932506',properties=ImageMetaProps,protected=<?>,size=769468416,status='active',tags=<?>,updated_at=2020-03-23T06:11:54Z,virtual_size=<?>,visibility=<?>), allow threads: True _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:567
2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:313
2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:324
2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:347
2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:366
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Chosen -1:-1:-1 limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:395
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Topology preferred VirtCPUTopology(cores=-1,sockets=-1,threads=-1), maximum VirtCPUTopology(cores=2,sockets=2,threads=8) _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:571
2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Build topologies for 8 vcpu(s) 2:2:8 _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:434
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Got 4 possible topologies _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:461
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Possible topologies [VirtCPUTopology(cores=2,sockets=2,threads=2), VirtCPUTopology(cores=1,sockets=2,threads=4), VirtCPUTopology(cores=2,sockets=1,threads=4), VirtCPUTopology(cores=1,sockets=1,threads=8)] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:576
2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Filtering topologies best for 1 threads _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:594
2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Remaining possible topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:599
2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Sorted desired topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:602
2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range
This shows that the flavor and image specified no preference for
sockets, cores, or threads (they are all -1) and that 8 vcpus are
required. The possible topologies that would satisfy the request for 8
vcpus and stay within the limits of 2 max cores, 2 max sockets, and 8
max threads are: [VirtCPUTopology(cores=2,sockets=2,threads=2),
VirtCPUTopology(cores=1,sockets=2,threads=4),
VirtCPUTopology(cores=2,sockets=1,threads=4),
VirtCPUTopology(cores=1,sockets=1,threads=8)]. When there is no
preference for the number of threads, the code uses a value of 1
for the desired number of threads. It then filters for the closest
number of threads that does not exceed the desired number of threads.
Because only 2, 4, or 8 threads could satisfy the request for 8 vcpus,
and all of these are greater than 1, every possible topology was
filtered out, leaving an empty list, and the request could not be
fulfilled.
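The arithmetic in this explanation can be checked with a short sketch. It is written from the description above rather than copied from nova: `Topology`, `possible_topologies`, and `filter_for_threads` are hypothetical names, and the filter is an assumed reading of "the closest number of threads that does not exceed the desired number".

```python
from collections import namedtuple

# Hypothetical stand-in for nova's VirtCPUTopology object.
Topology = namedtuple("Topology", ["sockets", "cores", "threads"])


def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """Enumerate every sockets*cores*threads split equal to vcpus within the limits."""
    return [
        Topology(s, c, t)
        for s in range(1, max_sockets + 1)
        for c in range(1, max_cores + 1)
        for t in range(1, max_threads + 1)
        if s * c * t == vcpus
    ]


def filter_for_threads(topologies, wanted_threads):
    """Keep topologies with the largest thread count not exceeding wanted_threads."""
    eligible = [t for t in topologies if t.threads <= wanted_threads]
    if not eligible:
        return []
    best = max(t.threads for t in eligible)
    return [t for t in eligible if t.threads == best]


possible = possible_topologies(vcpus=8, max_sockets=2, max_cores=2, max_threads=8)
print(len(possible))                           # 4
print(sorted({t.threads for t in possible}))   # [2, 4, 8]
print(filter_for_threads(possible, 1))         # []
```

With limits of 2 sockets and 2 cores, at most 4 vcpus fit without threading, so every 8-vcpu topology needs at least 2 threads and a desired thread count of 1 can never match.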
Because the request expressed no preference for number of threads, one
of the 4 possible cpu topologies should have been chosen instead of
filtering all of the topologies out and returning an empty list. We
will need to fix the logic around how requests without threads
preference are handled.
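One reading of the expected behavior is sketched below: apply the threads filter only when a preference was actually expressed. This is purely illustrative. `choose_topologies` and `NO_PREFERENCE` are hypothetical names, topologies are assumed to be (sockets, cores, threads) tuples, and the change that was merged for this bug took a different route by removing the NUMA-specific code rather than adding a fallback.

```python
NO_PREFERENCE = -1  # mirrors the -1 values in the "Flavor pref -1:-1:-1" log line


def choose_topologies(possible, preferred_threads):
    """Filter by thread count only when a preference was expressed.

    `possible` is a list of (sockets, cores, threads) tuples.
    """
    if preferred_threads == NO_PREFERENCE:
        # No preference: every possible topology remains a candidate.
        return list(possible)
    return [t for t in possible if t[2] <= preferred_threads]


possible = [(2, 2, 2), (2, 1, 4), (1, 2, 4), (1, 1, 8)]
print(choose_topologies(possible, NO_PREFERENCE))  # all four remain
print(choose_topologies(possible, 2))              # [(2, 2, 2)]
```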
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1910466/+subscriptions