yahoo-eng-team team mailing list archive

[Bug 1910466] Re: NUMA instance spawn fails on get_best_cpu_topology when there is no 'threads' preference

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/769614
Committed: https://opendev.org/openstack/nova/commit/387823b36d091abbaa37efb930fc98b94a5bbb93
Submitter: "Zuul (22348)"
Branch:    master

commit 387823b36d091abbaa37efb930fc98b94a5bbb93
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date:   Wed Jan 6 19:49:56 2021 +0000

    Fix max cpu topologies with numa affinity
    
    Nova has never supported specifying per numa node
    cpu topologies. Logically, the cpu topology of a guest
    is independent of its numa topology, and there is no
    way to model different cpu topologies per numa node
    or implement that in hardware.
    
    The code in nova that allowed the generation
    of these invalid configurations has now been removed, as it
    broke the automatic selection of cpu topologies based
    on the hw:cpu_max_[sockets|cores|threads] flavor and image properties.
    
    This change removes the incorrect code and the related unit
    tests, which asserted that nova could generate invalid topologies.
    
    Closes-Bug: #1910466
    Change-Id: Ia81a0fdbd950b51dbcc70c65ba492549a224ce2b


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1910466

Title:
  NUMA instance spawn fails on get_best_cpu_topology when there is no
  'threads' preference

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Seen downstream in a customer environment where a NUMA instance fails
  driver.spawn during get_best_cpu_topology when (1) there was no
  preference for cpu threads in the flavor and (2) the only possible
  topologies have > 1 cpu threads:

  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Traceback (most recent call last):
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2273, in _build_resources
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     yield resources
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2053, in _build_and_run_instance
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     block_device_info=block_device_info)
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3133, in spawn
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     mdevs=mdevs)
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5447, in _get_guest_xml
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     context, mdevs)
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5202, in _get_guest_config
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     instance.numa_topology)
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3923, in _get_guest_cpu_config
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     flavor, image_meta, numa_topology=instance_numa_topology)
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 625, in get_best_cpu_topology
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f]     allow_threads, numa_topology)[0]
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] IndexError: list index out of range

  Note that the IndexError above is a separate, unrelated bug: it is
  raised because an empty list [] is returned after filtering for NUMA
  threads and is then indexed. This bug is about why the list is empty
  after NUMA threads filtering.
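
  The IndexError itself can be reproduced in isolation. The sketch
  below is illustrative only and is not nova code: `pick_first` and
  `NoTopologyError` are hypothetical names, used to show how an
  explicit emptiness check would surface a clearer error than indexing
  [0] on an empty list.

```python
class NoTopologyError(Exception):
    """Hypothetical error raised when no candidate topology remains."""


def pick_first(candidates):
    """Return the first candidate, or fail with a descriptive error.

    Indexing an empty list with [0] raises a bare IndexError, which is
    what the traceback above shows. Checking first gives a clearer message.
    """
    if not candidates:
        raise NoTopologyError("no CPU topology satisfies the request")
    return candidates[0]
```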

  In this example failure we have a request for vcpus=8 with limits
  cores=2, sockets=2, and threads=8:

  2020-12-27 23:45:12.322 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Getting desirable topologies for flavor Flavor(created_at=2020-11-23T08:41:17Z,deleted=False,deleted_at=None,description=None,disabled=False,ephemeral_gb=0,extra_specs={hw:cpu_max_cores='2',hw:cpu_max_sockets='2',hw:cpu_max_threads='8',hw:cpu_policy='dedicated',hw:mem_page_size='large',hw:numa_cpus.0='0,1,2,3',hw:numa_cpus.1='4,5,6,7',hw:numa_mem.0='16384',hw:numa_mem.1='16384',hw:numa_mempolicy='strict',hw:numa_nodes='2'},flavorid='2060ed99-654c-4309-88b8-9bceeb794ba3',id=176,is_public=True,memory_mb=32768,name='test',projects=<?>,root_gb=40,rxtx_factor=1.0,swap=0,updated_at=None,vcpu_weight=0,vcpus=8) and image_meta ImageMeta(checksum='157e26aac48c1a02c08d07f4f7a6d1b6',container_format='bare',created_at=2020-03-08T16:12:44Z,direct_url=<?>,disk_format='qcow2',id=7812d228-07c8-4a8f-9878-459a0093cc34,min_disk=0,min_ram=0,name='UAG_generic-180104',owner='7a2328acc0a6451c8b23fb8184932506',properties=ImageMetaProps,protected=<?>,size=769468416,status='active',tags=<?>,updated_at=2020-03-23T06:11:54Z,virtual_size=<?>,visibility=<?>), allow threads: True _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:567
  2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:313
  2020-12-27 23:45:12.323 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:324
  2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Flavor pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:347
  2020-12-27 23:45:12.324 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Image pref -1:-1:-1 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:366
  2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Chosen -1:-1:-1 limits 2:2:8 _get_cpu_topology_constraints /usr/lib/python2.7/site-packages/nova/virt/hardware.py:395
  2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Topology preferred VirtCPUTopology(cores=-1,sockets=-1,threads=-1), maximum VirtCPUTopology(cores=2,sockets=2,threads=8) _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:571
  2020-12-27 23:45:12.325 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Build topologies for 8 vcpu(s) 2:2:8 _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:434
  2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Got 4 possible topologies _get_possible_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:461
  2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Possible topologies [VirtCPUTopology(cores=2,sockets=2,threads=2), VirtCPUTopology(cores=1,sockets=2,threads=4), VirtCPUTopology(cores=2,sockets=1,threads=4), VirtCPUTopology(cores=1,sockets=1,threads=8)] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:576
  2020-12-27 23:45:12.326 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Filtering topologies best for 1 threads _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:594
  2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Remaining possible topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:599
  2020-12-27 23:45:12.327 1 DEBUG nova.virt.hardware [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] Sorted desired topologies [] _get_desirable_cpu_topologies /usr/lib/python2.7/site-packages/nova/virt/hardware.py:602
  2020-12-27 23:45:12.327 1 ERROR nova.compute.manager [req-321f4757-b777-4b6a-93d9-26352fe343a3 8529101f4f0c482ea4b82f9a955785cf 7a2328acc0a6451c8b23fb8184932506 - default default] [instance: 890325b1-4d73-4a9c-86c0-c4e811690c3f] Instance failed to spawn: IndexError: list index out of range

  This shows that the flavor and image specified no preference for
  sockets, cores, or threads (they are all -1) and that 8 vcpus are
  required. The possible topologies that satisfy the request for 8
  vcpus and stay within the limits of 2 max cores, 2 max sockets, and 8
  max threads are: [VirtCPUTopology(cores=2,sockets=2,threads=2),
  VirtCPUTopology(cores=1,sockets=2,threads=4),
  VirtCPUTopology(cores=2,sockets=1,threads=4),
  VirtCPUTopology(cores=1,sockets=1,threads=8)]. When there is no
  preference for the number of threads, the code uses a value of 1
  for the desired number of threads. It then filters for the closest
  number of threads that does not exceed the desired number. Because
  only topologies with 2, 4, or 8 threads can satisfy the request for
  8 vcpus, and all of those values are greater than 1, every possible
  topology was filtered out, leaving an empty list, and the request
  could not be fulfilled.
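
  The arithmetic above can be checked with a small standalone sketch.
  It is an illustration of the behaviour described in this report, not
  a copy of the nova implementation: `Topology`,
  `possible_topologies`, and `filter_for_threads` are hypothetical
  names, and the filter rule (keep the largest thread count that does
  not exceed the desired one) is taken from the description above.

```python
from collections import namedtuple

# Hypothetical stand-in for nova's VirtCPUTopology object.
Topology = namedtuple("Topology", ["sockets", "cores", "threads"])


def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    """Enumerate every sockets * cores * threads split that equals vcpus."""
    found = []
    for s in range(1, max_sockets + 1):
        for c in range(1, max_cores + 1):
            for t in range(1, max_threads + 1):
                if s * c * t == vcpus:
                    found.append(Topology(s, c, t))
    return found


def filter_for_threads(possible, want_threads):
    """Keep topologies with the largest thread count not above want_threads."""
    eligible = [t.threads for t in possible if t.threads <= want_threads]
    if not eligible:
        # Nothing is at or below the desired thread count.
        return []
    most = max(eligible)
    return [t for t in possible if t.threads == most]


# The request from the logs: 8 vcpus, limits sockets=2, cores=2, threads=8.
candidates = possible_topologies(8, max_sockets=2, max_cores=2, max_threads=8)
# No thread preference is treated as a desired thread count of 1.
remaining = filter_for_threads(candidates, want_threads=1)
```

  With these inputs `candidates` holds four topologies, whose thread
  counts are 2, 4, 4, and 8, and `remaining` is empty, which matches
  the "Got 4 possible topologies" and "Remaining possible topologies
  []" log lines.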

  Because the request expressed no preference for the number of
  threads, one of the 4 possible cpu topologies should have been chosen
  instead of filtering all of the topologies out and returning an empty
  list. We will need to fix the logic around how requests without a
  threads preference are handled.
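
  One way to picture the expected behaviour is sketched below. This is
  a hypothetical illustration only and is not the merged fix, which
  according to the commit message above removed the per-NUMA-node
  topology code. `desirable_topologies` and `NO_PREFERENCE` are
  assumed names; the -1 sentinel mirrors the "pref -1:-1:-1" values in
  the logs.

```python
from collections import namedtuple

# Hypothetical stand-in for nova's VirtCPUTopology object.
Topology = namedtuple("Topology", ["sockets", "cores", "threads"])

# The logs show -1 for an unset preference; used here as an assumed sentinel.
NO_PREFERENCE = -1


def desirable_topologies(possible, preferred_threads=NO_PREFERENCE):
    """Filter by thread count only when a preference was actually given."""
    if preferred_threads == NO_PREFERENCE:
        # No preference: every possible topology stays a candidate.
        return list(possible)
    eligible = [t.threads for t in possible if t.threads <= preferred_threads]
    if not eligible:
        return []
    most = max(eligible)
    return [t for t in possible if t.threads == most]
```

  Under this sketch, the four topologies from the example all survive
  when no thread preference is given, so a caller taking the first
  element would no longer hit an empty list.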

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1910466/+subscriptions

