
yahoo-eng-team team mailing list archive

[Bug 1776244] Re: KeyError during instance boot if vcpu_pin_set contains not all of the core siblings


*** This bug is a duplicate of bug 1744965 ***
    https://bugs.launchpad.net/bugs/1744965

It seems that this series of patches fixed the issue on master:
https://review.openstack.org/537364

** This bug has been marked a duplicate of bug 1744965
   'emulator_threads_policy' doesn't work with 'vcpu_pin_set'

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1776244

Title:
  KeyError during instance boot if vcpu_pin_set contains not all of the
  core siblings

Status in OpenStack Compute (nova):
  New

Bug description:
  I reproduced this on mitaka, but it seems that master has the same issue.

  The following flavor was used:

  $ openstack flavor show medium-dedicated
  +----------------------------+--------------------------------------+
  | Field                      | Value                                |
  +----------------------------+--------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                    |
  | disk                       | 5                                    |
  | id                         | 745d4bbb-78b8-4b86-83bf-f009745cd9b8 |
  | name                       | medium-dedicated                     |
  | os-flavor-access:is_public | True                                 |
  | properties                 | hw:cpu_policy='dedicated'            |
  | ram                        | 512                                  |
  | rxtx_factor                | 1.0                                  |
  | swap                       |                                      |
  | vcpus                      | 4                                    |
  +----------------------------+--------------------------------------+

  Instance image does not have any custom properties.

  The following traceback can be seen in the nova-compute log during boot
  of an instance with this flavor:

  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [req-786c093f-c0cf-4146-b55e-6ba2527af8de b7d47d36ea5144df9635ec1c834efde7 336db1eb014b4a2399c70cfe29360493 - - -] [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Instance failed to spawn
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Traceback (most recent call last):
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2221, in _build_resources
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     yield resources
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2067, in _build_and_run_instance
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     block_device_info=block_device_info)
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2811, in spawn
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     write_to_disk=True)
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4829, in _get_guest_xml
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     context)
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4635, in _get_guest_config
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     instance.numa_topology, flavor, pci_devs, allowed_cpus, image_meta)
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4121, in _get_guest_numa_config
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     pcpu = object_numa_cell.cpu_pinning[cpu]
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] KeyError: 2
  2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]

  Here is the topology configuration (virsh capabilities) of the host
  that causes trouble (this topology was set up specifically to reproduce
  the issue):

      <topology>
        <cells num='2'>
          <cell id='0'>
            <memory unit='KiB'>10239384</memory>
            <pages unit='KiB' size='4'>2559846</pages>
            <pages unit='KiB' size='2048'>0</pages>
            <pages unit='KiB' size='1048576'>0</pages>
            <distances>
              <sibling id='0' value='10'/>
              <sibling id='1' value='20'/>
            </distances>
            <cpus num='6'>
              <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
              <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
              <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
              <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
              <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
              <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
            </cpus>
          </cell>
          <cell id='1'>
            <memory unit='KiB'>10321056</memory>
            <pages unit='KiB' size='4'>2580264</pages>
            <pages unit='KiB' size='2048'>0</pages>
            <pages unit='KiB' size='1048576'>0</pages>
            <distances>
              <sibling id='0' value='20'/>
              <sibling id='1' value='10'/>
            </distances>
            <cpus num='2'>
              <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
              <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
            </cpus>
          </cell>
        </cells>
      </topology>

  vcpu_pin_set = 1,3,4,5,6,7 in nova.conf

  In the nova database, host topology looks the following way (including
  only relevant fields):

  cell0 -- "cpuset": [1, 3, 4, 5], "pinned_cpus": [], "siblings": [[4, 5]]
  cell1 -- "cpuset": [6, 7], "pinned_cpus": [], "siblings": [[6, 7]]
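
  A minimal sketch (not nova's actual code) of how those per-cell values
  can be derived from vcpu_pin_set and the host topology above: each
  cell's cpuset is the intersection of its CPUs with vcpu_pin_set, and a
  sibling set survives only if more than one of its threads remains after
  the intersection. The dict layout here is purely illustrative.

  ```python
  # Host topology from `virsh capabilities` above: cell id -> thread-sibling sets
  host_cells = {
      0: [{0, 1}, {2, 3}, {4, 5}],
      1: [{6, 7}],
  }

  vcpu_pin_set = {1, 3, 4, 5, 6, 7}  # from nova.conf

  result = {}
  for cell_id, sibling_sets in host_cells.items():
      # cpuset: all CPUs of the cell that are allowed by vcpu_pin_set
      cpuset = sorted(set().union(*sibling_sets) & vcpu_pin_set)
      # siblings: intersect each pair with the pin set; drop singletons,
      # which is why cell0 keeps only [[4, 5]] (CPUs 1 and 3 lost their
      # siblings 0 and 2 to the pin set)
      siblings = [sorted(s & vcpu_pin_set) for s in sibling_sets
                  if len(s & vcpu_pin_set) > 1]
      result[cell_id] = {"cpuset": cpuset, "siblings": siblings}

  print(result)
  ```

  This reproduces the database contents shown above: cell0 has four
  usable CPUs but only one usable sibling pair.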

  It is caused by the fact that, when fitting the instance to a host
  cell, we consider avail_cpus but not free_siblings. So when asking for
  4 vcpus we land on cell0, as 4 CPUs are available there. However, the
  compute adds vcpu-pcpu mappings only for the two available siblings,
  and a KeyError is raised when the third vcpu's mapping is accessed.
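
  The mismatch can be sketched in a few lines (hypothetical names and a
  deliberately simplified pinning step, not nova's API): fitting only
  checks the CPU count, while pinning only maps vcpus onto the free
  sibling pair, so the guest-config loop runs off the end of the mapping.

  ```python
  # cell0 as stored in the database (see above)
  cell0 = {"cpuset": [1, 3, 4, 5], "siblings": [[4, 5]]}
  requested_vcpus = 4

  # Fitting passes: cell0 has 4 available CPUs.
  assert len(cell0["cpuset"]) >= requested_vcpus

  # Pinning (simplified) only produces mappings for the sibling pair,
  # i.e. for 2 of the 4 requested vcpus.
  cpu_pinning = {vcpu: pcpu for vcpu, pcpu in enumerate(cell0["siblings"][0])}

  # The guest-config loop then asks for a pinning for every vcpu,
  # mirroring `pcpu = object_numa_cell.cpu_pinning[cpu]` in the traceback.
  caught = None
  try:
      for vcpu in range(requested_vcpus):
          pcpu = cpu_pinning[vcpu]
  except KeyError as exc:
      caught = exc.args[0]

  print("KeyError:", caught)  # fails at vcpu 2, matching the traceback
  ```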

  Also, we might need to add more info to the docs about siblings and
  about what to include in vcpu_pin_set, so that people don't
  misconfigure things.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1776244/+subscriptions
