yahoo-eng-team team mailing list archive

[Bug 1776244] [NEW] KeyError during instance boot if vcpu_pin_set contains not all of the core siblings

 

Public bug reported:

I reproduced this on Mitaka, but it seems that master has the same issue.

The following flavor was used:

$ openstack flavor show medium-dedicated
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| disk                       | 5                                    |
| id                         | 745d4bbb-78b8-4b86-83bf-f009745cd9b8 |
| name                       | medium-dedicated                     |
| os-flavor-access:is_public | True                                 |
| properties                 | hw:cpu_policy='dedicated'            |
| ram                        | 512                                  |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 4                                    |
+----------------------------+--------------------------------------+

The instance image does not have any custom properties.

The following traceback appears in the nova-compute log when booting an
instance with this flavor:

2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [req-786c093f-c0cf-4146-b55e-6ba2527af8de b7d47d36ea5144df9635ec1c834efde7 336db1eb014b4a2399c70cfe29360493 - - -] [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Instance failed to spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Traceback (most recent call last):
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2221, in _build_resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     yield resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2067, in _build_and_run_instance
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     block_device_info=block_device_info)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2811, in spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     write_to_disk=True)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4829, in _get_guest_xml
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     context)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4635, in _get_guest_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     instance.numa_topology, flavor, pci_devs, allowed_cpus, image_meta)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4121, in _get_guest_numa_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]     pcpu = object_numa_cell.cpu_pinning[cpu]
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] KeyError: 2
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]

Here is the topology (from virsh capabilities) of the host that triggers
the issue; I set the host up this way to reproduce it:

    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>10239384</memory>
          <pages unit='KiB' size='4'>2559846</pages>
          <pages unit='KiB' size='2048'>0</pages>
          <pages unit='KiB' size='1048576'>0</pages>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='20'/>
          </distances>
          <cpus num='6'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
            <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
            <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
            <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>10321056</memory>
          <pages unit='KiB' size='4'>2580264</pages>
          <pages unit='KiB' size='2048'>0</pages>
          <pages unit='KiB' size='1048576'>0</pages>
          <distances>
            <sibling id='0' value='20'/>
            <sibling id='1' value='10'/>
          </distances>
          <cpus num='2'>
            <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
            <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
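To see which CPUs libvirt reports as thread siblings, the topology above can be read with Python's standard xml.etree.ElementTree. This is a minimal sketch: the XML is an abbreviated copy of the <topology> element above (only the <cpu> entries are kept), and the helper names parse_id_list and sibling_groups are hypothetical, not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <topology> element above; only the <cpu> entries matter here.
TOPOLOGY_XML = """
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
        <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='2'>
        <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
        <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""


def parse_id_list(spec):
    """Expand a libvirt-style list such as '0-1' or '0,2-3' into a set of ints."""
    ids = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            ids.update(range(int(start), int(end) + 1))
        else:
            ids.add(int(part))
    return ids


def sibling_groups(xml_text):
    """Return {cell_id: (sorted cpu ids, sorted list of sibling groups)}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for cell in root.iter("cell"):
        cpus = cell.findall("./cpus/cpu")
        cpu_ids = sorted(int(cpu.get("id")) for cpu in cpus)
        # Every CPU repeats its core's sibling list, so deduplicate with frozenset.
        groups = {frozenset(parse_id_list(cpu.get("siblings"))) for cpu in cpus}
        result[int(cell.get("id"))] = (cpu_ids, sorted(sorted(g) for g in groups))
    return result


for cell_id, (cpus, groups) in sibling_groups(TOPOLOGY_XML).items():
    print(cell_id, cpus, groups)
# 0 [0, 1, 2, 3, 4, 5] [[0, 1], [2, 3], [4, 5]]
# 1 [6, 7] [[6, 7]]
```

Every core on this host has exactly two hardware threads, so each sibling group is a pair.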

nova.conf sets vcpu_pin_set = 1,3,4,5,6,7. This excludes CPUs 0 and 2,
so the sibling pairs 0-1 and 2-3 are only partially included.
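vcpu_pin_set is a CPU spec string. As far as I know, nova also accepts ranges and ^-exclusions in it (for example 4-12,^8,15). The parser below is a hypothetical sketch of that syntax, not nova's own implementation; it applies the includes first and then removes the exclusions.

```python
def parse_cpu_spec(spec):
    """Expand a CPU spec such as '1,3,4-5,^4' into a set of CPU ids.

    Hypothetical helper: includes are collected first, then '^N' / '^N-M'
    exclusions are subtracted from them.
    """
    include, exclude = set(), set()
    for raw in spec.split(","):
        part = raw.strip()
        if not part:
            continue
        target = include
        if part.startswith("^"):
            target, part = exclude, part[1:]
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError("invalid range: %s" % raw)
            target.update(range(start, end + 1))
        else:
            target.add(int(part))
    return include - exclude


allowed = parse_cpu_spec("1,3,4,5,6,7")
print(sorted(allowed))  # [1, 3, 4, 5, 6, 7]
```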

In the nova database, the host topology looks like this (only the
relevant fields are shown):

cell0 -- "cpuset": [1, 3, 4, 5], "pinned_cpus": [], "siblings": [[4, 5]]
cell1 -- "cpuset": [6, 7], "pinned_cpus": [], "siblings": [[6, 7]]
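The database view above can be reproduced from the libvirt topology and vcpu_pin_set if we assume the compute node keeps only the allowed CPUs in each cell and drops any sibling group left with fewer than two allowed members. This is a simplified model of that filtering, not nova's code, and filter_cell is a hypothetical name. It shows why CPUs 1 and 3 count as available but do not belong to any sibling set.

```python
# Per-cell CPUs and thread-sibling groups as reported by libvirt (topology above).
HOST_CELLS = {
    0: ([0, 1, 2, 3, 4, 5], [[0, 1], [2, 3], [4, 5]]),
    1: ([6, 7], [[6, 7]]),
}
VCPU_PIN_SET = {1, 3, 4, 5, 6, 7}


def filter_cell(cpus, groups, allowed):
    """Assumed model: intersect with the pin set, keep groups with 2+ members."""
    cpuset = sorted(set(cpus) & allowed)
    siblings = [sorted(set(g) & allowed) for g in groups]
    # [0, 1] -> [1] and [2, 3] -> [3] are no longer sibling sets, so drop them.
    siblings = [g for g in siblings if len(g) > 1]
    return {"cpuset": cpuset, "pinned_cpus": [], "siblings": siblings}


for cell_id, (cpus, groups) in HOST_CELLS.items():
    print("cell%d --" % cell_id, filter_cell(cpus, groups, VCPU_PIN_SET))
# cell0 -- {'cpuset': [1, 3, 4, 5], 'pinned_cpus': [], 'siblings': [[4, 5]]}
# cell1 -- {'cpuset': [6, 7], 'pinned_cpus': [], 'siblings': [[6, 7]]}
```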

The cause is that, when fitting the instance to a host cell, we consider
avail_cpus but not free_siblings. When 4 vCPUs are requested, cell0 is
chosen because it has 4 available CPUs (1, 3, 4 and 5). However, the
compute adds vCPU-to-pCPU mappings only for the two CPUs in the
available sibling set [4, 5], so looking up the mapping for the third
vCPU (vCPU 2) raises the KeyError.
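The failure mode described above can be modelled in a few lines. This is a deliberately simplified sketch, not nova's scheduling or libvirt driver code: fits_cell only counts available CPUs (as the report says the fitting step does), while pin_to_siblings, a hypothetical stand-in for the pinning step, only hands out CPUs that belong to a sibling set. Building the guest config then looks up every vCPU in the resulting mapping and fails at vCPU 2, like the traceback. fits_cell_with_siblings is one possible stricter check under this model, not a proposed patch.

```python
def fits_cell(cell, vcpus):
    """Check used when choosing a host cell: counts free CPUs only."""
    avail_cpus = set(cell["cpuset"]) - set(cell["pinned_cpus"])
    return vcpus <= len(avail_cpus)


def pin_to_siblings(cell, vcpus):
    """Hypothetical pinning step that only uses CPUs from free sibling sets."""
    free_sibling_cpus = [
        cpu
        for group in cell["siblings"]
        for cpu in group
        if cpu not in cell["pinned_cpus"]
    ]
    # zip() stops at the shorter sequence, so the mapping can be incomplete.
    return dict(zip(range(vcpus), free_sibling_cpus))


def guest_numa_config(cpu_pinning, vcpus):
    """Mirrors 'pcpu = object_numa_cell.cpu_pinning[cpu]' from the traceback."""
    return [cpu_pinning[cpu] for cpu in range(vcpus)]


def fits_cell_with_siblings(cell, vcpus):
    """Stricter check: only count CPUs the pinning step can actually use."""
    return vcpus <= len(pin_to_siblings(cell, len(cell["cpuset"])))


cell0 = {"cpuset": [1, 3, 4, 5], "pinned_cpus": [], "siblings": [[4, 5]]}
vcpus = 4

assert fits_cell(cell0, vcpus)           # 4 free CPUs, so cell0 is accepted
pinning = pin_to_siblings(cell0, vcpus)
print(pinning)                           # {0: 4, 1: 5}
try:
    guest_numa_config(pinning, vcpus)
except KeyError as exc:
    print("KeyError:", exc)              # KeyError: 2
print(fits_cell_with_siblings(cell0, vcpus))  # False
```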

We may also need to add more information to the docs about thread
siblings and what to include in vcpu_pin_set, so that operators do not
misconfigure it.
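Until the docs cover this, an operator could check a planned vcpu_pin_set against the host's sibling groups before deploying it. A minimal sketch, assuming the sibling groups have already been read from virsh capabilities (hard-coded here from the topology in this report); split_sibling_groups is a hypothetical helper, and nothing like it is claimed to exist in nova.

```python
# Thread-sibling groups from the virsh capabilities output in this report.
SIBLING_GROUPS = [[0, 1], [2, 3], [4, 5], [6, 7]]


def split_sibling_groups(groups, pin_set):
    """Return the sibling groups that vcpu_pin_set only partially covers.

    Each entry is (group, allowed members, excluded members).
    """
    partial = []
    for group in groups:
        allowed = sorted(set(group) & pin_set)
        excluded = sorted(set(group) - pin_set)
        if allowed and excluded:
            partial.append((group, allowed, excluded))
    return partial


for group, allowed, excluded in split_sibling_groups(SIBLING_GROUPS, {1, 3, 4, 5, 6, 7}):
    print("siblings %s: allowed %s, excluded %s" % (group, allowed, excluded))
# siblings [0, 1]: allowed [1], excluded [0]
# siblings [2, 3]: allowed [3], excluded [2]
```

A value that keeps every sibling pair whole, such as 0,1,4,5,6,7, produces no warnings.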

** Affects: nova
     Importance: Undecided
     Assignee: Vladyslav Drok (vdrok)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => Vladyslav Drok (vdrok)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1776244

Title:
  KeyError during instance boot if vcpu_pin_set contains not all of the
  core siblings

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1776244/+subscriptions