yahoo-eng-team team mailing list archive
Message #73244
[Bug 1776244] [NEW] KeyError during instance boot if vcpu_pin_set contains not all of the core siblings
Public bug reported:
I reproduced this on Mitaka, but it seems that master has the same issue.
The following flavor was used:
$ openstack flavor show medium-dedicated
+----------------------------+--------------------------------------+
| Field | Value |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 5 |
| id | 745d4bbb-78b8-4b86-83bf-f009745cd9b8 |
| name | medium-dedicated |
| os-flavor-access:is_public | True |
| properties | hw:cpu_policy='dedicated' |
| ram | 512 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 4 |
+----------------------------+--------------------------------------+
The instance image does not have any custom properties.
The following traceback appears in the nova-compute log when booting an
instance with this flavor:
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [req-786c093f-c0cf-4146-b55e-6ba2527af8de b7d47d36ea5144df9635ec1c834efde7 336db1eb014b4a2399c70cfe29360493 - - -] [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Instance failed to spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] Traceback (most recent call last):
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2221, in _build_resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] yield resources
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2067, in _build_and_run_instance
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] block_device_info=block_device_info)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2811, in spawn
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] write_to_disk=True)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4829, in _get_guest_xml
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] context)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4635, in _get_guest_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] instance.numa_topology, flavor, pci_devs, allowed_cpus, image_meta)
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4121, in _get_guest_numa_config
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] pcpu = object_numa_cell.cpu_pinning[cpu]
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f] KeyError: 2
2018-06-11 14:42:41.177 11367 ERROR nova.compute.manager [instance: 6a03bfcd-3fc1-40be-bb68-b235c23dc55f]
Here is the topology configuration (from virsh capabilities) of the host
that triggers the problem (I set the host up this way to reproduce the
issue):
<topology>
  <cells num='2'>
    <cell id='0'>
      <memory unit='KiB'>10239384</memory>
      <pages unit='KiB' size='4'>2559846</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='10'/>
        <sibling id='1' value='20'/>
      </distances>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
        <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
      </cpus>
    </cell>
    <cell id='1'>
      <memory unit='KiB'>10321056</memory>
      <pages unit='KiB' size='4'>2580264</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='20'/>
        <sibling id='1' value='10'/>
      </distances>
      <cpus num='2'>
        <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
        <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
      </cpus>
    </cell>
  </cells>
</topology>
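To see which host CPUs are hyperthread siblings, the <topology> element above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not nova code: TOPOLOGY_XML is a trimmed copy of the capabilities output above (memory and distance elements omitted), and parse_cpu_list and sibling_sets are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <topology> element above (memory/distances omitted).
TOPOLOGY_XML = """
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
        <cpu id='2' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='3' socket_id='1' core_id='0' siblings='2-3'/>
        <cpu id='4' socket_id='2' core_id='0' siblings='4-5'/>
        <cpu id='5' socket_id='2' core_id='0' siblings='4-5'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='2'>
        <cpu id='6' socket_id='3' core_id='0' siblings='6-7'/>
        <cpu id='7' socket_id='3' core_id='0' siblings='6-7'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""


def parse_cpu_list(spec):
    """Expand a list such as '0-1' or '0,2-3' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def sibling_sets(xml_text):
    """Return {cell_id: (cpuset, [sibling sets])} for each NUMA cell."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for cell in root.iter("cell"):
        cpuset = set()
        siblings = []
        for cpu in cell.iter("cpu"):
            cpuset.add(int(cpu.get("id")))
            sib = parse_cpu_list(cpu.get("siblings", ""))
            # Each pair is listed once per member CPU; keep one copy.
            if sib and sib not in siblings:
                siblings.append(sib)
        result[int(cell.get("id"))] = (cpuset, siblings)
    return result


for cell_id, (cpus, sibs) in sibling_sets(TOPOLOGY_XML).items():
    print(cell_id, sorted(cpus), [sorted(s) for s in sibs])
```

For this host it prints cell 0 with CPUs [0, 1, 2, 3, 4, 5] and sibling sets [[0, 1], [2, 3], [4, 5]], and cell 1 with CPUs [6, 7] and sibling set [[6, 7]].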
The host's nova.conf contains:
vcpu_pin_set = 1,3,4,5,6,7
In the nova database, the host topology looks like this (only the
relevant fields are shown):
cell0 -- "cpuset": [1, 3, 4, 5], "pinned_cpus": [], "siblings": [[4, 5]]
cell1 -- "cpuset": [6, 7], "pinned_cpus": [], "siblings": [[6, 7]]
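The stored cpuset and siblings values can be reproduced from the host topology and vcpu_pin_set. The sketch below assumes that each sibling set is intersected with vcpu_pin_set and that a set left with fewer than two CPUs is dropped; this assumption matches the values above, but host_cell_view and parse_pin_set are hypothetical helpers, not nova's implementation. The parser handles only plain CPU IDs and a-b ranges.

```python
# Per-cell CPUs and hyperthread sibling sets, copied from the virsh capabilities output above.
HOST_CELLS = {
    0: ({0, 1, 2, 3, 4, 5}, [{0, 1}, {2, 3}, {4, 5}]),
    1: ({6, 7}, [{6, 7}]),
}


def parse_pin_set(spec):
    """Expand '1,3,4-5' style lists into a set of ints (plain IDs and ranges only)."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def host_cell_view(cells, pin_set):
    """Hypothetical: restrict each cell to pin_set, keeping only sibling sets with 2+ CPUs left."""
    view = {}
    for cell_id, (cpus, siblings) in cells.items():
        cpuset = cpus & pin_set
        # A sibling set cut down to a single CPU no longer counts as siblings.
        kept = [sib & pin_set for sib in siblings if len(sib & pin_set) > 1]
        view[cell_id] = {
            "cpuset": sorted(cpuset),
            "siblings": [sorted(s) for s in kept],
        }
    return view


print(host_cell_view(HOST_CELLS, parse_pin_set("1,3,4,5,6,7")))
```

With vcpu_pin_set = 1,3,4,5,6,7, cell 0 keeps CPUs 1 and 3 in its cpuset, but their siblings (0 and 2) are excluded, so only [4, 5] survives as a sibling set, which is the same as the database rows above.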
The cause is that, when fitting the instance to a host cell, we consider
avail_cpus but not free_siblings. When 4 vCPUs are requested, cell0 is
selected because it has 4 available CPUs. However, the compute adds
vCPU-to-pCPU mappings only for the two CPUs of the one complete sibling
set ([4, 5]), so looking up the mapping for the third vCPU (vCPU 2)
raises the KeyError.
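A toy model makes the mismatch concrete: the admission check counts free CPUs, while the pinning step places vCPUs only on CPUs in free sibling sets. All functions below are hypothetical simplifications written for illustration, not nova's actual fitting or packing code; the inputs are the cell0 values from the database above. The last function sketches one possible stricter check, not the actual fix.

```python
# Cell 0 as stored in the nova database (see above).
CELL0_CPUSET = {1, 3, 4, 5}
CELL0_FREE_SIBLINGS = [{4, 5}]
REQUESTED_VCPUS = 4


def cell_fits(avail_cpus, vcpus):
    """Hypothetical admission check: counts free CPUs only, ignoring siblings."""
    return len(avail_cpus) >= vcpus


def pin_onto_siblings(free_siblings, vcpus):
    """Hypothetical packer: maps vCPUs only onto CPUs that belong to free sibling sets."""
    pcpus = [pcpu for sib in free_siblings for pcpu in sorted(sib)]
    # zip() stops at the shorter sequence, so surplus vCPUs get no mapping.
    return dict(zip(range(vcpus), pcpus))


def cell_fits_with_siblings(free_siblings, vcpus):
    """Hypothetical stricter check: count only CPUs the packer can actually use."""
    return sum(len(sib) for sib in free_siblings) >= vcpus


assert cell_fits(CELL0_CPUSET, REQUESTED_VCPUS)  # 4 >= 4, so cell0 is selected
cpu_pinning = pin_onto_siblings(CELL0_FREE_SIBLINGS, REQUESTED_VCPUS)
print(cpu_pinning)  # {0: 4, 1: 5}

try:
    for vcpu in range(REQUESTED_VCPUS):
        # Mirrors the cpu_pinning[cpu] lookup shown in the traceback.
        pcpu = cpu_pinning[vcpu]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 2

print(cell_fits_with_siblings(CELL0_FREE_SIBLINGS, REQUESTED_VCPUS))  # False
```

With the stricter check, cell0 would be rejected up front (and cell1 has only 2 CPUs), so the request would fail during fitting instead of crashing later in _get_guest_numa_config.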
We might also need to add more information to the docs about CPU
siblings and what to include in vcpu_pin_set, so that people don't
misconfigure it.
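Until the docs cover this, an operator could check whether vcpu_pin_set splits any core's sibling set before restarting nova-compute. This is a minimal sketch; split_sibling_sets is a hypothetical helper, and the sibling sets are copied by hand from the virsh capabilities output above.

```python
# Hyperthread sibling sets of the host above, from virsh capabilities.
HOST_SIBLINGS = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]


def split_sibling_sets(siblings, pin_set):
    """Return (included, excluded) pairs for sibling sets only partly in pin_set."""
    problems = []
    for sib in siblings:
        included = sib & pin_set
        # Flag sets where some, but not all, siblings are pinnable.
        if included and included != sib:
            problems.append((sorted(included), sorted(sib - pin_set)))
    return problems


for included, excluded in split_sibling_sets(HOST_SIBLINGS, {1, 3, 4, 5, 6, 7}):
    print(f"vcpu_pin_set includes {included} but not its sibling(s) {excluded}")
```

For the configuration in this report it flags CPUs 1 and 3, whose siblings 0 and 2 are missing from vcpu_pin_set.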
** Affects: nova
Importance: Undecided
Assignee: Vladyslav Drok (vdrok)
Status: New
** Changed in: nova
Assignee: (unassigned) => Vladyslav Drok (vdrok)
** Description changed:
- vcpus, the compute adds vcpu-pcpu mapping only for two available
- siblings, and when accessing the third one key error happens.
+ vcpus, we get to cell0, as there are 4 available. But the compute adds
+ vcpu-pcpu mapping only for two available siblings, and when accessing
+ the third one key error happens.
** Description changed:
+
+ Also we might need to add more info to the docs about the siblings, and
+ what to include in vcpu_pin_set, so that people don't misconfigure
+ things.
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1776244
Title:
KeyError during instance boot if vcpu_pin_set contains not all of the
core siblings
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1776244/+subscriptions