yahoo-eng-team team mailing list archive
Message #92562
[Bug 2025480] [NEW] overlapping pinned CPUs after unshelve
Public bug reported:
It seems that after unshelve, occasionally the request for a dedicated
CPU is ignored. More precisely, the first pinned CPU does not seem to be
marked as consumed, so the second instance may end up pinned to the same
CPU. This was first observed on victoria (6 times out of 46 tries), but
I was then able to reproduce it on master too (6 times out of 20 tries).
The attached logs are from the victoria environment, which was a single-host
all-in-one devstack running only the VMs used for this reproduction.
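To make the suspected mechanism concrete, here is a toy sketch (my own illustration, not nova code) of what the bookkeeping failure would look like: if the first instance's pin is never recorded as consumed, the packing logic happily hands out the same dedicated CPU again.
# Toy illustration only, not nova code.
dedicated = {0, 1}
pinned = set()          # what the host cell believes is consumed

def pick_pinned_cpu(record_consumption=True):
    cpu = min(dedicated - pinned)   # "pack" onto the lowest free CPU
    if record_consumption:
        pinned.add(cpu)
    return cpu

vm0 = pick_pinned_cpu(record_consumption=False)  # consumption apparently lost
vm1 = pick_pinned_cpu()
print(vm0, vm1)   # 0 0 -- both instances end up pinned to CPU 0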
stable/victoria
devstack 3eb6e2d7
nova 1aca09b966
master
devstack b10c0602
nova 2aea80c0af
config:
[[post-config|/etc/nova/nova.conf]]
[DEFAULT]
scheduler_default_filters = NUMATopologyFilter, ...
[compute]
cpu_dedicated_set = 0,1
Confirming this config in placement:
$ openstack --os-placement-api-version 1.17 resource provider inventory show 46b3d4de-bb45-4607-8860-040eb2dcd0d7 PCPU
+------------------+-------+
| Field | Value |
+------------------+-------+
| allocation_ratio | 1.0 |
| min_unit | 1 |
| max_unit | 2 |
| reserved | 0 |
| step_size | 1 |
| total | 2 |
+------------------+-------+
Reproduction steps:
openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated
openstack server list -f value -c ID | xargs -r openstack server delete --wait
openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm0 --wait
openstack server shelve vm0
sleep 10 # make sure shelve finished
openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm1 --wait
openstack server shelve vm1
sleep 10
openstack server unshelve vm0 ; sleep 15 ; openstack server unshelve vm1 # the amount of sleep could easily be relevant
watch openstack server list # wait until both go ACTIVE
# both vms ended up on the same cpu
$ for vm in $( sudo virsh list --name ) ; do sudo virsh dumpxml $vm | xmlstarlet sel -t -v '//vcpupin/@cpuset' ; echo ; done
0
0
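The same overlap check can also be done with a small Python helper (my own sketch, not part of nova or of the reproduction above; it assumes each vcpupin carries a single CPU, as is the case with this flavor):
import subprocess
import xml.etree.ElementTree as ET

def pinned_cpus(domain):
    # Read the pinned host CPUs straight from the libvirt domain XML.
    xml = subprocess.check_output(['sudo', 'virsh', 'dumpxml', domain], text=True)
    return {pin.get('cpuset') for pin in ET.fromstring(xml).iter('vcpupin')}

seen = {}
for dom in subprocess.check_output(
        ['sudo', 'virsh', 'list', '--name'], text=True).split():
    for cpu in pinned_cpus(dom):
        if cpu in seen:
            print('overlap: CPU %s pinned by both %s and %s' % (cpu, seen[cpu], dom))
        else:
            seen[cpu] = dom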
Data collected from the environment where the above reproduction
triggered the bug:
$ openstack server list
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
| 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | vm1 | ACTIVE | private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd | cirros-0.5.1-x86_64-disk | cirros256-pinned |
| e30de509-6988-4535-a6f5-520c52fba087 | vm0 | ACTIVE | private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368 | cirros-0.5.1-x86_64-disk | cirros256-pinned |
+--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
$ openstack server show vm0
+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | devstack1v |
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack1v |
| OS-EXT-SRV-ATTR:instance_name | instance-0000001f |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2023-06-29T10:45:25.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368 |
| config_drive | |
| created | 2023-06-29T10:44:42Z |
| flavor | cirros256-pinned (a61fac7f-9d3a-4788-8dd8-b84534ed5ba6) |
| hostId | a660a4498447edab1528a7c92f7eaccd7aad2c258c896f7820aef9f6 |
| id | e30de509-6988-4535-a6f5-520c52fba087 |
| image | cirros-0.5.1-x86_64-disk (aaefde72-6da8-45e3-bc58-d7d1b9333d0e) |
| key_name | None |
| name | vm0 |
| progress | 0 |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| properties | |
| security_groups | name='default' |
| status | ACTIVE |
| updated | 2023-06-29T10:45:26Z |
| user_id | 971105d66de542588ddccdef15594fab |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+
$ openstack server show vm1
+-------------------------------------+-----------------------------------------------------------------+
| Field | Value |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | devstack1v |
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack1v |
| OS-EXT-SRV-ATTR:instance_name | instance-00000020 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2023-06-29T10:45:45.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd |
| config_drive | |
| created | 2023-06-29T10:45:00Z |
| flavor | cirros256-pinned (a61fac7f-9d3a-4788-8dd8-b84534ed5ba6) |
| hostId | a660a4498447edab1528a7c92f7eaccd7aad2c258c896f7820aef9f6 |
| id | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc |
| image | cirros-0.5.1-x86_64-disk (aaefde72-6da8-45e3-bc58-d7d1b9333d0e) |
| key_name | None |
| name | vm1 |
| progress | 0 |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| properties | |
| security_groups | name='default' |
| status | ACTIVE |
| updated | 2023-06-29T10:45:45Z |
| user_id | 971105d66de542588ddccdef15594fab |
| volumes_attached | |
+-------------------------------------+-----------------------------------------------------------------+
$ openstack flavor show cirros256-pinned
+----------------------------+--------------------------------------------------+
| Field | Value |
+----------------------------+--------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 1 |
| id | a61fac7f-9d3a-4788-8dd8-b84534ed5ba6 |
| name | cirros256-pinned |
| os-flavor-access:is_public | True |
| properties | hw:cpu_policy='dedicated', hw_rng:allowed='True' |
| ram | 256 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+--------------------------------------------------+
$ openstack server event list vm0
+------------------------------------------+--------------------------------------+----------+----------------------------+
| Request ID | Server ID | Action | Start Time |
+------------------------------------------+--------------------------------------+----------+----------------------------+
| req-e09149a8-6ef9-49c3-a757-57bbc14d6004 | e30de509-6988-4535-a6f5-520c52fba087 | unshelve | 2023-06-29T10:45:19.000000 |
| req-4628f975-a57b-468e-a41c-0ef405df4bae | e30de509-6988-4535-a6f5-520c52fba087 | shelve | 2023-06-29T10:44:49.000000 |
| req-d7595149-3cc6-472c-9da8-7a40c020a620 | e30de509-6988-4535-a6f5-520c52fba087 | create | 2023-06-29T10:44:42.000000 |
+------------------------------------------+--------------------------------------+----------+----------------------------+
$ openstack server event show vm0 req-e09149a8-6ef9-49c3-a757-57bbc14d6004
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | unshelve |
| events | [{'event': 'compute_unshelve_instance', 'start_time': '2023-06-29T10:45:19.000000', 'finish_time': '2023-06-29T10:45:26.000000', 'result': 'Success', 'traceback': None}, {'event': 'schedule_instances', 'start_time': |
| | '2023-06-29T10:45:19.000000', 'finish_time': '2023-06-29T10:45:19.000000', 'result': 'Success', 'traceback': None}, {'event': 'get_image_info', 'start_time': '2023-06-29T10:45:19.000000', 'finish_time': |
| | '2023-06-29T10:45:19.000000', 'result': 'Success', 'traceback': None}] |
| instance_uuid | e30de509-6988-4535-a6f5-520c52fba087 |
| message | None |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| request_id | req-e09149a8-6ef9-49c3-a757-57bbc14d6004 |
| start_time | 2023-06-29T10:45:19.000000 |
| user_id | 971105d66de542588ddccdef15594fab |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack server event show vm0 req-4628f975-a57b-468e-a41c-0ef405df4bae
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | shelve |
| events | [{'event': 'compute_shelve_instance', 'start_time': '2023-06-29T10:44:49.000000', 'finish_time': '2023-06-29T10:44:55.000000', 'result': 'Success', 'traceback': None}] |
| instance_uuid | e30de509-6988-4535-a6f5-520c52fba087 |
| message | None |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| request_id | req-4628f975-a57b-468e-a41c-0ef405df4bae |
| start_time | 2023-06-29T10:44:49.000000 |
| user_id | 971105d66de542588ddccdef15594fab |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack server event list vm1
+------------------------------------------+--------------------------------------+----------+----------------------------+
| Request ID | Server ID | Action | Start Time |
+------------------------------------------+--------------------------------------+----------+----------------------------+
| req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | unshelve | 2023-06-29T10:45:35.000000 |
| req-1dc9b980-d765-415a-b946-2daf53f361b1 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | shelve | 2023-06-29T10:45:07.000000 |
| req-5071fd9f-00e2-44f8-9b6c-8715fe918e98 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | create | 2023-06-29T10:45:00.000000 |
+------------------------------------------+--------------------------------------+----------+----------------------------+
$ openstack server event show vm1 req-9ab0eabc-bf97-48bf-b534-dd22df3c6452
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | unshelve |
| events | [{'event': 'compute_unshelve_instance', 'start_time': '2023-06-29T10:45:36.000000', 'finish_time': '2023-06-29T10:45:45.000000', 'result': 'Success', 'traceback': None}, {'event': 'schedule_instances', 'start_time': |
| | '2023-06-29T10:45:35.000000', 'finish_time': '2023-06-29T10:45:36.000000', 'result': 'Success', 'traceback': None}, {'event': 'get_image_info', 'start_time': '2023-06-29T10:45:35.000000', 'finish_time': |
| | '2023-06-29T10:45:35.000000', 'result': 'Success', 'traceback': None}] |
| instance_uuid | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc |
| message | None |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| request_id | req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 |
| start_time | 2023-06-29T10:45:35.000000 |
| user_id | 971105d66de542588ddccdef15594fab |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ openstack server event show vm1 req-1dc9b980-d765-415a-b946-2daf53f361b1
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | shelve |
| events | [{'event': 'compute_shelve_instance', 'start_time': '2023-06-29T10:45:08.000000', 'finish_time': '2023-06-29T10:45:13.000000', 'result': 'Success', 'traceback': None}] |
| instance_uuid | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc |
| message | None |
| project_id | 0ad9573f6c57482bbd8202ce43706ca6 |
| request_id | req-1dc9b980-d765-415a-b946-2daf53f361b1 |
| start_time | 2023-06-29T10:45:07.000000 |
| user_id | 971105d66de542588ddccdef15594fab |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Logs that seemed relevant to me:
vm0 unshelve:
jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=0,memory=7957,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([0,1]),pinned_cpus=set([]),siblings=[set([0]),set([1])]) {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1046}}
jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Pinning has been requested {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1133}}
jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Packing an instance onto a set of siblings: host_cell_free_siblings: [{0}, {1}] instance_cell: InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) host_cell_id: 0 threads_per_core: 1 num_cpu_reserved: 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:731}}
jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Built sibling_sets: defaultdict(<class 'list'>, {1: [{0}, {1}]}) {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:752}}
jún 29 10:45:19 devstack1v nova-compute[379150]: INFO nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1]], vCPUs mapping: [(0, 0)]
jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Selected cores for pinning: [(0, 0)], in cell 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:1013}}
jún 29 10:45:19 devstack1v nova-compute[379150]: INFO nova.compute.claims [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] [instance: e30de509-6988-4535-a6f5-520c52fba087] Claim successful on node devstack1v
vm1 unshelve (note that the host_cell still reports pinned_cpus=set([]), even though vm0's claim had already succeeded at 10:45:19):
jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=0,memory=7957,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([0,1]),pinned_cpus=set([]),siblings=[set([0]),set([1])]) {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1046}}
jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Pinning has been requested {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1133}}
jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Packing an instance onto a set of siblings: host_cell_free_siblings: [{0}, {1}] instance_cell: InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) host_cell_id: 0 threads_per_core: 1 num_cpu_reserved: 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:731}}
jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Built sibling_sets: defaultdict(<class 'list'>, {1: [{0}, {1}]}) {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:752}}
jún 29 10:45:37 devstack1v nova-compute[379150]: INFO nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1]], vCPUs mapping: [(0, 0)]
jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Selected cores for pinning: [(0, 0)], in cell 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:1013}}
jún 29 10:45:37 devstack1v nova-compute[379150]: INFO nova.compute.claims [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] [instance: 4734b8a5-a6dd-432a-86c9-ba0367bb86cc] Claim successful on node devstack1v
The first error in the logs appeared not during the unshelves but well after, in the update_available_resource periodic task:
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Error updating resources for node devstack1v.: nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1]
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager Traceback (most recent call last):
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9973, in _update_available_resource_for_node
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager self.rt.update_available_resource(context, nodename,
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup)
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 360, in inner
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager return f(*args, **kwargs)
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 968, in _update_available_resource
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager instance_by_uuid = self._update_usage_from_instances(
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1534, in _update_usage_from_instances
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager self._update_usage_from_instance(context, instance, nodename)
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1497, in _update_usage_from_instance
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager self._update_usage(self._get_usage_dict(instance, instance),
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1301, in _update_usage
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager cn.numa_topology = hardware.numa_usage_from_instance_numa(
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/hardware.py", line 2450, in numa_usage_from_instance_numa
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager new_cell.pin_cpus(pinned_cpus)
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager File "/opt/stack/nova/nova/objects/numa.py", line 91, in pin_cpus
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager raise exception.CPUPinningInvalid(requested=list(cpus),
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1]
jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager
jún 29 10:47:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager._instance_usage_audit {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
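As the traceback shows, the overlap only becomes visible when update_available_resource replays each instance's recorded pinning onto the host NUMA cell and pin_cpus() finds CPU 0 already consumed. A simplified sketch of that invariant (my own illustration, not nova's actual implementation):
class HostCell(object):
    # Simplified stand-in for the NUMA cell bookkeeping.
    def __init__(self, pcpus):
        self.pcpuset = set(pcpus)   # dedicated host CPUs
        self.pinned_cpus = set()    # CPUs already consumed by instances

    @property
    def free_pcpus(self):
        return self.pcpuset - self.pinned_cpus

    def pin_cpus(self, cpus):
        # The check that produces "CPU set to pin [0] must be a subset of
        # free CPU set [1]" in the traceback above.
        if not cpus.issubset(self.free_pcpus):
            raise Exception('CPU set to pin %s must be a subset of free CPU '
                            'set %s' % (sorted(cpus), sorted(self.free_pcpus)))
        self.pinned_cpus |= cpus

cell = HostCell(pcpus=[0, 1])
cell.pin_cpus({0})           # replaying vm0's recorded pinning: CPU 0 consumed
try:
    cell.pin_cpus({0})       # replaying vm1's recorded pinning: CPU 0 not free
except Exception as exc:
    print(exc)               # CPU set to pin [0] must be a subset of free CPU set [1]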
Between the two unshelves, the update_available_resource periodic task ran once (at 10:45:24):
$ egrep 'Running periodic task ComputeManager.update_available_resource' ~/n-cpu.log
jún 29 10:35:16 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:36:17 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:37:17 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:38:19 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:39:20 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:40:22 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:41:22 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:42:23 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:43:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:44:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:45:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:46:26 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:47:27 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:48:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
jún 29 10:49:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
I also attached a .tar.gz with the following contents:
for service in n-api n-cond-cell1 n-cpu n-sch n-super-cond
do
sudo journalctl -u devstack@$service -S '2023-06-29 10:35:00' -U '2023-06-29 10:50:00' > ~/$service.log
done
$ sudo virsh list
Id Name State
-----------------------------------
63 instance-0000001f running
64 instance-00000020 running
sudo virsh dumpxml instance-0000001f > ~/instance-0000001f.xml
sudo virsh dumpxml instance-00000020 > ~/instance-00000020.xml
Let me know if I can collect more information.
** Affects: nova
Importance: Undecided
Status: New
** Attachment added: "unshelve-overlapping-pinned-cpus.tar.gz"
https://bugs.launchpad.net/bugs/2025480/+attachment/5683100/+files/unshelve-overlapping-pinned-cpus.tar.gz