
yahoo-eng-team team mailing list archive

[Bug 2025480] Re: overlapping pinned CPUs after unshelve

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/890406
Committed: https://opendev.org/openstack/nova/commit/f1dc4ec39bcfda1bd4b97e233a9da498b6378c4f
Submitter: "Zuul (22348)"
Branch:    master

commit f1dc4ec39bcfda1bd4b97e233a9da498b6378c4f
Author: Bence Romsics <bence.romsics@xxxxxxxxx>
Date:   Wed Aug 2 16:22:55 2023 +0200

    Do not untrack resources of a server being unshelved
    
    This patch concerns the window when a VM is being unshelved: the
    compute manager has set the task_state to spawning, has claimed
    resources for the VM, and has called driver.spawn(). At this point
    the instance is in vm_state SHELVED_OFFLOADED, task_state spawning.
    
    If at this point a new update_available_resource periodic job
    started, it collected all the instances assigned to the node to
    calculate resource usage. However, the calculation assumed that a
    VM in vm_state SHELVED_OFFLOADED needs no resource allocation on
    the node (it is presumably being removed from the node, as it is
    offloaded) and therefore deleted the resource claim.
    
    As a result, the VM spawned successfully but its resource claim
    on the node was lost.
    
    This patch changes what we do in vm_state SHELVED_OFFLOADED, task_state
    spawning. We no longer delete the resource claim in this state and
    keep tracking the resource in stats.
    
    Change-Id: I8c9944810c09d501a6d3f60f095d9817b756872d
    Closes-Bug: #2025480
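
The race the commit describes can be sketched as follows. This is a
hypothetical, heavily simplified model of the resource-tracker decision
the patch changes (the function name should_track_claim and the string
constants are illustrative, not Nova's actual code):

```python
# Simplified, hypothetical model of the resource-tracker decision the
# patch changes -- not Nova's actual code. The key point: a claim must
# survive when vm_state is SHELVED_OFFLOADED but task_state is already
# spawning, i.e. an unshelve is in flight.

SHELVED_OFFLOADED = "shelved_offloaded"
SPAWNING = "spawning"

def should_track_claim(vm_state, task_state):
    """Return True if the periodic job must keep the instance's claim."""
    if vm_state != SHELVED_OFFLOADED:
        # Instance lives on this node: always keep tracking it.
        return True
    # Before the fix this branch effectively returned False for every
    # SHELVED_OFFLOADED instance, dropping the claim of a VM that was
    # mid-unshelve. The fix keeps the claim while spawning.
    return task_state == SPAWNING

# The buggy periodic job dropped the claim even while unshelve was
# spawning the guest; the fixed check keeps it.
assert should_track_claim(SHELVED_OFFLOADED, SPAWNING) is True
assert should_track_claim(SHELVED_OFFLOADED, None) is False
assert should_track_claim("active", None) is True
```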


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2025480

Title:
  overlapping pinned CPUs after unshelve

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  It seems that after unshelve, the request for a dedicated CPU is
  occasionally ignored. More precisely, the first pinned CPU does not
  seem to be marked as consumed, so the second VM may end up on the
  same CPU. This was first observed on victoria (6 times out of 46
  tries), but I was then able to reproduce it on master too (6 times
  out of 20 tries). The attached logs are from the victoria
  environment: a single-host all-in-one devstack running only the VMs
  used for this reproduction.

  stable/victoria
  devstack 3eb6e2d7
  nova 1aca09b966

  master
  devstack b10c0602
  nova 2aea80c0af

  config:
  [[post-config|/etc/nova/nova.conf]]
  [DEFAULT]
  scheduler_default_filters = NUMATopologyFilter, ...
  [compute]
  cpu_dedicated_set = 0,1

  Confirming this config in placement:
  $ openstack --os-placement-api-version 1.17 resource provider inventory show 46b3d4de-bb45-4607-8860-040eb2dcd0d7 PCPU
  +------------------+-------+
  | Field            | Value |
  +------------------+-------+
  | allocation_ratio | 1.0   |
  | min_unit         | 1     |
  | max_unit         | 2     |
  | reserved         | 0     |
  | step_size        | 1     |
  | total            | 2     |
  +------------------+-------+
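
This inventory bounds how many PCPUs can be claimed concurrently. The
snippet below applies what I understand to be placement's usual
capacity formula, (total - reserved) * allocation_ratio, to the values
above (an assumption stated here for illustration, not taken from the
bug report):

```python
# Sketch: how the PCPU inventory above bounds concurrent allocations.
# Assumes placement's capacity formula (total - reserved) * ratio.
def pcpu_capacity(total, reserved, allocation_ratio):
    return int((total - reserved) * allocation_ratio)

cap = pcpu_capacity(total=2, reserved=0, allocation_ratio=1.0)
# Two 1-PCPU instances fit exactly -- but only if every claim is
# actually tracked; the lost claim in this bug made the host look
# emptier than it really was.
assert cap == 2
```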

  Reproduction steps:
  openstack flavor create cirros256-pinned --public --vcpus 1 --ram 256 --disk 1 --property hw_rng:allowed=True --property hw:cpu_policy=dedicated

  openstack server list -f value -c ID | xargs -r openstack server delete --wait

  openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm0 --wait
  openstack server shelve vm0
  sleep 10 # make sure shelve finished
  openstack server create --flavor cirros256-pinned --image cirros-0.5.1-x86_64-disk --nic net-id=private vm1 --wait
  openstack server shelve vm1
  sleep 10

  openstack server unshelve vm0 ; sleep 15 ; openstack server unshelve vm1 # the amount of sleep could easily be relevant
  watch openstack server list # wait until both go ACTIVE

  # both vms ended up on the same cpu
  $ for vm in $( sudo virsh list --name ) ; do sudo virsh dumpxml $vm | xmlstarlet sel -t -v '//vcpupin/@cpuset' ; echo ; done
  0
  0
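
The overlap shown by the virsh output above can also be checked
mechanically. This is hypothetical helper tooling (not part of the
reproduction): it parses each guest's vcpupin cpuset string and reports
any pCPU pinned by more than one guest:

```python
# Spot overlapping pins: given each guest's cpuset string (as printed
# by the xmlstarlet query above), report CPUs pinned more than once.
from collections import Counter

def overlapping_cpus(cpusets):
    """cpusets: iterable of strings like '0' or '0-1,4'."""
    counts = Counter()
    for spec in cpusets:
        for part in spec.split(","):
            if "-" in part:
                lo, hi = map(int, part.split("-"))
                counts.update(range(lo, hi + 1))
            else:
                counts.update([int(part)])
    return sorted(cpu for cpu, n in counts.items() if n > 1)

# Both vm0 and vm1 ended up pinned to pCPU 0, as in the output above:
assert overlapping_cpus(["0", "0"]) == [0]
# A healthy placement has no overlap:
assert overlapping_cpus(["0", "1"]) == []
```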

  Data collected from the environment where the above reproduction
  triggered the bug:

  $ openstack server list
  +--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
  | ID                                   | Name | Status | Networks                                               | Image                    | Flavor           |
  +--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+
  | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | vm1  | ACTIVE | private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd | cirros-0.5.1-x86_64-disk | cirros256-pinned |
  | e30de509-6988-4535-a6f5-520c52fba087 | vm0  | ACTIVE | private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368 | cirros-0.5.1-x86_64-disk | cirros256-pinned |
  +--------------------------------------+------+--------+--------------------------------------------------------+--------------------------+------------------+

  $ openstack server show vm0
  +-------------------------------------+-----------------------------------------------------------------+
  | Field                               | Value                                                           |
  +-------------------------------------+-----------------------------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                                          |
  | OS-EXT-AZ:availability_zone         | nova                                                            |
  | OS-EXT-SRV-ATTR:host                | devstack1v                                                      |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | devstack1v                                                      |
  | OS-EXT-SRV-ATTR:instance_name       | instance-0000001f                                               |
  | OS-EXT-STS:power_state              | Running                                                         |
  | OS-EXT-STS:task_state               | None                                                            |
  | OS-EXT-STS:vm_state                 | active                                                          |
  | OS-SRV-USG:launched_at              | 2023-06-29T10:45:25.000000                                      |
  | OS-SRV-USG:terminated_at            | None                                                            |
  | accessIPv4                          |                                                                 |
  | accessIPv6                          |                                                                 |
  | addresses                           | private=10.0.0.6, fdfb:ab27:b2b2:0:f816:3eff:fe78:d368          |
  | config_drive                        |                                                                 |
  | created                             | 2023-06-29T10:44:42Z                                            |
  | flavor                              | cirros256-pinned (a61fac7f-9d3a-4788-8dd8-b84534ed5ba6)         |
  | hostId                              | a660a4498447edab1528a7c92f7eaccd7aad2c258c896f7820aef9f6        |
  | id                                  | e30de509-6988-4535-a6f5-520c52fba087                            |
  | image                               | cirros-0.5.1-x86_64-disk (aaefde72-6da8-45e3-bc58-d7d1b9333d0e) |
  | key_name                            | None                                                            |
  | name                                | vm0                                                             |
  | progress                            | 0                                                               |
  | project_id                          | 0ad9573f6c57482bbd8202ce43706ca6                                |
  | properties                          |                                                                 |
  | security_groups                     | name='default'                                                  |
  | status                              | ACTIVE                                                          |
  | updated                             | 2023-06-29T10:45:26Z                                            |
  | user_id                             | 971105d66de542588ddccdef15594fab                                |
  | volumes_attached                    |                                                                 |
  +-------------------------------------+-----------------------------------------------------------------+

  $ openstack server show vm1
  +-------------------------------------+-----------------------------------------------------------------+
  | Field                               | Value                                                           |
  +-------------------------------------+-----------------------------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                                          |
  | OS-EXT-AZ:availability_zone         | nova                                                            |
  | OS-EXT-SRV-ATTR:host                | devstack1v                                                      |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | devstack1v                                                      |
  | OS-EXT-SRV-ATTR:instance_name       | instance-00000020                                               |
  | OS-EXT-STS:power_state              | Running                                                         |
  | OS-EXT-STS:task_state               | None                                                            |
  | OS-EXT-STS:vm_state                 | active                                                          |
  | OS-SRV-USG:launched_at              | 2023-06-29T10:45:45.000000                                      |
  | OS-SRV-USG:terminated_at            | None                                                            |
  | accessIPv4                          |                                                                 |
  | accessIPv6                          |                                                                 |
  | addresses                           | private=10.0.0.27, fdfb:ab27:b2b2:0:f816:3eff:fe80:2fd          |
  | config_drive                        |                                                                 |
  | created                             | 2023-06-29T10:45:00Z                                            |
  | flavor                              | cirros256-pinned (a61fac7f-9d3a-4788-8dd8-b84534ed5ba6)         |
  | hostId                              | a660a4498447edab1528a7c92f7eaccd7aad2c258c896f7820aef9f6        |
  | id                                  | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc                            |
  | image                               | cirros-0.5.1-x86_64-disk (aaefde72-6da8-45e3-bc58-d7d1b9333d0e) |
  | key_name                            | None                                                            |
  | name                                | vm1                                                             |
  | progress                            | 0                                                               |
  | project_id                          | 0ad9573f6c57482bbd8202ce43706ca6                                |
  | properties                          |                                                                 |
  | security_groups                     | name='default'                                                  |
  | status                              | ACTIVE                                                          |
  | updated                             | 2023-06-29T10:45:45Z                                            |
  | user_id                             | 971105d66de542588ddccdef15594fab                                |
  | volumes_attached                    |                                                                 |
  +-------------------------------------+-----------------------------------------------------------------+

  $ openstack flavor show cirros256-pinned
  +----------------------------+--------------------------------------------------+
  | Field                      | Value                                            |
  +----------------------------+--------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                            |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                                |
  | access_project_ids         | None                                             |
  | disk                       | 1                                                |
  | id                         | a61fac7f-9d3a-4788-8dd8-b84534ed5ba6             |
  | name                       | cirros256-pinned                                 |
  | os-flavor-access:is_public | True                                             |
  | properties                 | hw:cpu_policy='dedicated', hw_rng:allowed='True' |
  | ram                        | 256                                              |
  | rxtx_factor                | 1.0                                              |
  | swap                       |                                                  |
  | vcpus                      | 1                                                |
  +----------------------------+--------------------------------------------------+

  $ openstack server event list vm0
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  | Request ID                               | Server ID                            | Action   | Start Time                 |
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  | req-e09149a8-6ef9-49c3-a757-57bbc14d6004 | e30de509-6988-4535-a6f5-520c52fba087 | unshelve | 2023-06-29T10:45:19.000000 |
  | req-4628f975-a57b-468e-a41c-0ef405df4bae | e30de509-6988-4535-a6f5-520c52fba087 | shelve   | 2023-06-29T10:44:49.000000 |
  | req-d7595149-3cc6-472c-9da8-7a40c020a620 | e30de509-6988-4535-a6f5-520c52fba087 | create   | 2023-06-29T10:44:42.000000 |
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  $ openstack server event show vm0 req-e09149a8-6ef9-49c3-a757-57bbc14d6004
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field         | Value                                                                                                                                                                                                                      |
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action        | unshelve                                                                                                                                                                                                                   |
  | events        | [{'event': 'compute_unshelve_instance', 'start_time': '2023-06-29T10:45:19.000000', 'finish_time': '2023-06-29T10:45:26.000000', 'result': 'Success', 'traceback': None}, {'event': 'schedule_instances', 'start_time':    |
  |               | '2023-06-29T10:45:19.000000', 'finish_time': '2023-06-29T10:45:19.000000', 'result': 'Success', 'traceback': None}, {'event': 'get_image_info', 'start_time': '2023-06-29T10:45:19.000000', 'finish_time':                 |
  |               | '2023-06-29T10:45:19.000000', 'result': 'Success', 'traceback': None}]                                                                                                                                                     |
  | instance_uuid | e30de509-6988-4535-a6f5-520c52fba087                                                                                                                                                                                       |
  | message       | None                                                                                                                                                                                                                       |
  | project_id    | 0ad9573f6c57482bbd8202ce43706ca6                                                                                                                                                                                           |
  | request_id    | req-e09149a8-6ef9-49c3-a757-57bbc14d6004                                                                                                                                                                                   |
  | start_time    | 2023-06-29T10:45:19.000000                                                                                                                                                                                                 |
  | user_id       | 971105d66de542588ddccdef15594fab                                                                                                                                                                                           |
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  $ openstack server event show vm0 req-4628f975-a57b-468e-a41c-0ef405df4bae
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field         | Value                                                                                                                                                                   |
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action        | shelve                                                                                                                                                                  |
  | events        | [{'event': 'compute_shelve_instance', 'start_time': '2023-06-29T10:44:49.000000', 'finish_time': '2023-06-29T10:44:55.000000', 'result': 'Success', 'traceback': None}] |
  | instance_uuid | e30de509-6988-4535-a6f5-520c52fba087                                                                                                                                    |
  | message       | None                                                                                                                                                                    |
  | project_id    | 0ad9573f6c57482bbd8202ce43706ca6                                                                                                                                        |
  | request_id    | req-4628f975-a57b-468e-a41c-0ef405df4bae                                                                                                                                |
  | start_time    | 2023-06-29T10:44:49.000000                                                                                                                                              |
  | user_id       | 971105d66de542588ddccdef15594fab                                                                                                                                        |
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

  $ openstack server event list vm1
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  | Request ID                               | Server ID                            | Action   | Start Time                 |
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  | req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | unshelve | 2023-06-29T10:45:35.000000 |
  | req-1dc9b980-d765-415a-b946-2daf53f361b1 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | shelve   | 2023-06-29T10:45:07.000000 |
  | req-5071fd9f-00e2-44f8-9b6c-8715fe918e98 | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc | create   | 2023-06-29T10:45:00.000000 |
  +------------------------------------------+--------------------------------------+----------+----------------------------+
  $ openstack server event show vm1 req-9ab0eabc-bf97-48bf-b534-dd22df3c6452
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field         | Value                                                                                                                                                                                                                      |
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action        | unshelve                                                                                                                                                                                                                   |
  | events        | [{'event': 'compute_unshelve_instance', 'start_time': '2023-06-29T10:45:36.000000', 'finish_time': '2023-06-29T10:45:45.000000', 'result': 'Success', 'traceback': None}, {'event': 'schedule_instances', 'start_time':    |
  |               | '2023-06-29T10:45:35.000000', 'finish_time': '2023-06-29T10:45:36.000000', 'result': 'Success', 'traceback': None}, {'event': 'get_image_info', 'start_time': '2023-06-29T10:45:35.000000', 'finish_time':                 |
  |               | '2023-06-29T10:45:35.000000', 'result': 'Success', 'traceback': None}]                                                                                                                                                     |
  | instance_uuid | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc                                                                                                                                                                                       |
  | message       | None                                                                                                                                                                                                                       |
  | project_id    | 0ad9573f6c57482bbd8202ce43706ca6                                                                                                                                                                                           |
  | request_id    | req-9ab0eabc-bf97-48bf-b534-dd22df3c6452                                                                                                                                                                                   |
  | start_time    | 2023-06-29T10:45:35.000000                                                                                                                                                                                                 |
  | user_id       | 971105d66de542588ddccdef15594fab                                                                                                                                                                                           |
  +---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  $ openstack server event show vm1 req-1dc9b980-d765-415a-b946-2daf53f361b1
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field         | Value                                                                                                                                                                   |
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action        | shelve                                                                                                                                                                  |
  | events        | [{'event': 'compute_shelve_instance', 'start_time': '2023-06-29T10:45:08.000000', 'finish_time': '2023-06-29T10:45:13.000000', 'result': 'Success', 'traceback': None}] |
  | instance_uuid | 4734b8a5-a6dd-432a-86c9-ba0367bb86cc                                                                                                                                    |
  | message       | None                                                                                                                                                                    |
  | project_id    | 0ad9573f6c57482bbd8202ce43706ca6                                                                                                                                        |
  | request_id    | req-1dc9b980-d765-415a-b946-2daf53f361b1                                                                                                                                |
  | start_time    | 2023-06-29T10:45:07.000000                                                                                                                                              |
  | user_id       | 971105d66de542588ddccdef15594fab                                                                                                                                        |
  +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

  Logs that seemed relevant to me:
  vm0 unshelve:
  jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=0,memory=7957,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([0,1]),pinned_cpus=set([]),siblings=[set([0]),set([1])]) {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1046}}
  jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Pinning has been requested {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1133}}
  jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Packing an instance onto a set of siblings:     host_cell_free_siblings: [{0}, {1}]    instance_cell: InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0]))    host_cell_id: 0    threads_per_core: 1    num_cpu_reserved: 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:731}}
  jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Built sibling_sets: defaultdict(<class 'list'>, {1: [{0}, {1}]}) {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:752}}
  jún 29 10:45:19 devstack1v nova-compute[379150]: INFO nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1]], vCPUs mapping: [(0, 0)]
  jún 29 10:45:19 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] Selected cores for pinning: [(0, 0)], in cell 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:1013}}
  jún 29 10:45:19 devstack1v nova-compute[379150]: INFO nova.compute.claims [None req-e09149a8-6ef9-49c3-a757-57bbc14d6004 admin admin] [instance: e30de509-6988-4535-a6f5-520c52fba087] Claim successful on node devstack1v

  vm1 unshelve:
  jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Attempting to fit instance cell InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0])) on host_cell NUMACell(cpu_usage=0,cpuset=set([]),id=0,memory=7957,memory_usage=0,mempages=[NUMAPagesTopology,NUMAPagesTopology],network_metadata=NetworkMetadata,pcpuset=set([0,1]),pinned_cpus=set([]),siblings=[set([0]),set([1])]) {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1046}}
  jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Pinning has been requested {{(pid=379150) _numa_fit_instance_cell /opt/stack/nova/nova/virt/hardware.py:1133}}
  jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Packing an instance onto a set of siblings:     host_cell_free_siblings: [{0}, {1}]    instance_cell: InstanceNUMACell(cpu_pinning_raw={0=0},cpu_policy='dedicated',cpu_thread_policy=None,cpu_topology=VirtCPUTopology,cpuset=set([]),cpuset_reserved=None,id=0,memory=256,pagesize=None,pcpuset=set([0]))    host_cell_id: 0    threads_per_core: 1    num_cpu_reserved: 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:731}}
  jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Built sibling_sets: defaultdict(<class 'list'>, {1: [{0}, {1}]}) {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:752}}
  jún 29 10:45:37 devstack1v nova-compute[379150]: INFO nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1]], vCPUs mapping: [(0, 0)]
  jún 29 10:45:37 devstack1v nova-compute[379150]: DEBUG nova.virt.hardware [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] Selected cores for pinning: [(0, 0)], in cell 0 {{(pid=379150) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:1013}}
  jún 29 10:45:37 devstack1v nova-compute[379150]: INFO nova.compute.claims [None req-9ab0eabc-bf97-48bf-b534-dd22df3c6452 admin admin] [instance: 4734b8a5-a6dd-432a-86c9-ba0367bb86cc] Claim successful on node devstack1v

  The first error in the logs came not during unshelve but well after:
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Error updating resources for node devstack1v.: nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1]
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager Traceback (most recent call last):
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 9973, in _update_available_resource_for_node
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     self._update_available_resource(context, resources, startup=startup)
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 360, in inner
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     return f(*args, **kwargs)
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 968, in _update_available_resource
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     instance_by_uuid = self._update_usage_from_instances(
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1534, in _update_usage_from_instances
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     self._update_usage_from_instance(context, instance, nodename)
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1497, in _update_usage_from_instance
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     self._update_usage(self._get_usage_dict(instance, instance),
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1301, in _update_usage
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     cn.numa_topology = hardware.numa_usage_from_instance_numa(
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/virt/hardware.py", line 2450, in numa_usage_from_instance_numa
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     new_cell.pin_cpus(pinned_cpus)
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager   File "/opt/stack/nova/nova/objects/numa.py", line 91, in pin_cpus
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager     raise exception.CPUPinningInvalid(requested=list(cpus),
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager nova.exception.CPUPinningInvalid: CPU set to pin [0] must be a subset of free CPU set [1]
  jún 29 10:47:27 devstack1v nova-compute[379150]: ERROR nova.compute.manager
  jún 29 10:47:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager._instance_usage_audit {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
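  The CPUPinningInvalid at the bottom of the traceback is a plain subset check: accounting an instance's pinned CPUs a second time fails because the first accounting already consumed them. A minimal, self-contained sketch of that check (simplified from the pin_cpus logic in nova/objects/numa.py; the class and attribute names here are illustrative, not nova's actual object model):

```python
# Sketch of the subset check behind "CPU set to pin [0] must be a
# subset of free CPU set [1]". Simplified; the real logic lives in
# NUMACell.pin_cpus in nova/objects/numa.py.

class CPUPinningInvalid(Exception):
    pass

class NUMACellSketch:
    def __init__(self, pcpuset, pinned_cpus):
        self.pcpuset = set(pcpuset)          # dedicated host CPUs on this cell
        self.pinned_cpus = set(pinned_cpus)  # CPUs already claimed

    @property
    def free_pcpus(self):
        return self.pcpuset - self.pinned_cpus

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # The periodic task replays usage for every tracked instance; if
        # the same instance is accounted twice, the second pin fails here.
        if not cpus.issubset(self.free_pcpus):
            raise CPUPinningInvalid(
                'CPU set to pin %s must be a subset of free CPU set %s'
                % (sorted(cpus), sorted(self.free_pcpus)))
        self.pinned_cpus |= cpus

cell = NUMACellSketch(pcpuset={0, 1}, pinned_cpus=set())
cell.pin_cpus({0})        # first accounting succeeds
try:
    cell.pin_cpus({0})    # double accounting: [0] not in free set [1]
except CPUPinningInvalid as e:
    print(e)
```

  With pcpuset {0, 1} this reproduces the exact message from the log above: the second pin of CPU 0 sees only {1} free and raises.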

  Between the two unshelves the update_available_resource task ran once:
  $ egrep 'Running periodic task ComputeManager.update_available_resource' ~/n-cpu.log
  jún 29 10:35:16 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:36:17 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:37:17 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:38:19 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:39:20 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:40:22 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:41:22 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:42:23 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:43:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:44:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:45:24 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:46:26 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:47:27 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:48:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
  jún 29 10:49:28 devstack1v nova-compute[379150]: DEBUG oslo_service.periodic_task [None req-a52cbcd3-ad79-457e-a0e1-3ec225226411 None None] Running periodic task ComputeManager.update_available_resource {{(pid=379150) run_periodic_tasks /usr/local/lib/python3.8/dist-packages/oslo_service/periodic_task.py:210}}
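  For context, the fix that later closed this bug (linked at the top of this message) keeps tracking the claim while the VM is in vm_state SHELVED_OFFLOADED with task_state spawning, instead of unconditionally dropping it during the periodic job. A hedged sketch of that state check, using illustrative names rather than the actual nova diff:

```python
# Hedged sketch of the state check behind the fix: should the resource
# tracker's periodic job keep an instance's claim? Constant and function
# names are illustrative, not nova's actual code.

SHELVED_OFFLOADED = 'shelved_offloaded'
SPAWNING = 'spawning'

def claim_is_tracked(vm_state, task_state):
    """Return True if the periodic job must keep the resource claim."""
    if vm_state == SHELVED_OFFLOADED:
        # Before the fix: effectively always False here, so a VM
        # mid-unshelve lost its freshly claimed pinned CPUs. After the
        # fix: a spawning (i.e. unshelving) VM keeps its claim.
        return task_state == SPAWNING
    return True

print(claim_is_tracked(SHELVED_OFFLOADED, SPAWNING))  # claim kept
print(claim_is_tracked(SHELVED_OFFLOADED, None))      # offloaded, no claim
```

  In the timeline above, the 10:46:26 periodic run falls between the 10:45:37 claim and the 10:47:27 failure, which is exactly the window this check closes.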

  I also attached a .tar.gz; its contents were collected as follows:

  for service in n-api n-cond-cell1 n-cpu n-sch n-super-cond
  do
      sudo journalctl -u devstack@$service -S '2023-06-29 10:35:00' -U '2023-06-29 10:50:00' > ~/$service.log
  done

  $ sudo virsh list
   Id   Name                State
  -----------------------------------
   63   instance-0000001f   running
   64   instance-00000020   running

  sudo virsh dumpxml instance-0000001f > ~/instance-0000001f.xml
  sudo virsh dumpxml instance-00000020 > ~/instance-00000020.xml

  Let me know if I can collect more information.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2025480/+subscriptions


