[Bug 1879878] Re: VM become Error after confirming resize with Error info CPUUnpinningInvalid on source node

 

** Changed in: nova/train
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1879878

Title:
  VM become Error after confirming resize with Error info
  CPUUnpinningInvalid on source node

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Fix Released
Status in OpenStack Compute (nova) ussuri series:
  Fix Released

Bug description:
  Description
  ===========

  In my environment, cleaning up the VM on the source node takes some time
  while a resize is being confirmed.
  During that window, the periodic task update_available_resource may update
  the resource usage at the same time.
  This race can cause the confirm-resize process to fail with an error such as:
  CPUUnpinningInvalid: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []
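
  At its core the failure is an attempt to unpin CPUs that are no longer
  recorded as pinned. A minimal sketch of that invariant, using a
  hypothetical FakeNUMACell class rather than the real nova objects:

    # Hypothetical stand-in for the NUMA cell bookkeeping; not nova code.
    class FakeNUMACell(object):
        def __init__(self, pinned_cpus):
            self.pinned_cpus = set(pinned_cpus)

        def unpin_cpus(self, cpus):
            cpus = set(cpus)
            # The same precondition nova enforces: only CPUs currently
            # recorded as pinned may be unpinned.
            if not cpus.issubset(self.pinned_cpus):
                raise ValueError('CPU set to unpin %s must be a subset of '
                                 'pinned CPU set %s' %
                                 (sorted(cpus), sorted(self.pinned_cpus)))
            self.pinned_cpus -= cpus

    # Race outcome: update_available_resource has already rebuilt the host
    # usage without the old pins, so when drop_move_claim tries to release
    # them the pinned set is empty and the unpin fails.
    cell = FakeNUMACell(pinned_cpus=[])
    cell.unpin_cpus([1, 2, 17, 18])  # raises, mirroring the traceback below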

  
   

  Steps to reproduce
  ==================
  * Set update_resources_interval in /etc/nova/nova.conf on the compute nodes to a small value, say 30 seconds. This increases the probability of hitting the error (an example nova.conf snippet and command sequence are sketched after these steps).

  * Create a VM with the "dedicated" CPU policy; the flavor can be, for example:
  +----------------------------+--------------------------------------+
  | Property                   | Value                                |
  +----------------------------+--------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                    |
  | disk                       | 80                                   |
  | extra_specs                | {"hw:cpu_policy": "dedicated"}       |
  | id                         | 2be0f830-c215-4018-a96a-bee3e60b5eb1 |
  | name                       | 4vcpu.4mem.80ssd.0eph.numa           |
  | os-flavor-access:is_public | True                                 |
  | ram                        | 4096                                 |
  | rxtx_factor                | 1.0                                  |
  | swap                       |                                      |
  | vcpus                      | 4                                    |
  +----------------------------+--------------------------------------+

  * Resize the VM to a new flavor so that it moves to another node.

  * Confirm the resize.
  Make sure undefining the VM on the source node takes some time; a delay
  of around 30 seconds makes the failure practically inevitable.

  * You will then see the error notice on the dashboard, and the VM goes
  to ERROR.
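
  For reference, the first step and the reproduction sequence condense to
  roughly the following. The flavor, image and network names are
  placeholders rather than values from this report, and older openstack
  clients spell the last step "openstack server resize --confirm":

    # /etc/nova/nova.conf on the compute nodes
    [DEFAULT]
    # shorten the periodic resource update to widen the race window
    update_resources_interval = 30

    # create a pinned flavor and exercise resize + confirm
    openstack flavor create --vcpus 4 --ram 4096 --disk 80 \
        --property hw:cpu_policy=dedicated pinned.flavor.small
    # (create a second dedicated flavor, e.g. pinned.flavor.big, the same way)
    openstack server create --flavor pinned.flavor.small \
        --image <image> --network <network> pinned-vm
    openstack server resize --flavor pinned.flavor.big pinned-vm
    # wait for VERIFY_RESIZE, then:
    openstack server resize confirm pinned-vm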

  
  Expected result
  ===============
  The VM resizes successfully and its vm_state is active.

  
  Actual result
  =============

  * The VM goes to ERROR.

  * On the dashboard you can see this notice:
  Please try again later [Error: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []].


  Environment
  ===========
  1. Exact version of OpenStack you are running.

    Newton with patch https://review.opendev.org/#/c/641806/21 applied.
    I am sure it will also happen on newer releases that carry https://review.opendev.org/#/c/641806/21,
    such as Train and Ussuri.

  2. Which hypervisor did you use?
     Libvirt + KVM

  3. Which storage type did you use?
     local disk

  4. Which networking type did you use?
     Neutron with Open vSwitch

  Logs & Configs
  ==============

  ERROR log on the source node:
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [req-364606bb-9fa6-41db-a20e-6df9ff779934 b0887a73f3c1441686bf78944ee284d0 95262f1f45f14170b91cd8054bb36512 - - -] [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] Setting instance vm_state to ERROR
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] Traceback (most recent call last):
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6661, in _error_out_instance_on_exception
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     yield
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3444, in _confirm_resize
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     prefix='old_')
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     return f(*args, **kwargs)
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 379, in drop_move_claim
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     self._update_usage(usage, sign=-1)
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 724, in _update_usage
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     self.compute_node, usage, free)
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1542, in get_host_numa_usage_from_instance
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     host_numa_topology, instance_numa_topology, free=free))
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1409, in numa_usage_from_instances
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     newcell.unpin_cpus(pinned_cpus)
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]   File "/usr/lib/python2.7/site-packages/nova/objects/numa.py", line 95, in unpin_cpus
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]     pinned=list(self.pinned_cpus))
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] CPUUnpinningInvalid: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []
  2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1879878/+subscriptions


