yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #84530
[Bug 1879878] Re: VM become Error after confirming resize with Error info CPUUnpinningInvalid on source node
** Changed in: nova/ussuri
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1879878
Title:
VM become Error after confirming resize with Error info
CPUUnpinningInvalid on source node
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) train series:
In Progress
Status in OpenStack Compute (nova) ussuri series:
Fix Released
Bug description:
Description
===========
In my environmet, it will take some time to clean VM on source node in confirming resize.
during confirming resize process, periodic_task update_available_resource may update resource usage at the same time.
It may cause ERROR like:
CPUUnpinningInvalid: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []
during confirming resize process.
Steps to reproduce
==================
* Set /etc/nova/nova.conf "update_resources_interval" to small value, let's say 30 seconds on compute nodes. This step will increase the probability of error.
* create a "dedicated" VM, the flavor can be
+----------------------------+--------------------------------------+
| Property | Value |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 80 |
| extra_specs | {"hw:cpu_policy": "dedicated"} |
| id | 2be0f830-c215-4018-a96a-bee3e60b5eb1 |
| name | 4vcpu.4mem.80ssd.0eph.numa |
| os-flavor-access:is_public | True |
| ram | 4096 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 4 |
+----------------------------+--------------------------------------+
* Resize the VM with a new flavor to another node.
* Confirm resize.
Make sure it will take some time to undefine the vm on source node, 30 seconds will lead to inevitable results.
* Then you will see the ERROR notice on dashboard, And the VM become
ERROR
Expected result
===============
VM resized successfuly, vm state is active
Actual result
=============
* VM become ERROR
* On dashboard you can see this notice:
Please try again later [Error: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []].
Environment
===========
1. Exact version of OpenStack you are running.
Newton version with patch https://review.opendev.org/#/c/641806/21
I am sure it will happen to other new vision with https://review.opendev.org/#/c/641806/21
such as Train and Ussuri
2. Which hypervisor did you use?
Libvirt + KVM
3. Which storage type did you use?
local disk
4. Which networking type did you use?
Neutron with OpenVSwitch
Logs & Configs
==============
ERROR log on source node
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [req-364606bb-9fa6-41db-a20e-6df9ff779934 b0887a73f3c1441686bf78944ee284d0 95262f1f45f14170b91cd8054bb36512 - - -] [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] Setting instance vm_state to ERROR
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] Traceback (most recent call last):
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6661, in _error_out_instance_on_exception
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] yield
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3444, in _confirm_resize
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] prefix='old_')
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] return f(*args, **kwargs)
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 379, in drop_move_claim
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] self._update_usage(usage, sign=-1)
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 724, in _update_usage
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] self.compute_node, usage, free)
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1542, in get_host_numa_usage_from_instance
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] host_numa_topology, instance_numa_topology, free=free))
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1409, in numa_usage_from_instances
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] newcell.unpin_cpus(pinned_cpus)
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] File "/usr/lib/python2.7/site-packages/nova/objects/numa.py", line 95, in unpin_cpus
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] pinned=list(self.pinned_cpus))
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c] CPUUnpinningInvalid: CPU set to unpin [1, 2, 18, 17] must be a subset of pinned CPU set []
2020-05-15 10:11:12.324 425843 ERROR nova.compute.manager [instance: 993138d6-4b80-4b19-81c1-a16dbc6e196c]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1879878/+subscriptions
References