yahoo-eng-team team mailing list archive
Message #88296
[Bug 1961188] [NEW] confirm resize fails with CPUUnpinningInvalid when resizing to the same host
Public bug reported:
This is very similar to https://bugs.launchpad.net/nova/+bug/1944759
(which should already be fixed), but it still happens when resizing to
the same host.
Reproduction:
On a fresh single-node devstack/master (Nova commit
b5029890c1c5b1b5153c9ca2fc9a8ea2437f635d), with 4 vCPUs in the devstack
VM, I set the following in nova-cpu.conf:
[DEFAULT]
allow_resize_to_same_host = True # already set by default on a single node devstack
update_resources_interval = 20 # to increase chances of a race
[compute]
cpu_shared_set = 0
cpu_dedicated_set = 1-3
Create two flavors with 1 and 2 pinned CPUs respectively, and start
resizing (and confirming) a cirros-based instance back and forth between them.
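The loop above can be sketched with the openstack CLI; the flavor and server names below are placeholders I picked for illustration, not values from the report:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the reproduction loop; flavor/server names are
# illustrative placeholders, not from the original bug report.
set -euo pipefail

create_flavors() {
    # Two flavors with 1 and 2 dedicated (pinned) CPUs respectively.
    openstack flavor create --vcpus 1 --ram 256 --disk 1 \
        --property hw:cpu_policy=dedicated pinned-1
    openstack flavor create --vcpus 2 --ram 256 --disk 1 \
        --property hw:cpu_policy=dedicated pinned-2
}

resize_loop() {
    local server=$1
    # Resize back and forth, confirming each resize. With
    # allow_resize_to_same_host=True on a single-node devstack the
    # instance stays on the same compute host throughout.
    while true; do
        for flavor in pinned-2 pinned-1; do
            openstack server resize --flavor "$flavor" --wait "$server"
            openstack server resize confirm "$server"
        done
    done
}
```

Run create_flavors once, boot a cirros server with pinned-1, then run resize_loop with the server name and wait for a confirm to fail.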
Sometimes the resize confirm fails with:
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "a3b3ecbe-2039-42fb-8365-da12e3c93bae" acquired by "nova.compute.manager.ComputeManager.confirm_resize.<locals>.do_confirm_resize" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Going to confirm migration 33 {{(pid=136855) do_confirm_resize /opt/stack/nova/nova/compute/manager.py:4287}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Acquired lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:294}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Building network info cache for instance {{(pid=136855) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1997}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'info_cache' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Instance cache missing network info. {{(pid=136855) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutron.py:3300}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Updating instance_info_cache with network_info: [] {{(pid=136855) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/neutron.py:117}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Releasing lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:312}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'migration_context' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" "released" by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: held 0.037s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Confirm resize failed on source host master-dsvm. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host master-dsvm. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Traceback (most recent call last):
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4316, in do_confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._confirm_resize(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/manager.py", line 4401, in _confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self.rt.drop_move_claim_at_source(context, instance, migration)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] return f(*args, **kwargs)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 563, in drop_move_claim_at_source
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._drop_move_claim(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 638, in _drop_move_claim
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] self._update_usage(usage, nodename, sign=-1)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1321, in _update_usage
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] cn.numa_topology = hardware.numa_usage_from_instance_numa(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/virt/hardware.py", line 2476, in numa_usage_from_instance_numa
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] new_cell.unpin_cpus(pinned_cpus)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] File "/opt/stack/nova/nova/objects/numa.py", line 106, in unpin_cpus
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] raise exception.CPUUnpinningInvalid(requested=list(cpus),
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
The full log snippet is at
https://paste.opendev.org/show/biKlHnGI4PPt451riHXn/
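The failing check (unpin_cpus in nova/objects/numa.py, per the traceback) is a set-subset test. A simplified stand-in, not Nova's actual class, shows why unpinning CPU 1 fails once the host cell only has CPUs 2 and 3 pinned, which is a plausible reading of the race: the periodic resource update has already rebuilt the host topology for the new flavor before drop_move_claim tries to release the old flavor's pins.

```python
# Simplified model of the subset check that raises in the traceback.
# Class and method names mirror Nova's, but this is an illustrative
# stand-in, not the real implementation.
class CPUUnpinningInvalid(Exception):
    def __init__(self, requested, available):
        super().__init__(
            "CPU set to unpin %s must be a subset of pinned CPU set %s"
            % (sorted(requested), sorted(available)))


class Cell:
    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        # Unpinning is only valid for CPUs that are currently pinned.
        cpus = set(cpus)
        if not cpus.issubset(self.pinned_cpus):
            raise CPUUnpinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus -= cpus


# Host cell already reflects the new flavor's pinning ({2, 3});
# dropping the old claim then tries to unpin CPU 1.
cell = Cell({2, 3})
try:
    cell.unpin_cpus({1})
except CPUUnpinningInvalid as e:
    print(e)
    # prints: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
```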
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961188