
yahoo-eng-team team mailing list archive

[Bug 1961188] [NEW] confirm resize fails with CPUUnpinningInvalid when resizing to the same host

 

Public bug reported:

This is very similar to https://bugs.launchpad.net/nova/+bug/1944759
(which should already be fixed), but the failure still happens when
resizing to the same host.

Reproduction:

Fresh single-node devstack on master (Nova commit
b5029890c1c5b1b5153c9ca2fc9a8ea2437f635d).

In nova-cpu.conf I set the following (my devstack VM has 4 vCPUs):

[DEFAULT]
allow_resize_to_same_host = True # already set by default on a single node devstack
update_resources_interval = 20 # to increase chances of a race

[compute]
cpu_shared_set = 0
cpu_dedicated_set = 1-3

Create two flavors, one with 1 pinned CPU and one with 2, then
repeatedly resize (and confirm) a cirros-based instance back and forth
between them.

Sometimes the resize confirm fails with:

Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "a3b3ecbe-2039-42fb-8365-da12e3c93bae" acquired by "nova.compute.manager.ComputeManager.confirm_resize.<locals>.do_confirm_resize" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Going to confirm migration 33 {{(pid=136855) do_confirm_resize /opt/stack/nova/nova/compute/manager.py:4287}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Acquired lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:294}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Building network info cache for instance {{(pid=136855) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1997}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'info_cache' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:41:59 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Instance cache missing network info. {{(pid=136855) _get_preexisting_port_ids /opt/stack/nova/nova/network/neutron.py:3300}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.network.neutron [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Updating instance_info_cache with network_info: [] {{(pid=136855) update_instance_cache_with_nw_info /opt/stack/nova/nova/network/neutron.py:117}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Releasing lock "refresh_cache-a3b3ecbe-2039-42fb-8365-da12e3c93bae" {{(pid=136855) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:312}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG nova.objects.instance [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lazy-loading 'migration_context' on Instance uuid a3b3ecbe-2039-42fb-8365-da12e3c93bae {{(pid=136855) obj_load_attr /opt/stack/nova/nova/objects/instance.py:1099}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" acquired by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: waited 0.000s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:386}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: DEBUG oslo_concurrency.lockutils [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] Lock "compute_resources" "released" by "nova.compute.resource_tracker.ResourceTracker.drop_move_claim_at_source" :: held 0.037s {{(pid=136855) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [None req-aab2e13d-e12b-47b7-9fa9-5c06343efbe6 admin admin] [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Confirm resize failed on source host master-dsvm. Resource allocations in the placement service will be removed regardless because the instance is now on the destination host master-dsvm. You can try hard rebooting the instance to correct its state.: nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] Traceback (most recent call last):
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/manager.py", line 4316, in do_confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._confirm_resize(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/manager.py", line 4401, in _confirm_resize
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self.rt.drop_move_claim_at_source(context, instance, migration)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py", line 391, in inner
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     return f(*args, **kwargs)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 563, in drop_move_claim_at_source
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._drop_move_claim(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 638, in _drop_move_claim
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     self._update_usage(usage, nodename, sign=-1)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1321, in _update_usage
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     cn.numa_topology = hardware.numa_usage_from_instance_numa(
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/virt/hardware.py", line 2476, in numa_usage_from_instance_numa
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     new_cell.unpin_cpus(pinned_cpus)
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]   File "/opt/stack/nova/nova/objects/numa.py", line 106, in unpin_cpus
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae]     raise exception.CPUUnpinningInvalid(requested=list(cpus),
Feb 16 13:42:00 master-dsvm nova-compute[136855]: ERROR nova.compute.manager [instance: a3b3ecbe-2039-42fb-8365-da12e3c93bae] nova.exception.CPUUnpinningInvalid: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
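The failing check can be illustrated with a minimal, hypothetical Python sketch (simplified; `HostCell`, `CPUUnpinningInvalid` and the scenario values here are illustrative stand-ins, not nova's actual `NUMACell` implementation). It assumes the suspected race: the periodic resource-update task has already rebuilt the host's pinned set from the new flavor alone, so confirming the resize tries to unpin a CPU that is no longer recorded as pinned:

```python
# Toy model of the unpin check that raises CPUUnpinningInvalid.
# HostCell is a hypothetical stand-in for nova.objects.NUMACell.

class CPUUnpinningInvalid(Exception):
    pass

class HostCell:
    """Tracks which host CPUs are currently recorded as pinned."""

    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that is not recorded as pinned is invalid.
        if not cpus.issubset(self.pinned_cpus):
            raise CPUUnpinningInvalid(
                f"CPU set to unpin {sorted(cpus)} must be a subset "
                f"of pinned CPU set {sorted(self.pinned_cpus)}")
        self.pinned_cpus -= cpus

# The old flavor pinned CPU 1; the new flavor pinned CPUs 2 and 3.
# If the periodic task has rebuilt host usage from the new flavor
# only, the host cell records just {2, 3} as pinned...
cell = HostCell(pinned_cpus={2, 3})

# ...and dropping the old flavor's move claim then fails:
try:
    cell.unpin_cpus({1})
except CPUUnpinningInvalid as exc:
    print(exc)
    # prints: CPU set to unpin [1] must be a subset of pinned CPU set [2, 3]
```

This would also explain why lowering update_resources_interval to 20 seconds makes the failure easier to hit: it widens the chance that the periodic task runs between the resize and its confirmation.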

The full log snippet is at
https://paste.opendev.org/show/biKlHnGI4PPt451riHXn/

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961188


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1961188/+subscriptions