yahoo-eng-team team mailing list archive

[Bug 1723005] Re: CPUPinningInvalid exception occurred when evacuate one instance repeatedly.

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1723005

Title:
  CPUPinningInvalid exception occurred when evacuate  one instance
  repeatedly.

Status in OpenStack Compute (nova):
  Expired

Bug description:
  Description
  ===========
  Evacuating an instance that has a NUMA topology fails: the instance's vm_state becomes ERROR during the pin_cpus operation. The exception raised is CPUPinningInvalid.

  Steps to reproduce
  ==================
  I wrote a monitor process to evacuate instances automatically. It detects compute nodes whose service is down and evacuates the instances running on those nodes. When this process was run as an automated test, some instances went into the ERROR state after evacuation.

  $ nova list --all-tenants | grep wdl_chongsheng_vm-2
  | c90a1a71-4c5b-418a-b513-907ee1c956a0 | wdl_chongsheng_vm-2 | e3ddf976a1654dd89cf03820cb55b946 | ERROR  | -          | Running     | robot_test_network=192.168.1.147 |
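
  The monitor process itself is not included in this report. As a rough illustration of the selection step it describes (find compute hosts whose service is down, then pick the instances on them), here is a minimal sketch. The dictionary fields (binary, host, state, task_state) and the sample host and instance names are assumptions made for this example, not the reporter's code, and the sketch makes no API calls.

```python
def hosts_down(services):
    """Return the set of hosts whose nova-compute service is reported down."""
    return {
        s["host"]
        for s in services
        if s["binary"] == "nova-compute" and s["state"] == "down"
    }


def evacuation_candidates(services, instances):
    """Pick instances on down hosts that have no task in progress."""
    down = hosts_down(services)
    return [
        inst["id"]
        for inst in instances
        if inst["host"] in down and inst["task_state"] is None
    ]


# Hypothetical sample data; field names are assumptions for this sketch.
services = [
    {"binary": "nova-compute", "host": "compute-1", "state": "down"},
    {"binary": "nova-compute", "host": "compute-2", "state": "up"},
    {"binary": "nova-scheduler", "host": "controller", "state": "down"},
]
instances = [
    {"id": "vm-a", "host": "compute-1", "task_state": None},
    {"id": "vm-b", "host": "compute-1", "task_state": "rebuilding"},
    {"id": "vm-c", "host": "compute-2", "task_state": None},
]

candidates = evacuation_candidates(services, instances)
print(candidates)
```

  Only instances with a task_state of None are selected here, matching the condition named under "Expected result"; a real monitor would then issue an evacuate request for each candidate.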

  Error logs:
  2017-10-10 17:10:31.294 20488 INFO nova.compute.manager [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Rebuilding instance
  2017-10-10 17:10:31.360 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Attempting claim: memory 2048 MB, disk 0 GB, vcpus 2 CPU
  2017-10-10 17:10:31.361 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Total memory: 63599 MB, used: 25600.00 MB
  2017-10-10 17:10:31.361 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] memory limit not specified, defaulting to unlimited
  2017-10-10 17:10:31.361 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Total disk: 170 GB, used: 20.00 GB
  2017-10-10 17:10:31.362 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] disk limit not specified, defaulting to unlimited
  2017-10-10 17:10:31.362 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Total vcpu: 32 VCPU, used: 8.00 VCPU
  2017-10-10 17:10:31.362 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] vcpu limit not specified, defaulting to unlimited
  2017-10-10 17:10:31.399 20488 INFO nova.compute.claims [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Claim successful
  2017-10-10 17:10:31.461 20488 INFO nova.compute.resource_tracker [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] Updating from migration c90a1a71-4c5b-418a-b513-907ee1c956a0
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Setting instance vm_state to ERROR
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Traceback (most recent call last):
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7419, in _error_out_instance_on_exception
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     yield
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2990, in rebuild_instance
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     migration=migration)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     return f(*args, **kwargs)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 235, in rebuild_claim
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     image_meta=image_meta, migration=migration)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 299, in _move_claim
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     migration)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 913, in _update_usage_from_migration
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     self._update_usage(usage)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 821, in _update_usage
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     self.compute_node, usage, free)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1784, in get_host_numa_usage_from_instance
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     host_numa_topology, instance_numa_topology, free=free))
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/virt/hardware.py", line 1651, in numa_usage_from_instances
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     newcell.pin_cpus(pinned_cpus)
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]   File "/usr/lib/python2.7/site-packages/nova/objects/numa.py", line 90, in pin_cpus
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]     pinned=list(self.pinned_cpus))
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] CPUPinningInvalid: Cannot pin/unpin cpus [8, 28] from the following pinned set [1, 5, 8, 9, 21, 25, 28, 29]
  2017-10-10 17:10:31.492 20488 ERROR nova.compute.manager [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0]
  2017-10-10 17:10:31.617 20488 INFO nova_patch.compute.utils [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] Report alarm instance recover error. Details: alarm_instance_recover_error, instance_name: wdl_chongsheng_vm-2, instance_id: c90a1a71-4c5b-418a-b513-907ee1c956a0, action: rebuild_instance, result: True
  2017-10-10 17:10:31.778 20488 INFO nova.compute.manager [req-36368859-16ae-4e4f-a1f4-03a559942ec5 - - - - -] [instance: c90a1a71-4c5b-418a-b513-907ee1c956a0] Successfully reverted task state from rebuilding on failure for instance.
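
  In the traceback above, the CPUs being pinned ([8, 28]) already appear in the host cell's pinned set. The sketch below is a simplified stand-in for that kind of overlap check, written to illustrate the failure mode only. The Cell class and the locally defined exception are hypothetical, they are not nova's implementation, and the sketch does not establish why the second pin request was made.

```python
class CPUPinningInvalid(Exception):
    """Local stand-in for the nova exception of the same name."""


class Cell:
    """Hypothetical, simplified model of a host NUMA cell."""

    def __init__(self, cpuset, pinned_cpus=()):
        self.cpuset = set(cpuset)
        self.pinned_cpus = set(pinned_cpus)

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # Refuse to pin CPUs that are already pinned on this cell.
        if cpus & self.pinned_cpus:
            raise CPUPinningInvalid(
                "Cannot pin/unpin cpus %s from the following pinned set %s"
                % (sorted(cpus), sorted(self.pinned_cpus))
            )
        self.pinned_cpus |= cpus


# Start from a pinned set that does not yet include CPUs 8 and 28.
cell = Cell(cpuset=range(32), pinned_cpus=[1, 5, 9, 21, 25, 29])

cell.pin_cpus([8, 28])      # first accounting of the instance succeeds
try:
    cell.pin_cpus([8, 28])  # accounting the same CPUs again fails
    error = None
except CPUPinningInvalid as exc:
    error = str(exc)

print(error)
```

  Under these assumptions, counting the same instance's pinned CPUs twice against one host cell is enough to produce a message of the same form as the one in the log.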

  Expected result
  ===============
  Evacuation succeeds when an instance whose task_state is None and whose host is down is evacuated repeatedly.

  Actual result
  =============
  Evacuation fails and the instance's vm_state is set to ERROR.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

    git log -1
    commit 219c2660cdc936c9d1469d7629645e05a511fbf0
    Merge: 6b8bbb1 3a19f89
    Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
    Date:   Wed Oct 11 02:32:39 2017 +0000

      Merge "Fix minor input items from previous patches"

  2. Which hypervisor did you use?
      Libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1723005/+subscriptions

