
yahoo-eng-team team mailing list archive

[Bug 1784983] [NEW] we should not set instance to ERROR state when rebuild_claim fails

 

Public bug reported:

Description
===========
When a compute node is down, we evacuate the instances located on that node. In a concurrent scenario, several instances select the same destination node, and if the destination does not have enough memory for some of them, it raises a ComputeResourcesUnavailable exception and finally sets those instances to the ERROR state. I think that on a ComputeResourcesUnavailable exception we should not set the instance to ERROR, because the instance actually still remains on the source node.
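
For context, here is a minimal, self-contained sketch of the code path visible in the traceback below. The names mirror the ones in the log (_error_out_instance_on_exception, rebuild_claim, BuildAbortException), but the bodies are simplified stand-ins, not the actual nova source:

    # Simplified stand-ins, not the real nova code: the point is only to show
    # how a failed rebuild claim escalates into the instance being marked ERROR.
    from contextlib import contextmanager


    class ComputeResourcesUnavailable(Exception):
        """Stand-in for nova.exception.ComputeResourcesUnavailable."""


    class BuildAbortException(Exception):
        """Stand-in for nova.exception.BuildAbortException."""


    @contextmanager
    def _error_out_instance_on_exception(instance):
        try:
            yield
        except Exception:
            # Any exception escaping the rebuild path lands here, so the
            # instance is flipped to ERROR even though its data is untouched
            # on the source node.
            instance['vm_state'] = 'error'
            raise


    def rebuild_instance(instance, rebuild_claim):
        with _error_out_instance_on_exception(instance):
            try:
                rebuild_claim()  # raises when the destination lacks resources
            except ComputeResourcesUnavailable as exc:
                # The claim failure is wrapped, and the wrapper exception is
                # what triggers the ERROR state in the context manager above.
                raise BuildAbortException(
                    'Build of instance %s aborted: %s' % (instance['uuid'], exc))

In this sketch, calling rebuild_instance() with a claim that raises ComputeResourcesUnavailable leaves the instance dict with vm_state == 'error', which matches what the log below shows for the real code path.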

Steps to reproduce
==================
* Create many instances on one source compute node, and make sure the destination node has little free resource (for example memory).
* Power off the source compute node, or stop the compute service on it.
* Concurrently evacuate all instances from the source node, specifying the same destination node (a reproduction sketch follows this list).
* One or more instances will end up in the ERROR state.
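
The evacuation step can be driven with a small script. The following is a hypothetical sketch assuming python-novaclient with a keystoneauth1 session; the auth values and the 'source-node'/'dest-node' host names are placeholders, and the exact servers.evacuate() signature varies with the compute API microversion:

    from concurrent.futures import ThreadPoolExecutor

    from keystoneauth1 import loading, session
    from novaclient import client

    # Placeholder credentials -- replace with values for your deployment.
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    nova = client.Client('2', session=session.Session(auth=auth))

    # All instances on the powered-off source host, evacuated in parallel to
    # the same resource-constrained destination host.
    servers = nova.servers.list(search_opts={'host': 'source-node',
                                             'all_tenants': 1})
    with ThreadPoolExecutor(max_workers=10) as pool:
        for server in servers:
            pool.submit(nova.servers.evacuate, server, host='dest-node')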


Expected result
===============
No instance should be set to the ERROR state when the destination node does not have enough resources; the instance still resides on the source node.
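
One possible direction (only a sketch reusing the stand-in names from the sketch above, not a proposed nova patch) is to handle the failed claim explicitly so that it never reaches the generic error-out handler:

    def rebuild_instance_keep_state(instance, rebuild_claim):
        # Sketch of the suggested behaviour: a failed claim aborts this
        # evacuation attempt but leaves the instance state untouched, since
        # the instance data still resides on the source node.
        with _error_out_instance_on_exception(instance):
            try:
                rebuild_claim()
            except ComputeResourcesUnavailable:
                # Do not convert into BuildAbortException; log / clean up the
                # failed migration record and return without changing vm_state.
                return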

Actual result
=============
Some instances end up in the ERROR state.

Environment
===========
Pike release, but I found that the issue also exists on the master branch.


Logs & Configs
==============
2018-08-01 16:21:45.739 41514 DEBUG nova.notifications.objects.base [req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - - -] Defaulting the value of the field 'projects' to None in FlavorPayload due to 'Cannot call _load_projects on orphaned Flavor object' populate_schema /usr/lib/python2.7/site-packages/nova/notifications/objects/base.py:125
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - - -] [instance: 5b8ae80d-7e33-4099-8732-905355cee045] Setting instance vm_state to ERROR: BuildAbortException: Build of instance 5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] Traceback (most recent call last):
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7142, in _error_out_instance_on_exception
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]     yield
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]   File "/usr/lib/python2.7/site-packages/nova/fh/compute/manager.py", line 700, in rebuild_instance
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]     instance_uuid=instance.uuid, reason=e.format_message())
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] BuildAbortException: Build of instance 5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784983

Title:
  we should not set instance to ERROR state when rebuild_claim fails

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784983/+subscriptions

