[Bug 1784983] Re: we should not set instance to ERROR state when rebuild_claim fails
Reviewed: https://review.opendev.org/692185
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=26e1d9c7237f7bd97ec5f1fd3e572b3927eea725
Submitter: Zuul
Branch: master
commit 26e1d9c7237f7bd97ec5f1fd3e572b3927eea725
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Wed Oct 30 12:11:43 2019 -0400
Reset vm_state to original value if rebuild claim fails
If while evacuating an active or stopped server the rebuild
resource claim or group affinity policy check fails, the state
of the server has not actually changed but the vm_state is changed
to ERROR because of the _error_out_instance_on_exception context
manager.
This builds on Ie4f9177f4d54cbc7dbcf58bd107fd5f24c60d8bb by
wrapping the BuildAbortException in InstanceFaultRollback for the
claim/group policy failures so the vm_state remains unchanged.
Note that the overall instance action record will still be marked
as a failure since the BuildAbortException is re-raised and the
wrap_instance_event decorator will fail the action (this is how the
user can know the operation failed).
Change-Id: I07fa46690d8f7b846665bc59c5e361873154382b
Closes-Bug: #1784983
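For illustration, here is a self-contained toy of the pattern the commit describes. It is not the nova source: the names mirror nova's (InstanceFaultRollback, _error_out_instance_on_exception) but the bodies are simplified stand-ins.

    # Toy stand-ins, simplified from the behavior described in the
    # commit message above; not the actual nova code.
    import contextlib

    class InstanceFaultRollback(Exception):
        def __init__(self, inner_exception):
            self.inner_exception = inner_exception
            super().__init__(str(inner_exception))

    class Instance(object):
        def __init__(self, vm_state):
            self.vm_state = vm_state

    @contextlib.contextmanager
    def error_out_instance_on_exception(instance):
        try:
            yield
        except InstanceFaultRollback as e:
            # The fix: leave vm_state untouched and re-raise the
            # wrapped exception, so the instance action is still
            # marked as failed.
            raise e.inner_exception
        except Exception:
            # Any other failure still puts the instance into ERROR.
            instance.vm_state = 'error'
            raise

    instance = Instance(vm_state='active')
    try:
        with error_out_instance_on_exception(instance):
            # What rebuild_instance now raises when the rebuild claim
            # or group affinity policy check fails:
            raise InstanceFaultRollback(
                inner_exception=RuntimeError('rebuild claim failed'))
    except RuntimeError:
        pass
    assert instance.vm_state == 'active'  # unchanged, not 'error'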
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784983
Title:
we should not set instance to ERROR state when rebuild_claim fails
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
When a compute node is down, we evacuate the instances hosted on it. In a concurrent scenario, several instances select the same destination node. Unfortunately, if the destination does not have enough memory for some of those instances, it raises a ComputeResourcesUnavailable exception and finally sets the instance to ERROR state. But on ComputeResourcesUnavailable we should not set the instance to ERROR state: the instance actually remains intact on the source node.
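To make this concrete, a minimal self-contained toy of the pre-fix flow (simplified stand-ins that mirror the nova names, not the nova source): the failed claim on the destination is converted to BuildAbortException, and the surrounding context manager flips vm_state to ERROR even though the instance is untouched on the source node.

    import contextlib

    class ComputeResourcesUnavailable(Exception):
        pass

    class BuildAbortException(Exception):
        pass

    class Instance(object):
        vm_state = 'active'

    @contextlib.contextmanager
    def error_out_instance_on_exception(instance):
        try:
            yield
        except Exception:
            # Pre-fix behavior: any failure flips the instance to ERROR.
            instance.vm_state = 'error'
            raise

    def rebuild_claim(free_mb, requested_mb):
        # The destination cannot fit the instance (cf. the log below:
        # free 1141 MB < requested 2048 MB).
        if free_mb < requested_mb:
            raise ComputeResourcesUnavailable(
                'Free memory %s MB < requested %s MB'
                % (free_mb, requested_mb))

    instance = Instance()
    try:
        with error_out_instance_on_exception(instance):
            try:
                rebuild_claim(free_mb=1141, requested_mb=2048)
            except ComputeResourcesUnavailable as e:
                raise BuildAbortException('aborted: %s' % e)
    except BuildAbortException:
        pass
    assert instance.vm_state == 'error'  # yet the source VM is intact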
Steps to reproduce
==================
* Create many instances on one source node, and pick a destination node with little free resource, such as memory.
* Power off the source compute node or stop the compute service on it.
* Concurrently evacuate all instances from the source node, specifying the destination node (see the sketch after this list).
* Unfortunately, you will find one or more instances in ERROR state.
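One possible way to script the concurrent evacuation in step 3. This is an untested sketch, assuming python-novaclient with keystoneauth1; the auth values, microversion and host names are placeholders to adjust for your cloud.

    # Hypothetical reproducer sketch: concurrently evacuate every
    # instance from a downed source node to one small destination.
    from concurrent.futures import ThreadPoolExecutor

    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',  # placeholder
        username='admin', password='secret',   # placeholders
        project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    nova = client.Client('2.29', session=session.Session(auth=auth))

    SOURCE, DEST = 'compute-down', 'compute-small'  # placeholder hosts
    servers = nova.servers.list(
        search_opts={'host': SOURCE, 'all_tenants': 1})

    def evacuate(server):
        # With microversion >= 2.14 there is no on_shared_storage arg.
        return nova.servers.evacuate(server, host=DEST)

    with ThreadPoolExecutor(max_workers=len(servers) or 1) as pool:
        list(pool.map(evacuate, servers))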
Expected result
===============
No instance should end up in ERROR state when the destination does not have enough resources.
Actual result
=============
Some instances are in ERROR state.
Environment
===========
Pike release, but I found the issue also exists on the master branch.
Logs & Configs
==============
2018-08-01 16:21:45.739 41514 DEBUG nova.notifications.objects.base [req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - - -] Defaulting the value of the field 'projects' to None in FlavorPayload due to 'Cannot call _load_projects on orphaned Flavor object' populate_schema /usr/lib/python2.7/site-packages/nova/notifications/objects/base.py:125
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [req-1710e7e5-9073-47f1-8ae8-1e68c65272c9 855c20651d244348b10c91d907aa59ca - - - -] [instance: 5b8ae80d-7e33-4099-8732-905355cee045] Setting instance vm_state to ERROR: BuildAbortException: Build of instance 5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] Traceback (most recent call last):
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7142, in _error_out_instance_on_exception
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] yield
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] File "/usr/lib/python2.7/site-packages/nova/fh/compute/manager.py", line 700, in rebuild_instance
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] instance_uuid=instance.uuid, reason=e.format_message())
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045] BuildAbortException: Build of instance 5b8ae80d-7e33-4099-8732-905355cee045 aborted: Insufficient compute resources: Free memory 1141.00 MB < requested 2048 MB.
2018-08-01 16:21:45.747 41514 ERROR nova.compute.manager [instance: 5b8ae80d-7e33-4099-8732-905355cee045]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784983/+subscriptions