yahoo-eng-team team mailing list archive
Message #67455
[Bug 1713786] Related fix merged to nova (master)
Reviewed: https://review.openstack.org/499877
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=30946f9a5eeea839631cdb1dba9c26d45f7a8d00
Submitter: Jenkins
Branch: master
commit 30946f9a5eeea839631cdb1dba9c26d45f7a8d00
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu Aug 31 22:56:59 2017 -0400
Add a test to make sure failed evacuate cleans up dest allocation
If we actually make the MoveClaim but the evacuation fails
in the virt driver, the drop_move_claim via the MoveClaim.abort
will remove the destination node allocation. This adds a functional
test to show that works.
Change-Id: Ib58c487e97a041b8498746e8a276efffee239c56
Related-Bug: #1713786
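The behaviour described in the commit message can be illustrated with a
minimal, self-contained sketch. This is not nova's code: FakePlacement,
FakeMoveClaim and evacuate below are hypothetical stand-ins. The only
thing taken from the message is the idea that aborting the move claim
after a virt driver failure drops the destination node allocation.

```python
class FakePlacement:
    """Hypothetical in-memory stand-in for the Placement allocation store."""

    def __init__(self):
        # Maps (instance_uuid, node) -> dict of allocated resources.
        self.allocations = {}

    def allocate(self, instance_uuid, node, resources):
        self.allocations[(instance_uuid, node)] = dict(resources)

    def remove(self, instance_uuid, node):
        self.allocations.pop((instance_uuid, node), None)


class FakeMoveClaim:
    """Hypothetical claim: leaving the 'with' block on an error aborts it."""

    def __init__(self, placement, instance_uuid, node):
        self.placement = placement
        self.instance_uuid = instance_uuid
        self.node = node

    def abort(self):
        # Assumed behaviour: dropping the claim removes the dest allocation.
        self.placement.remove(self.instance_uuid, self.node)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.abort()
        return False  # let the original exception propagate


def evacuate(placement, instance_uuid, dest_node, spawn):
    """Run the (hypothetical) spawn step under a move claim."""
    with FakeMoveClaim(placement, instance_uuid, dest_node):
        spawn()


def failing_spawn():
    raise RuntimeError("virt driver failed to rebuild the guest")


placement = FakePlacement()
placement.allocate("inst-1", "dest-node", {"VCPU": 1, "MEMORY_MB": 512})
try:
    evacuate(placement, "inst-1", "dest-node", failing_spawn)
except RuntimeError:
    pass

# After the failed evacuation no destination allocation should remain.
dest_allocation_left = ("inst-1", "dest-node") in placement.allocations
```

A functional test along these lines would assert that the destination
allocation is gone once the failure has propagated.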
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713786
Title:
Allocations are not managed properly in all evacuate scenarios
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Bug description:
Evacuate has some gaps with respect to dealing with resource
allocations in Placement:
1. If the user specifies a host with evacuate, conductor bypasses the
scheduler and we don't create allocations on the destination host:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749
This could eventually lead to the claim failing on the destination
compute:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795
This is similar to bug 1712008 where forcing a host during live
migration bypasses the scheduler so allocations are not created in
placement on the destination node. Before Pike this would be handled
via the update_available_resource periodic task in the compute service
which would 'heal' the allocations for instances tracked on a given
node, but that is no longer happening once all computes are running
Pike code due to this change: https://review.openstack.org/#/c/491012/
2. If the user does not specify a host with evacuate, conductor will
ask the scheduler to pick a host, which will also create allocations
for that host via the scheduler. If the claim (or rebuild) fails on
the destination node, we don't clean up the allocation on the
destination node even if the instance isn't spawned on it:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795
In the case linked above it is pretty obvious that we should clean up,
because the claim for resources failed.
With this generic exception handling it is harder to know whether we
should clean up, since we'd need to know if the guest was spawned on
the destination node:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812
But since we don't set the instance.host/node to the current host/node
it won't be reported there anyway:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824
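The two gaps above can be sketched with a toy model. Everything here is
an assumption made for illustration: the evacuate function, the
ClaimFailed exception, the set-based allocation store and the cleanup
flag are hypothetical and do not mirror nova's conductor or compute
manager code.

```python
class ClaimFailed(Exception):
    """Hypothetical error: the destination could not claim resources."""


def evacuate(instance, hosts, allocations, claim, forced_host=None,
             cleanup=True):
    """Toy model of evacuate; returns the destination host.

    hosts:       list of candidate host names
    allocations: set of (instance, host) pairs, standing in for Placement
    claim:       callable(host) that raises ClaimFailed on failure
    """
    if forced_host is not None:
        # Gap 1: the scheduler is bypassed, so no allocation is created.
        dest = forced_host
    else:
        # Gap 2 setup: the "scheduler" picks a host and records an
        # allocation against it before the claim runs.
        dest = hosts[0]
        allocations.add((instance, dest))
    try:
        claim(dest)
    except ClaimFailed:
        if cleanup:
            # The fix being discussed: drop the dest allocation on failure.
            allocations.discard((instance, dest))
        raise
    return dest


def ok_claim(host):
    return None


def failing_claim(host):
    raise ClaimFailed(host)


# Gap 1: forced host, claim succeeds, but nothing is recorded.
forced = set()
evacuate("inst-1", ["node-a", "node-b"], forced, ok_claim,
         forced_host="node-b")
forced_has_allocation = ("inst-1", "node-b") in forced

# Gap 2: scheduler path, claim fails, allocation is left behind.
leaked = set()
try:
    evacuate("inst-2", ["node-a"], leaked, failing_claim, cleanup=False)
except ClaimFailed:
    pass

# Same failure with cleanup enabled: no allocation remains.
cleaned = set()
try:
    evacuate("inst-2", ["node-a"], cleaned, failing_claim, cleanup=True)
except ClaimFailed:
    pass
```

In this model the forced-host path ends with no destination allocation
at all, and the scheduler path leaks one unless the failure handler
removes it.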
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713786/+subscriptions