yahoo-eng-team team mailing list archive
Message #68088
[Bug 1713796] Re: Failed unshelve does not remove allocations from destination node
Reviewed: https://review.openstack.org/506458
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Submitter: Jenkins
Branch: master
commit f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu Sep 21 22:25:53 2017 -0400
Remove allocations when unshelve fails on host
When we unshelve an offloaded instance, the scheduler
creates allocations in placement when picking a host.
If the unshelve fails on the host, due to either the
instance claim failing or the guest spawn failing, we
need to remove the allocations since the instance isn't
actually running on that host.
Change-Id: Id2c7b7b3b4abda8a3b878fdee6806bcfe096e12e
Closes-Bug: #1713796
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713796
Title:
Failed unshelve does not remove allocations from destination node
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
In Progress
Bug description:
During an unshelve from an offloaded instance, conductor will call the
scheduler to pick a host. The scheduler will make allocations against
the chosen node as part of that select_destinations() call. Then
conductor casts to that compute host to unshelve the instance.
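The flow above can be sketched in miniature. This is an illustrative stand-in, not nova's actual internal API: `FakeScheduler`, `unshelve_offloaded`, and the dict-based placement store are all hypothetical names, but the shape matches the description: select_destinations() writes allocations as a side effect, then conductor casts to the chosen compute host.

```python
class FakeScheduler:
    """Illustrative stand-in for the scheduler: select_destinations()
    both picks a host and records allocations in Placement."""

    def __init__(self, placement):
        # placement: dict mapping instance UUID -> allocation record
        self.placement = placement

    def select_destinations(self, instance_uuid):
        host = "compute-1"
        # Allocations are created here, before the instance ever
        # lands on the host -- this is why a later failure on the
        # host leaves them behind unless someone cleans them up.
        self.placement[instance_uuid] = {"host": host, "VCPU": 1}
        return host


def unshelve_offloaded(instance_uuid, scheduler, cast):
    """Conductor-side flow: pick a host, then cast (async RPC) to it."""
    host = scheduler.select_destinations(instance_uuid)
    cast(host, instance_uuid)  # fire-and-forget; conductor moves on
    return host
```

Because the cast is asynchronous, conductor never learns whether the unshelve succeeded, so cleanup has to happen on the compute side.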
If the spawn on the hypervisor fails after we've made the instance
claim:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485
Or even if the claim itself fails, the allocations against the
destination node aren't removed from Placement.
The RT aborts the claim here:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414
That calls _update_usage_from_instance but doesn't change the
has_ocata_computes kwarg, so we get here:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041
And we don't clean up the allocations for the instance.
In the other case, if the claim fails, the instance_claim method
raises ComputeResourcesUnavailable, which is handled here:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491
But we don't remove allocations or do any other cleanup there.
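The shape of the fix is the same for both failure cases: if anything goes wrong between claiming resources and spawning the guest, the compute side must delete the allocations the scheduler created. A minimal sketch, with assumed names (`FakePlacement`, `instance_claim`, `unshelve_on_host` are illustrative, not nova's real classes), and plain exceptions standing in for ComputeResourcesUnavailable and hypervisor errors:

```python
import contextlib


class FakePlacement:
    """Stand-in for nova's placement report client."""

    def __init__(self):
        self.allocations = {}

    def put_allocations(self, consumer_uuid, resources):
        self.allocations[consumer_uuid] = resources

    def delete_allocations(self, consumer_uuid):
        # Idempotent removal of a consumer's allocations.
        self.allocations.pop(consumer_uuid, None)


@contextlib.contextmanager
def instance_claim(instance_uuid):
    # The real claim raises ComputeResourcesUnavailable on failure;
    # this sketch always succeeds so the spawn path can be exercised.
    yield


def unshelve_on_host(instance_uuid, spawn, placement):
    try:
        with instance_claim(instance_uuid):
            spawn(instance_uuid)  # hypervisor spawn; may raise
    except Exception:
        # The instance is not actually running on this host, so the
        # allocations the scheduler made against this node are stale
        # and must be removed before re-raising.
        placement.delete_allocations(instance_uuid)
        raise
```

The try/except wraps both the claim and the spawn, so either failure mode triggers the same allocation cleanup, matching the behavior the committed change adds.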
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713796/+subscriptions