yahoo-eng-team team mailing list archive

[Bug 1713796] [NEW] Failed unshelve does not remove allocations from destination node

Public bug reported:

During an unshelve from an offloaded instance, conductor will call the
scheduler to pick a host. The scheduler will make allocations against
the chosen node as part of that select_destinations() call. Then
conductor casts to that compute host to unshelve the instance.

If the spawn on the hypervisor fails after we've made the instance
claim:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485

or if the claim test itself fails, the allocations on the destination
node aren't removed in Placement.

The RT aborts the claim here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414

That calls _update_usage_from_instance but doesn't change the
has_ocata_computes kwarg, so we get here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041

And we don't clean up the allocations for the instance.

The other case is when the claim itself fails: the instance_claim method
will raise ComputeResourcesUnavailable, which would be handled here:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491

But we don't remove allocations or do any other cleanup there.
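
To make the leak concrete, here is a minimal, self-contained sketch of the
flow described above. It is not nova code: FakePlacement, unshelve(),
failing_spawn() and the cleanup_on_failure flag are hypothetical names used
only for illustration, and the in-memory dict stands in for Placement. It
assumes the scheduler step allocates before the compute host attempts the
claim/spawn, as the description says.

```python
class FakePlacement:
    """Hypothetical in-memory stand-in for Placement allocations."""

    def __init__(self):
        # Maps (node, instance_uuid) -> dict of resource amounts.
        self.allocations = {}

    def allocate(self, node, instance_uuid, resources):
        self.allocations[(node, instance_uuid)] = dict(resources)

    def remove_allocation(self, node, instance_uuid):
        self.allocations.pop((node, instance_uuid), None)


class SpawnFailed(Exception):
    """Stands in for a failed claim or a failed spawn on the hypervisor."""


def failing_spawn(instance_uuid):
    raise SpawnFailed(instance_uuid)


def unshelve(placement, node, instance_uuid, resources, spawn,
             cleanup_on_failure):
    """Model of the unshelve flow; returns True if the spawn succeeded."""
    # Scheduler step: the allocation is made against the chosen node
    # before the compute host does any work.
    placement.allocate(node, instance_uuid, resources)
    try:
        # Compute step: claim and spawn on the destination host.
        spawn(instance_uuid)
    except SpawnFailed:
        # Without this cleanup the allocation stays behind on the
        # destination node, which is the behaviour reported here.
        if cleanup_on_failure:
            placement.remove_allocation(node, instance_uuid)
        return False
    return True


if __name__ == "__main__":
    leaky = FakePlacement()
    unshelve(leaky, "node1", "uuid-1", {"VCPU": 1}, failing_spawn, False)
    print("without cleanup:", leaky.allocations)

    fixed = FakePlacement()
    unshelve(fixed, "node1", "uuid-1", {"VCPU": 1}, failing_spawn, True)
    print("with cleanup:", fixed.allocations)
```

In this model the only difference between the two runs is whether the
failure path removes the allocation; where that cleanup should live in nova
(compute manager, resource tracker, or conductor) is not something this
sketch tries to answer.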

** Affects: nova
     Importance: High
         Status: Triaged

** Affects: nova/pike
     Importance: High
         Status: Confirmed


** Tags: placement shelve unshelve

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Confirmed

** Changed in: nova/pike
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713796
