yahoo-eng-team team mailing list archive

[Bug 1713796] Re: Failed unshelve does not remove allocations from destination node

 

Reviewed:  https://review.openstack.org/506458
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Submitter: Jenkins
Branch:    master

commit f18202185d05e3f7e89fca6bbc17daf3c5dc4b98
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Thu Sep 21 22:25:53 2017 -0400

    Remove allocations when unshelve fails on host
    
    When we unshelve an offloaded instance, the scheduler
    creates allocations in placement when picking a host.
    
    If the unshelve fails on the host, due to either the
    instance claim failing or the guest spawn failing, we
    need to remove the allocations since the instance isn't
    actually running on that host.
    
    Change-Id: Id2c7b7b3b4abda8a3b878fdee6806bcfe096e12e
    Closes-Bug: #1713796


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713796

Title:
  Failed unshelve does not remove allocations from destination node

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress

Bug description:
  During an unshelve of an offloaded instance, conductor will call the
  scheduler to pick a host. The scheduler will make allocations against
  the chosen node as part of that select_destinations() call. Then
  conductor casts to that compute host to unshelve the instance.
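
  In rough terms the flow looks like this (an illustrative sketch only,
  with simplified names, not the actual conductor code):

      # Illustrative sketch of the unshelve flow described above; the
      # function and variable names are simplified, not real nova code.
      def unshelve_offloaded_instance(context, instance, request_spec):
          # The scheduler picks a host and, as part of
          # select_destinations(), creates allocations for the instance
          # against the chosen compute node in Placement.
          dests = scheduler_client.select_destinations(context, request_spec)
          host = dests[0]

          # Conductor then casts to that compute host to unshelve. If the
          # unshelve fails over there, the allocations created above are
          # left behind.
          compute_rpcapi.unshelve_instance(context, instance, host)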

  If the spawn on the hypervisor fails after we've made the instance
  claim:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4485

  Or even if the claim test fails, the allocations against the
  destination node aren't removed from Placement.

  The RT aborts the claim here:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L414

  That calls _update_usage_from_instance but doesn't change the
  has_ocata_computes kwarg, so we get here:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/resource_tracker.py#L1041

  And we don't clean up the allocations for the instance.

  In the other case, if the claim fails, the instance_claim method raises
  ComputeResourcesUnavailable, which is handled here:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/claims.py#L161

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L4491

  But we don't remove allocations or do any other cleanup there.
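
  Roughly speaking, cleaning this up means catching those failures on the
  compute host and deleting the instance's allocations in Placement. A
  minimal sketch of that idea (assuming the scheduler report client's
  delete_allocation_for_instance() helper; this is not the literal patch):

      # Simplified sketch of cleanup on a failed unshelve in the compute
      # manager; not the literal change from the review above.
      try:
          with rt.instance_claim(context, instance, node, limits):
              self.driver.spawn(context, instance, image_meta,
                                injected_files=[], admin_password=None,
                                block_device_info=block_device_info)
      except Exception:
          with excutils.save_and_reraise_exception(logger=LOG):
              # The scheduler created these allocations during
              # select_destinations(); the instance never actually started
              # on this host, so remove them to avoid leaking the node's
              # resources.
              self.reportclient.delete_allocation_for_instance(instance.uuid)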

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713796/+subscriptions

