yahoo-eng-team team mailing list archive

[Bug 1713786] Related fix merged to nova (master)

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1713786@xxxxxxxxxxxxxxxxxx>
Date: Tue, 12 Sep 2017 21:05:10 -0000
Reply-to: Bug 1713786 <1713786@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/499877
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=30946f9a5eeea839631cdb1dba9c26d45f7a8d00
Submitter: Jenkins
Branch:    master

commit 30946f9a5eeea839631cdb1dba9c26d45f7a8d00
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Thu Aug 31 22:56:59 2017 -0400

    Add a test to make sure failed evacuate cleans up dest allocation
    
    If we actually make the MoveClaim but the evacuation fails
    in the virt driver, the drop_move_claim via the MoveClaim.abort
    will remove the destination node allocation. This adds a functional
    test to show that works.
    
    Change-Id: Ib58c487e97a041b8498746e8a276efffee239c56
    Related-Bug: #1713786


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713786

Title:
  Allocations are not managed properly in all evacuate scenarios

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  In Progress

Bug description:
  Evacuate has some gaps with respect to dealing with resource
  allocations in Placement:

  1. If the user specifies a host with evacuate, conductor bypasses the
  scheduler and we don't create allocations on the destination host:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749

  This could eventually lead to the claim failing on the destination
  compute:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

  This is similar to bug 1712008, where forcing a host during live
  migration bypasses the scheduler, so allocations are not created in
  Placement on the destination node. Before Pike, the
  update_available_resource periodic task in the compute service
  handled this by 'healing' the allocations for instances tracked on a
  given node. That no longer happens once all computes are running
  Pike code, due to this change: https://review.openstack.org/#/c/491012/
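
  The gap in scenario 1 can be illustrated with a toy model. This is
  only a sketch under stated assumptions: FakePlacement,
  select_destination and evacuate are hypothetical names invented for
  illustration, not nova or Placement APIs.

```python
# Toy model only: every name here is hypothetical, not a nova API.
class FakePlacement:
    """Tracks which nodes hold an allocation for each instance."""

    def __init__(self):
        self.allocations = {}  # instance uuid -> set of node names

    def allocate(self, instance, node):
        self.allocations.setdefault(instance, set()).add(node)


def select_destination(placement, instance, candidates):
    """Stand-in for the scheduler: pick a node and allocate on it."""
    node = candidates[0]
    placement.allocate(instance, node)
    return node


def evacuate(placement, instance, candidates, forced_host=None):
    """Stand-in for conductor choosing the evacuate destination."""
    if forced_host is not None:
        # Scheduler is bypassed, so nothing creates the destination
        # allocation. This is the gap described in scenario 1.
        return forced_host
    return select_destination(placement, instance, candidates)
```

  In this model, a forced evacuate leaves the instance with only its
  source allocation, while an unforced one ends up with allocations on
  both the source and the destination.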

  2. If the user does not specify a host with evacuate, conductor will
  ask the scheduler to pick a host, which will also create allocations
  for that host via the scheduler. If the claim (or rebuild) fails on
  the destination node, we don't clean up the allocation on the
  destination node even if the instance isn't spawned on it:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

  In the case linked above, it is pretty obvious that we should clean
  up, because the claim for resources failed.

  With the generic exception handling linked below, it is harder to
  know whether we should clean up, since we'd need to know whether the
  guest was spawned on the destination:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812

  But since we don't set the instance.host/node to the current host/node
  it won't be reported there anyway:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824
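
  The cleanup wanted in scenario 2 (a claim that drops the destination
  allocation when the rebuild fails, as the related commit message
  describes for MoveClaim.abort) can be sketched roughly as follows.
  This is a toy model, not nova's implementation: FakeMoveClaim and
  rebuild_on_dest are hypothetical names, and the allocations dict
  stands in for Placement.

```python
# Toy model only: every name here is hypothetical, not a nova API.
class FakeMoveClaim:
    """Context manager that drops the destination allocation on failure."""

    def __init__(self, allocations, instance, node):
        self.allocations = allocations  # instance uuid -> set of nodes
        self.instance = instance
        self.node = node

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Something failed inside the claim: undo the allocation.
            self.abort()
        return False  # never swallow the original exception

    def abort(self):
        self.allocations.get(self.instance, set()).discard(self.node)


def rebuild_on_dest(allocations, instance, node, spawn):
    """Run the (possibly failing) spawn callable under the claim."""
    with FakeMoveClaim(allocations, instance, node):
        spawn()
```

  If spawn raises, the exception still propagates to the caller, but
  the destination node is removed from the instance's allocations
  first, and the source allocation is left untouched.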

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713786/+subscriptions

References

[Bug 1713786] [NEW] Allocations are not managed properly in all evacuate scenarios
From: Matt Riedemann, 2017-08-29