
yahoo-eng-team team mailing list archive

[Bug 1713786] Re: Allocations are not managed properly in all evacuate scenarios

 

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Confirmed

** Changed in: nova/pike
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713786

Title:
  Allocations are not managed properly in all evacuate scenarios

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Confirmed

Bug description:
  Evacuate has some gaps with respect to dealing with resource
  allocations in Placement:

  1. If the user specifies a host with evacuate, conductor bypasses the
  scheduler and we don't create allocations on the destination host:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749

  This could eventually lead to the claim failing on the destination
  compute:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

  This is similar to bug 1712008, where forcing a host during live
  migration bypasses the scheduler, so allocations are not created in
  placement on the destination node. Before Pike this would be handled
  via the update_available_resource periodic task in the compute
  service, which would 'heal' the allocations for instances tracked on
  a given node. That no longer happens once all computes are running
  Pike code, due to this change: https://review.openstack.org/#/c/491012/

  2. If the user does not specify a host with evacuate, conductor will
  ask the scheduler to pick a host, and the scheduler will also create
  allocations for that host. If the claim (or rebuild) fails on the
  destination node, we don't clean up the allocation on the
  destination node, even if the instance isn't spawned on it:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

  For the link above it is pretty obvious that we should clean up,
  because the claim for resources failed.

  With this generic exception handling it is harder to know whether we
  should clean up, since we'd need to know whether the guest was
  spawned on the host:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812

  But since we don't set the instance.host/node to the current host/node
  it won't be reported there anyway:

  https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824
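
  To make the two gaps concrete, here is a minimal sketch of the
  intended allocation handling. It is a toy in-memory model, not nova
  or Placement code: FakePlacement, ClaimFailed, pick_host and evacuate
  are all hypothetical names made up for illustration, and resources
  are reduced to a single VCPU count. It assumes the fix is (1) to
  create the destination allocation even when the host is forced, and
  (2) to remove it again when the claim fails.

```python
class ClaimFailed(Exception):
    """Hypothetical error: the destination could not claim resources."""


class FakePlacement:
    """Toy stand-in for Placement (an assumption, not the real API)."""

    def __init__(self, capacity):
        self.capacity = dict(capacity)   # host -> total VCPUs
        self.allocations = {}            # (instance, host) -> VCPUs

    def used(self, host):
        return sum(v for (_, h), v in self.allocations.items() if h == host)

    def allocate(self, instance, host, vcpus):
        if self.used(host) + vcpus > self.capacity[host]:
            raise ValueError("not enough capacity on %s" % host)
        self.allocations[(instance, host)] = vcpus

    def remove(self, instance, host):
        self.allocations.pop((instance, host), None)


def pick_host(placement, vcpus):
    """Toy scheduler: first host with enough free VCPUs."""
    for host, total in placement.capacity.items():
        if total - placement.used(host) >= vcpus:
            return host
    raise ValueError("no valid host")


def evacuate(placement, instance, vcpus, claim, forced_host=None):
    """Hypothetical evacuate flow covering both scenarios in this bug."""
    # Scenario 1: a forced host skips host selection, but the
    # destination allocation is still created.
    dest = forced_host if forced_host is not None else pick_host(placement, vcpus)
    placement.allocate(instance, dest, vcpus)
    try:
        claim(dest)
    except ClaimFailed:
        # Scenario 2: the claim failed, so the guest was never spawned
        # on dest and its allocation there is removed.
        placement.remove(instance, dest)
        raise
    return dest
```

  The sketch only covers the claim failure case. The generic exception
  path would additionally need to know whether the guest was spawned
  before deciding to remove the allocation.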

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713786/+subscriptions

