yahoo-eng-team team mailing list archive

[Bug 1713786] [NEW] Allocations are not managed properly in all evacuate scenarios

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Tue, 29 Aug 2017 17:32:02 -0000
Reply-to: Bug 1713786 <1713786@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

Evacuate has some gaps with respect to dealing with resource allocations
in Placement:

1. If the user specifies a host with evacuate, conductor bypasses the
scheduler and we don't create allocations on the destination host:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749

This could eventually lead to the claim failing on the destination
compute:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

This is similar to bug 1712008 where forcing a host during live
migration bypasses the scheduler so allocations are not created in
placement on the destination node. Before Pike this would be handled via
the update_available_resource periodic task in the compute service which
would 'heal' the allocations for instances tracked on a given node, but
that is no longer happening once all computes are running Pike code due
to this change: https://review.openstack.org/#/c/491012/
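To make the gap in scenario 1 concrete, here is a minimal sketch. It is
not nova code: FakePlacement, pick_evacuate_destination and the
candidates argument are hypothetical names invented for illustration,
and the in-memory dict only stands in for the real Placement service.

```python
class FakePlacement:
    """Toy in-memory stand-in for Placement allocations (hypothetical)."""

    def __init__(self):
        # Maps (instance uuid, node name) -> resources allocated there.
        self.allocations = {}

    def allocate(self, instance_uuid, node, resources):
        self.allocations[(instance_uuid, node)] = dict(resources)

    def nodes_for(self, instance_uuid):
        """Return the sorted node names holding allocations for the instance."""
        return sorted(n for (u, n) in self.allocations if u == instance_uuid)


def pick_evacuate_destination(placement, instance_uuid, resources,
                              forced_host=None, candidates=()):
    """Pick a destination the way the bug report describes the two paths."""
    if forced_host is not None:
        # Scenario 1: the scheduler is bypassed, so nothing is written to
        # placement for the destination. This is the gap being reported.
        return forced_host
    # Scenario 2: the "scheduler" picks a host and allocates against it.
    dest = candidates[0]
    placement.allocate(instance_uuid, dest, resources)
    return dest


placement = FakePlacement()
forced = pick_evacuate_destination(
    placement, "uuid-1", {"VCPU": 2}, forced_host="node-b")
scheduled = pick_evacuate_destination(
    placement, "uuid-2", {"VCPU": 2}, candidates=["node-c"])
```

In this sketch the forced evacuation ends up with no allocation for
"uuid-1" on "node-b", while the scheduled one has an allocation for
"uuid-2" on "node-c".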

2. If the user does not specify a host with evacuate, conductor will ask
the scheduler to pick a host, which will also create allocations for
that host via the scheduler. If the claim (or rebuild) fails on the
destination node, we don't clean up the allocation on the destination
node even if the instance isn't spawned on it:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795

^ In that case it is pretty obvious that we should clean up, because
the claim for resources failed.

With this generic exception handling it is harder to know whether we
should clean up, though, since we'd need to know if the guest was
spawned on the destination:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812

But since we don't set the instance.host/node to the current host/node
it won't be reported there anyway:

https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824
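A rough sketch of the cleanup question in scenario 2, under stated
assumptions: ClaimFailed, rebuild_on_destination and the claim/spawn
callables are hypothetical, and the allocations dict is a toy stand-in
for placement. It is not a proposed fix. It only separates the clear
case (claim failure) from the unclear one (generic failure).

```python
class ClaimFailed(Exception):
    """Hypothetical error meaning the destination resource claim failed."""


def rebuild_on_destination(allocations, instance_uuid, node, claim, spawn):
    """Claim resources on the destination node, then spawn the guest.

    allocations maps (instance uuid, node) -> resources and plays the
    role of the allocation the scheduler already created.
    """
    try:
        claim()
    except ClaimFailed:
        # Clear case: the claim failed, so the guest was never spawned
        # here and the destination allocation can be removed.
        allocations.pop((instance_uuid, node), None)
        raise
    try:
        spawn()
    except Exception:
        # Unclear case: the guest may or may not exist on this node, so
        # this sketch deliberately leaves the allocation untouched.
        raise


def failing_claim():
    raise ClaimFailed("not enough resources")


allocs = {("uuid-1", "node-b"): {"VCPU": 2}}
try:
    rebuild_on_destination(allocs, "uuid-1", "node-b",
                           failing_claim, lambda: None)
except ClaimFailed:
    pass
# allocs is now empty: the destination allocation was removed.
```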

** Affects: nova
     Importance: High
         Status: Triaged


** Tags: compute evacuate placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713786

Title:
  Allocations are not managed properly in all evacuate scenarios

Status in OpenStack Compute (nova):
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1713786/+subscriptions

Follow ups

[Bug 1713786] Related fix merged to nova (master)
From: OpenStack Infra, 2017-09-12
[Bug 1713786] Re: Allocations are not managed properly in all evacuate scenarios
From: Matt Riedemann, 2017-08-29