yahoo-eng-team team mailing list archive

[Bug 1749215] [NEW] Allocations not deleted on failed resize

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Claudiu Belu <cbelu@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 13 Feb 2018 15:20:41 -0000
Reply-to: Bug 1749215 <1749215@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

Description
===========

During a resize, an instance's allocations are removed and replaced by 2
sets of allocations. If a resize completes successfully, one set of
allocations is correctly removed, but if the resize fails, neither set
is removed. Deleting the instance afterwards removes only one of the
two sets.
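To make the counts above concrete, here is a minimal sketch that models only the number of allocation sets at each step. It is a hypothetical in-memory model, not nova or placement code; the `AllocationModel` class, the "source"/"dest" labels and the `fixed` flag are illustrative assumptions.

```python
# Hypothetical in-memory model of the allocation bookkeeping described in
# this report. It is NOT nova or placement code; it only mirrors the
# observed counts of allocation "sets".


class AllocationModel:
    def __init__(self):
        # Each entry is one set of allocations: (label, resources).
        self.sets = []

    def spawn(self, flavor):
        self.sets = [("source", dict(flavor))]

    def start_resize(self, new_flavor):
        # The single set is replaced by two sets (assumed here to be the
        # old flavor on the source and the new flavor on the destination).
        old = self.sets[0][1]
        self.sets = [("source", old), ("dest", dict(new_flavor))]

    def resize_succeeded(self):
        # On success, the source set is dropped.
        self.sets = [s for s in self.sets if s[0] == "dest"]

    def resize_failed(self, fixed=False):
        # Reported behaviour: nothing is cleaned up.
        # Expected behaviour (fixed=True): the destination set is dropped.
        if fixed:
            self.sets = [s for s in self.sets if s[0] == "source"]

    def delete_instance(self):
        # Reported behaviour: deleting the instance removes only one set.
        if self.sets:
            self.sets.pop(0)


def run(fixed):
    # Spawn, attempt a resize to a smaller disk, fail, then delete.
    model = AllocationModel()
    model.spawn({"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20})
    model.start_resize({"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 10})
    model.resize_failed(fixed=fixed)
    after_failure = len(model.sets)
    model.delete_instance()
    return after_failure, len(model.sets)


print(run(fixed=False))  # (2, 1): one set is leaked
print(run(fixed=True))   # (1, 0): nothing is leaked
```

The tuple is (sets after the failed resize, sets after deletion); the buggy path leaves one set behind, while the expected path leaves none.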

This happens because self.compute_rpcapi.resize_instance [1] is an RPC
cast (asynchronous) rather than a call (synchronous). As a result, an
exception raised while the remote compute service handles the request
never propagates back to the caller, so the except branch [2], in which
the allocations are cleared and the instance is rescheduled, is never
reached.
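The cast/call difference can be illustrated with a small sketch that uses a thread pool as a stand-in for the RPC layer. It is an analogy, not nova's or oslo.messaging's code: `rpc_cast`, `rpc_call`, `remote_resize_instance` and `prep_resize` are hypothetical names, and the `ValueError` stands in for whatever the remote side raises.

```python
import concurrent.futures


def remote_resize_instance(new_disk_gb, current_disk_gb):
    # Hypothetical stand-in for the remote resize_instance handler.
    if new_disk_gb < current_disk_gb:
        raise ValueError("cannot resize to a smaller disk")
    return "resized"


executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def rpc_cast(fn, *args):
    # Fire-and-forget: the future is discarded, so an exception raised
    # on the "remote" side is never seen by the caller.
    executor.submit(fn, *args)


def rpc_call(fn, *args):
    # Blocking: result() waits and re-raises the remote exception here.
    return executor.submit(fn, *args).result()


def prep_resize(send, cleanup_log):
    try:
        send(remote_resize_instance, 10, 20)
    except ValueError:
        # Stand-in for the except branch [2]: clear allocations, reschedule.
        cleanup_log.append("allocations cleared")


cast_log, call_log = [], []
prep_resize(rpc_cast, cast_log)
prep_resize(rpc_call, call_log)
executor.shutdown(wait=True)
print(cast_log)  # []
print(call_log)  # ['allocations cleared']
```

With the cast, the cleanup branch never runs even though the remote operation failed; only the blocking call lets the caller's `except` see the error.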

Additionally, because these allocations are never fully cleared, the
resources they claim on the compute nodes become "locked" and unusable.
Eventually, instances will no longer be scheduled to those compute
nodes, because all of their resources appear to be "allocated".
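Placement treats a provider's usable capacity for a resource class as (total - reserved) * allocation_ratio, and leaked allocations count toward usage like any other. A minimal sketch of how they eventually block scheduling, assuming a hypothetical 100 GB DISK_GB inventory and made-up usage numbers:

```python
def capacity(total, reserved, allocation_ratio):
    # Usable capacity of one resource class on a resource provider.
    return (total - reserved) * allocation_ratio


def can_fit(total, reserved, allocation_ratio, used, requested):
    # A new allocation fits only if current usage plus the request stays
    # within capacity; leaked allocations are included in `used`.
    return used + requested <= capacity(total, reserved, allocation_ratio)


# Hypothetical DISK_GB inventory on one compute node.
TOTAL, RESERVED, RATIO = 100, 0, 1.0
real_usage = 40   # two live 20 GB instances
leaked = 3 * 20   # three leaked 20 GB sets from failed resizes

print(can_fit(TOTAL, RESERVED, RATIO, real_usage, 20))           # True
print(can_fit(TOTAL, RESERVED, RATIO, real_usage + leaked, 20))  # False
```

Only 40 GB is really in use, yet the node is rejected for a 20 GB request once the leaked sets are counted.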

[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4085
[2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4123


Steps to reproduce
==================

* Spawn an instance.
* Observe that the table nova_api.allocations only has 1 set of allocations for the instance (VCPU, MEMORY, DISK).
* Cold resize to an invalid flavor (e.g. one with a smaller disk).
* Observe that the table nova_api.allocations has 2 sets of allocations for the instance.
* Observe that the cold resize failed, and that the instance's task state has been reverted to its original state.
* Observe that the table nova_api.allocations continues to have 2 sets of allocations.
* Delete the instance.
* Observe that even after the instance has been destroyed, there is still 1 set of allocations for the instance.
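The "Observe" steps above amount to counting allocation sets per instance. A minimal sketch of such a check against an in-memory SQLite table, assuming that one "set" means the allocations a consumer holds against one resource provider; the schema is a simplified, hypothetical stand-in and the real nova_api.allocations columns may differ.

```python
import sqlite3


def allocation_sets(conn, consumer_id):
    # One "set" = the allocations a consumer holds against one provider,
    # so count the distinct providers for that consumer.
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT resource_provider_id) FROM allocations "
        "WHERE consumer_id = ?",
        (consumer_id,),
    ).fetchone()
    return count


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the allocations table.
conn.execute(
    "CREATE TABLE allocations (resource_provider_id INTEGER, "
    "consumer_id TEXT, resource_class TEXT, used INTEGER)"
)
instance = "hypothetical-instance-uuid"
insert = "INSERT INTO allocations VALUES (?, ?, ?, ?)"

# Spawn: VCPU, MEMORY_MB and DISK_GB against provider 1.
conn.executemany(insert, [(1, instance, "VCPU", 2),
                          (1, instance, "MEMORY_MB", 2048),
                          (1, instance, "DISK_GB", 20)])
after_spawn = allocation_sets(conn, instance)

# Cold resize: a second set appears against destination provider 2.
conn.executemany(insert, [(2, instance, "VCPU", 2),
                          (2, instance, "MEMORY_MB", 2048),
                          (2, instance, "DISK_GB", 10)])
after_resize = allocation_sets(conn, instance)

print(after_spawn, after_resize)  # 1 2
```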


Expected result
===============

After the cold resize failed, there should be only 1 set of allocations
in the nova_api.allocations table, and after deleting the instance,
there shouldn't be any.


Actual result
=============

After the cold resize failed, there are 2 sets of allocations in the
nova_api.allocations table; after deleting the instance, 1 set of
allocations remains.


Environment
===========

Branch: Queens
Hypervisor: Hyper-V Server 2012 R2 (not hypervisor-specific)

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749215

Title:
  Allocations not deleted on failed resize

Status in OpenStack Compute (nova):
  New

Follow ups

[Bug 1749215] Re: Allocations not deleted on failed resize_instance
From: Matt Riedemann, 2019-04-19
[Bug 1749215] Re: Allocations not deleted on failed resize_instance
From: OpenStack Infra, 2018-02-27
[Bug 1749215] Re: Allocations not deleted on failed resize
From: Matt Riedemann, 2018-02-13