yahoo-eng-team team mailing list archive

[Bug 1825537] Re: finish_resize failures incorrectly revert allocations

 

Gerrit is down, but I've written a functional regression test that
recreates the bug and attached it as a patch for now.

** Description changed:

- This is purely based on code inspection right now, I need to write a
- functional test to recreate the issue.
- 
  Triaging bug 1821594 got me thinking about how placement allocations
  are handled when something fails during a resize, which reminded me of
  an older fix:
  
  https://review.openstack.org/#/c/543971/6/nova/compute/manager.py@4457
  
  Looking back on that now, I think reverting the allocations when
  resize_instance fails is OK as long as the instance host/node has not
  changed. Doing it when finish_resize fails was probably a mistake,
  though, because instance.host in the nova db won't match where the
  allocations exist in placement. Before Pike this was fine, since the
  ResourceTracker would heal the allocations in the
  update_available_resource periodic task, but that healing no longer
  happens.
  
  This could leave an instance that the nova database reports as being
  on the dest host with the new flavor, which is where it will get
  rebuilt/rebooted/etc, while placement tracks the instance's resource
  allocations against the source host using the old flavor, which is not
  where the instance is.
  
  Furthermore, if finish_resize fails, the instance should be in ERROR
  status and the user would likely try to hard reboot the instance to
  correct that status, which would happen on the dest host.
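
To make the mismatch described above concrete, here is a minimal Python
sketch. It is a toy model and not nova code: the Instance and Placement
classes, finish_resize_with_revert and allocation_matches are all
hypothetical names. It assumes, per the description, that the instance
record already points at the dest host with the new flavor by the time
finish_resize fails. Real placement tracks allocations per consumer and
resource class; here an allocation is reduced to a (host, flavor) pair.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for the instance record in the nova db."""
    uuid: str
    host: str
    flavor: str
    status: str = "ACTIVE"


@dataclass
class Placement:
    """Hypothetical stand-in for placement: consumer uuid -> (host, flavor)."""
    allocations: dict = field(default_factory=dict)


def finish_resize_with_revert(instance, placement, source, dest,
                              old_flavor, new_flavor):
    """Simulate a finish_resize failure followed by an allocation revert."""
    # Assumption from the bug description: the db record already says
    # "dest host, new flavor" when the failure happens.
    instance.host = dest
    instance.flavor = new_flavor
    placement.allocations[instance.uuid] = (dest, new_flavor)
    try:
        raise RuntimeError("simulated failure while finishing the resize")
    except RuntimeError:
        instance.status = "ERROR"
        # The step this bug questions: allocations go back to the source
        # host with the old flavor, but the db record is left unchanged.
        placement.allocations[instance.uuid] = (source, old_flavor)


def allocation_matches(instance, placement):
    """Return True if placement agrees with the db about host and flavor."""
    expected = (instance.host, instance.flavor)
    return placement.allocations.get(instance.uuid) == expected


if __name__ == "__main__":
    inst = Instance(uuid="abc", host="source", flavor="small")
    placement = Placement()
    finish_resize_with_revert(inst, placement, "source", "dest",
                              "small", "large")
    print(inst.host, inst.flavor, inst.status)
    print(placement.allocations[inst.uuid])
    print(allocation_matches(inst, placement))
```

A check along the lines of allocation_matches is roughly what a
functional regression test could assert after a failed finish_resize,
although the attached patch may well do it differently.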

** Patch added: "0001-Add-functional-recreate-test-for-regression-bug-1825.patch"
   https://bugs.launchpad.net/nova/+bug/1825537/+attachment/5257070/+files/0001-Add-functional-recreate-test-for-regression-bug-1825.patch

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Confirmed

** Changed in: nova/queens
       Status: New => Confirmed

** Changed in: nova/stein
       Status: New => Confirmed

** Changed in: nova/rocky
       Status: New => Confirmed

** Changed in: nova/queens
   Importance: Undecided => Medium

** Changed in: nova/stein
   Importance: Undecided => Medium

** Changed in: nova/rocky
   Importance: Undecided => Medium

** Changed in: nova/pike
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825537

Title:
  finish_resize failures incorrectly revert allocations

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825537/+subscriptions

