yahoo-eng-team team mailing list archive
Message #79050
[Bug 1825537] Re: finish_resize failures incorrectly revert allocations
Reviewed: https://review.opendev.org/654067
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ea297d6ffba81c5dc982afe6519de09ff3744cad
Submitter: Zuul
Branch: master
commit ea297d6ffba81c5dc982afe6519de09ff3744cad
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Fri Apr 19 12:28:34 2019 -0400
Drop source node allocations if finish_resize fails
By the time finish_resize runs on the dest host, the instance
host/node values are already pointing at the dest (they are
set by resize_instance on the source compute before casting to
finish_resize on the dest). If finish_resize fails, the instance
is essentially stuck on the dest host so rather than revert the
allocations (which will drop the new flavor allocations against
the dest host where the instance now lives) we should just drop
the old flavor allocations on the source node resource provider,
which is what this change does.
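As a rough illustration of the difference between the two behaviours,
here is a minimal sketch using a toy in-memory stand-in for placement.
Everything in it is an assumption made for illustration: the consumer
and node names, the helper functions, and the idea that the old flavor
allocations on the source node are held by a migration consumer while
the resize is in flight. It is not nova's or placement's actual API.

```python
# Toy stand-in for placement: consumer -> provider -> resources.
# All names are hypothetical; this is not nova's or placement's API.

def make_resize_state():
    """Allocations as they might look while finish_resize runs on the dest."""
    return {
        # Assumption: old flavor, held against the source node.
        "migration-1": {"source-node": {"VCPU": 1, "MEMORY_MB": 512}},
        # New flavor, held against the dest node.
        "instance-1": {"dest-node": {"VCPU": 2, "MEMORY_MB": 1024}},
    }


def revert_allocations(allocs, instance, migration):
    """Old behaviour: hand the old flavor source allocations back to the
    instance, which discards its new flavor allocations on the dest."""
    allocs[instance] = allocs.pop(migration)


def drop_source_allocations(allocs, migration):
    """Behaviour described in the commit message: drop only the old
    flavor allocations on the source node."""
    allocs.pop(migration, None)


def providers_for(allocs, consumer):
    """Sorted list of providers a consumer has allocations against."""
    return sorted(allocs.get(consumer, {}))


# instance.host/node already point at the dest by the time finish_resize runs.
instance_node = "dest-node"

reverted = make_resize_state()
revert_allocations(reverted, "instance-1", "migration-1")

fixed = make_resize_state()
drop_source_allocations(fixed, "migration-1")

print(providers_for(reverted, "instance-1"))
print(providers_for(fixed, "instance-1"))
```

In this toy model the revert leaves the instance's allocations on a
node other than instance_node, while dropping only the source
allocations keeps them on the node where the instance now lives.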
The functional regression recreate test is updated to show this
working.
Change-Id: I52c8d038118c858004e17e71b2fba9e9e2714815
Closes-Bug: #1825537
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825537
Title:
finish_resize failures incorrectly revert allocations
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) pike series:
Confirmed
Status in OpenStack Compute (nova) queens series:
Confirmed
Status in OpenStack Compute (nova) rocky series:
Confirmed
Status in OpenStack Compute (nova) stein series:
In Progress
Bug description:
Triaging bug 1821594 got me thinking about how placement allocations
are handled when something fails during a resize, which reminded me
of an older fix:
https://review.openstack.org/#/c/543971/6/nova/compute/manager.py@4457
Looking back on that now, I think the revert during resize_instance is
OK as long as the instance host/node has not changed. Doing it when
finish_resize fails was probably a mistake, though, because the
instance.host in the nova db won't match where the allocations exist
in placement. Before Pike this was fine since the ResourceTracker
would heal the allocations in the update_available_resource periodic
task, but we don't have that anymore.
So this could result in an instance reported as being on the dest host
in the nova database with the new flavor, which is where it will get
rebuilt/rebooted/etc, but placement will be tracking the instance
resource allocations using the old flavor against the source host,
which is not where the instance is.
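The mismatch described above can be sketched as a simple consistency
check. This is only an illustration under stated assumptions: the two
dictionaries stand in for the nova database and placement, and
find_misplaced and the instance and node names are hypothetical, not
part of nova.

```python
def find_misplaced(instance_nodes, allocations):
    """Return IDs of instances whose recorded node holds none of their
    allocations (hypothetical helper, not a nova API)."""
    return sorted(
        uuid
        for uuid, node in instance_nodes.items()
        if node not in allocations.get(uuid, {})
    )


# Stand-in for the nova database: instance -> node it is recorded on.
instance_nodes = {"inst-a": "dest-node", "inst-b": "dest-node"}

# Stand-in for placement: consumer -> provider -> resources.
allocations = {
    # Allocations were reverted to the source after finish_resize failed.
    "inst-a": {"source-node": {"VCPU": 1}},
    # Allocations sit on the node where the instance actually lives.
    "inst-b": {"dest-node": {"VCPU": 2}},
}

misplaced = find_misplaced(instance_nodes, allocations)
print(misplaced)
```

Here inst-a is the case from this bug: the database says it is on the
dest host, but its allocations are tracked against the source host.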
Furthermore, if finish_resize fails, the instance should be in ERROR
status and the user would likely try to hard reboot the instance to
correct that status, which would happen on the dest host.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825537/+subscriptions