yahoo-eng-team team mailing list archive
Message #70343
[Bug 1741307] Related fix merged to nova (master)
Reviewed: https://review.openstack.org/531211
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5dad1ecc32e23f4e2656c210528a67ac16c0f834
Submitter: Zuul
Branch: master
commit 5dad1ecc32e23f4e2656c210528a67ac16c0f834
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu Jan 4 13:25:10 2018 -0500
Add regression test for resizing failing when using CachingScheduler
The cold migrate task in conductor assumes that the scheduler (or compute)
created allocations against the source node for the instance when it
attempts to swap those allocations to the migration record. However, if
using the CachingScheduler, the scheduler doesn't create allocations in
Placement, and if the computes are >=Pike, they won't either, so conductor
fails to find the allocations for the instance on the source node and fails.
This adds a functional regression test to show the failure. A follow-up
patch with the fix will modify the test to show it passing again.
Change-Id: I80a401f6adce1c4e77a595928d6b9f085ff769a8
Related-Bug: #1741307
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1741307
Title:
Resize always fails when using the CachingScheduler
Status in OpenStack Compute (nova):
Fix Released
Bug description:
This is split off from bug 1741125, which is more about reschedules
failing.
Resize / cold migrate with the CachingScheduler simply doesn't work
because conductor assumes the scheduler created allocations for the
instance and tries to swap them to the migration record, which fails
here:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/conductor/tasks/migrate.py#L53
The instance won't have an allocation on the source node created by
the scheduler if using the CachingScheduler because the
CachingScheduler doesn't use Placement, and once all computes are
upgraded to at least Pike, the computes no longer create allocations
in Placement either (because they assume the scheduler is going to do
that).
So in this case, we basically need to just log something and continue
without swapping allocations.
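As a rough illustration of the "log something and continue" idea, here is
a minimal sketch. It is not the nova code: FakePlacement,
swap_allocations_to_migration and the return values are hypothetical names
made up for this example, and Placement is modelled as a plain in-memory
dict rather than the real REST service.

```python
import logging

LOG = logging.getLogger(__name__)


class FakePlacement:
    """Hypothetical in-memory stand-in for Placement allocations."""

    def __init__(self):
        # consumer UUID -> {resource provider UUID: {resource class: amount}}
        self.allocations = {}

    def get_allocations(self, consumer_uuid):
        # Returns an empty dict when the consumer has no allocations.
        return self.allocations.get(consumer_uuid, {})


def swap_allocations_to_migration(placement, instance_uuid, migration_uuid,
                                  source_node_uuid):
    """Move the instance's source-node allocation to the migration record.

    Returns True if an allocation was swapped, or False if there was nothing
    to swap (for example when the scheduler never created allocations, as
    with the CachingScheduler).
    """
    instance_allocs = placement.get_allocations(instance_uuid)
    source_alloc = instance_allocs.get(source_node_uuid)
    if source_alloc is None:
        # Assumed behaviour for this sketch: log and carry on instead of
        # failing the resize / cold migrate.
        LOG.debug("No allocation for instance %s on source node %s; "
                  "skipping the swap to migration %s.",
                  instance_uuid, source_node_uuid, migration_uuid)
        return False

    # Hand the source-node allocation over to the migration consumer.
    placement.allocations.setdefault(migration_uuid, {})[source_node_uuid] = (
        source_alloc)
    del instance_allocs[source_node_uuid]
    if not instance_allocs:
        # Drop the instance consumer entirely once it holds nothing.
        placement.allocations.pop(instance_uuid, None)
    return True
```

With an empty FakePlacement the function returns False and leaves the
allocations untouched; once an allocation exists for the instance on the
source node, the same call moves it to the migration UUID.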
The compute manager code should be OK since it just no-ops if the
migration record doesn't have an allocation:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L3965
That will, unfortunately, eventually lead to the compute asking the
resource tracker to remove the allocation for the instance, which won't
exist either, and we'll get an ERROR in the logs:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L4103
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/resource_tracker.py#L1339
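The two compute-side behaviours described above can be sketched roughly as
follows. This is only an illustration under assumptions: the function
names, the dict standing in for Placement, and the log message text are
hypothetical and do not match the linked nova code.

```python
import logging

LOG = logging.getLogger("allocation_sketch")


def delete_migration_allocation(allocations, migration_uuid):
    """Delete the allocation held by the migration record, if any.

    Quietly does nothing (returns False) when the migration record has no
    allocation, which is the no-op case described for the compute manager.
    """
    if migration_uuid not in allocations:
        return False
    del allocations[migration_uuid]
    return True


def remove_instance_allocation(allocations, instance_uuid):
    """Remove the instance's allocation, logging an ERROR if it is missing.

    This mirrors the unfortunate side effect described above: the instance
    never had an allocation, so the cleanup attempt is reported as an error.
    """
    if instance_uuid not in allocations:
        # Hypothetical message; the real log text is not reproduced here.
        LOG.error("Failed to clean allocation of instance %s: "
                  "no allocation found", instance_uuid)
        return False
    del allocations[instance_uuid]
    return True
```

Calling both functions against an empty dict shows the asymmetry: the
first returns False silently, while the second returns False and emits an
ERROR record.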
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1741307/+subscriptions