yahoo-eng-team team mailing list archive
Message #70198
[Bug 1741307] [NEW] Resize always fails when using the CachingScheduler
Public bug reported:
This is split off from bug 1741125 which is more about reschedules
failing.
Resize / cold migrate with the CachingScheduler simply doesn't work
because conductor assumes the scheduler created allocations for the
instance and tries to swap them to the migration record, which fails
here:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/conductor/tasks/migrate.py#L53
The instance won't have an allocation on the source node created by the
scheduler if using the CachingScheduler because the CachingScheduler
doesn't use Placement, and once all computes are upgraded to at least
Pike, the computes no longer create allocations in Placement either
(because they assume the scheduler is going to do that).
So in this case, we basically need to just log something and continue
without swapping allocations.
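
To illustrate the "log something and continue" idea, here is a minimal
sketch. It is not the actual nova code or the eventual fix: the
FakePlacementClient class, its methods, and swap_allocations_if_present
are hypothetical names made up for this example, and the real conductor
task talks to Placement through nova's own client.

```python
import logging

LOG = logging.getLogger(__name__)


class FakePlacementClient:
    """Hypothetical in-memory stand-in for a Placement client."""

    def __init__(self, allocations=None):
        # Maps consumer UUID -> {resource provider UUID: resources}.
        self.allocations = dict(allocations or {})

    def get_allocations(self, consumer_uuid):
        # An unknown consumer simply has no allocations.
        return self.allocations.get(consumer_uuid, {})

    def move_allocations(self, source_consumer, target_consumer):
        self.allocations[target_consumer] = self.allocations.pop(
            source_consumer)


def swap_allocations_if_present(client, instance_uuid, migration_uuid):
    """Move the instance's allocations to the migration record, if any.

    Returns True if allocations were swapped, False if there was
    nothing to swap (for example, the scheduler never created any).
    """
    if not client.get_allocations(instance_uuid):
        # Assumption: a missing allocation is not treated as fatal here;
        # we log and let the resize / cold migrate continue.
        LOG.info('Instance %s has no allocations in Placement; '
                 'skipping the swap to migration %s.',
                 instance_uuid, migration_uuid)
        return False
    client.move_allocations(instance_uuid, migration_uuid)
    return True
```

With a scheduler that creates allocations, the swap happens as before;
with no allocations the function only logs and returns False, so the
caller can carry on instead of failing the operation.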
The compute manager code should be OK since it just no-ops if the
migration record doesn't have an allocation:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L3965
Unfortunately, that eventually leads to the compute asking the
resource tracker to remove the allocation for the instance, which won't
exist either, so we'll get an ERROR in the logs:
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L4103
https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/resource_tracker.py#L1339
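
The compute-side sequence described above can be pictured roughly like
this. This is only a toy model under stated assumptions, not the code
behind the links: both function names and the plain dict standing in
for Placement are hypothetical.

```python
import logging

LOG = logging.getLogger("allocation_sketch")


def delete_migration_allocation(allocations, migration_uuid):
    """No-op (returns False) when the migration holds no allocation."""
    if migration_uuid not in allocations:
        return False
    del allocations[migration_uuid]
    return True


def remove_instance_allocation(allocations, instance_uuid):
    """Remove the instance's allocation, logging an ERROR if missing."""
    if instance_uuid not in allocations:
        # Assumption: this models the ERROR the report expects to see.
        LOG.error('Failed to remove allocation for instance %s: '
                  'no allocation found.', instance_uuid)
        return False
    del allocations[instance_uuid]
    return True
```

With an empty allocations dict (the CachingScheduler case), the first
call quietly returns False and the second returns False after logging
at ERROR level, which matches the noisy-but-harmless outcome described
in the report.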
** Affects: nova
Importance: High
Assignee: Matt Riedemann (mriedem)
Status: Triaged
** Tags: cachingscheduler queens-rc-potential resize
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1741307
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1741307/+subscriptions