yahoo-eng-team team mailing list archive

[Bug 1741307] Related fix merged to nova (master)

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1741307@xxxxxxxxxxxxxxxxxx>
Date: Fri, 12 Jan 2018 09:53:10 -0000
Reply-to: Bug 1741307 <1741307@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/531211
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5dad1ecc32e23f4e2656c210528a67ac16c0f834
Submitter: Zuul
Branch:    master

commit 5dad1ecc32e23f4e2656c210528a67ac16c0f834
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Thu Jan 4 13:25:10 2018 -0500

    Add regression test for resizing failing when using CachingScheduler
    
    The cold migrate task in conductor assumes that the scheduler (or compute)
    created allocations against the source node for the instance when it
    attempts to swap those allocations to the migration record. However, if
    using the CachingScheduler, the scheduler doesn't create allocations in
    Placement, and if the computes are >=Pike, they won't either, so conductor
    fails to find the allocations for the instance on the source node and fails.
    
    This adds a functional regression test to show the failure. A follow up
    patch with the fix will modify the test to show it passing again.
    
    Change-Id: I80a401f6adce1c4e77a595928d6b9f085ff769a8
    Related-Bug: #1741307


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1741307

Title:
  Resize always fails when using the CachingScheduler

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This is split off from bug 1741125, which is more about reschedules
  failing.

  Resize / cold migrate with the CachingScheduler simply doesn't work
  because conductor assumes the scheduler created allocations for the
  instance and tries to swap them to the migration record, which fails
  here:

  https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/conductor/tasks/migrate.py#L53

  The instance won't have an allocation on the source node created by
  the scheduler if using the CachingScheduler because the
  CachingScheduler doesn't use Placement, and once all computes are
  upgraded to at least Pike, the computes no longer create allocations
  in Placement either (because they assume the scheduler is going to do
  that).

  So in this case, we basically need to just log something and continue
  without swapping allocations.
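
  A minimal sketch of that "log and continue" behaviour, for illustration
  only. This is not the nova fix: FakePlacement and
  swap_allocations_if_present are hypothetical names, and the in-memory
  dict only stands in for the allocations that Placement would hold.

```python
import logging

LOG = logging.getLogger(__name__)


class FakePlacement:
    """Hypothetical in-memory stand-in for Placement allocations."""

    def __init__(self):
        # {consumer_uuid: {node_uuid: {resource_class: amount}}}
        self.allocations = {}

    def get(self, consumer_uuid, node_uuid):
        # Returns None when the consumer has no allocation on the node.
        return self.allocations.get(consumer_uuid, {}).get(node_uuid)

    def move(self, from_consumer, to_consumer, node_uuid):
        resources = self.allocations[from_consumer].pop(node_uuid)
        self.allocations.setdefault(to_consumer, {})[node_uuid] = resources


def swap_allocations_if_present(placement, instance_uuid, migration_uuid,
                                source_node_uuid):
    """Move the instance's source-node allocation to the migration record.

    Returns the moved resources, or None if there was nothing to swap
    (assumed here to be the CachingScheduler case).
    """
    resources = placement.get(instance_uuid, source_node_uuid)
    if resources is None:
        # Nothing was allocated for the instance: log and carry on
        # instead of failing the resize.
        LOG.debug("No allocation for instance %s on node %s; "
                  "skipping allocation swap.",
                  instance_uuid, source_node_uuid)
        return None
    placement.move(instance_uuid, migration_uuid, source_node_uuid)
    return resources
```

  With an empty FakePlacement the call returns None and changes nothing;
  with an existing allocation it moves the resources from the instance to
  the migration record.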

  The compute manager code should be OK since it just no-ops if the
  migration record doesn't have an allocation:

  https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L3965

  That will, unfortunately, eventually lead to the compute asking the
  resource tracker to remove the allocation for the instance, which won't
  exist either, so we'll get an ERROR in the logs:

  https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/manager.py#L4103

  https://github.com/openstack/nova/blob/397dcab684b87eba257ccbe4b24a692deb72c13d/nova/compute/resource_tracker.py#L1339

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1741307/+subscriptions
