← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1869050] Re: migration of anti-affinity server fails due to stale scheduler instance info

 

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/rocky
       Status: New => Triaged

** Changed in: nova/stein
       Status: New => Triaged

** Changed in: nova/pike
   Importance: Undecided => Low

** Changed in: nova/train
       Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/queens
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1869050

Title:
  migration of anti-affinity server fails due to stale scheduler
  instance info

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged

Bug description:
  Description
  ===========

  
  Steps to reproduce
  ==================
  Have a deployment with 3 compute nodes

  * make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
  * create and server group with anti-affinity policy
  * boot server1 into the group
  * boot server2 into the group
  * migrate server2
  * confirm the migration
  * boot server3

  Make sure that between the last two steps there was no periodic
  _sync_scheduler_instance_info running on the compute that was hosted
  server2 before the migration. This could done by doing the last too
  steps after each other without waiting too much as interval of that
  periodic (scheduler_instance_sync_interval) is defaulted to 120 sec.

  Expected result
  ===============
  server3 is booted on the host where server2 is moved away

  Actual result
  =============
  server3 cannot be booted (NoValidHost)

  Triage
  ======

  The confirm resize call on the source compute does not update the
  scheduler that the instance is removed from this host. This makes the
  scheduler instance info stale and causing the subsequent scheduling
  error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions


References