yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #82047
[Bug 1869050] [NEW] migration of anti-affinity server fails due to stale scheduler instance info
Public bug reported:
Description
===========
Steps to reproduce
==================
Have a deployment with 3 compute nodes
* make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
* create and server group with anti-affinity policy
* boot server1 into the group
* boot server2 into the group
* migrate server2
* confirm the migration
* boot server3
Make sure that between the last two steps there was no periodic
_sync_scheduler_instance_info running on the compute that was hosted
server2 before the migration. This could done by doing the last too
steps after each other without waiting too much as interval of that
periodic (scheduler_instance_sync_interval) is defaulted to 120 sec.
Expected result
===============
server3 is booted on the host where server2 is moved away
Actual result
=============
server3 cannot be booted (NoValidHost)
Triage
======
The confirm resize call on the source compute does not update the
scheduler that the instance is removed from this host. This makes the
scheduler instance info stale and causing the subsequent scheduling
error.
** Affects: nova
Importance: Low
Assignee: Balazs Gibizer (balazs-gibizer)
Status: Triaged
** Tags: compute scheduler
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Low
** Changed in: nova
Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
** Tags added: compute scheduler
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1869050
Title:
migration of anti-affinity server fails due to stale scheduler
instance info
Status in OpenStack Compute (nova):
Triaged
Bug description:
Description
===========
Steps to reproduce
==================
Have a deployment with 3 compute nodes
* make sure that the deployment is configured with tracks_instance_changes=True (True is the default)
* create and server group with anti-affinity policy
* boot server1 into the group
* boot server2 into the group
* migrate server2
* confirm the migration
* boot server3
Make sure that between the last two steps there was no periodic
_sync_scheduler_instance_info running on the compute that was hosted
server2 before the migration. This could done by doing the last too
steps after each other without waiting too much as interval of that
periodic (scheduler_instance_sync_interval) is defaulted to 120 sec.
Expected result
===============
server3 is booted on the host where server2 is moved away
Actual result
=============
server3 cannot be booted (NoValidHost)
Triage
======
The confirm resize call on the source compute does not update the
scheduler that the instance is removed from this host. This makes the
scheduler instance info stale and causing the subsequent scheduling
error.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1869050/+subscriptions
Follow ups