yahoo-eng-team team mailing list archive
Message #28618
[Bug 1423648] [NEW] race conditions with server group scheduler policies
Public bug reported:
In git commit a79ecbe, Russell Bryant submitted a partial fix for a race
condition when booting an instance as part of a server group with an
"anti-affinity" scheduler policy.
That fix only solves part of the problem, however. There are a number
of issues remaining:
1) It's possible to hit a similar race condition for server groups with
the "affinity" policy. Suppose we create a new group and then create
two instances simultaneously. The scheduler sees an empty group for
each, assigns them to different compute nodes, and the policy is
violated. We should add a check in _validate_instance_group_policy() to
cover the "affinity" case.
2) It's possible for two instances created simultaneously to be
scheduled to conflicting hosts, for both to detect the problem in
_validate_instance_group_policy(), for both to get sent back for
rescheduling, and for both to be assigned to conflicting hosts *again*,
resulting in an error. To fix this I propose that instead of checking
against all other instances in the group, we only check against
instances that were created before the current instance.
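A minimal sketch of the check proposed in point 1, extended to cover the
"affinity" case. The names here (validate_group_policy,
GroupPolicyViolation, group_hosts) are illustrative, not Nova's actual
API; the real logic lives in _validate_instance_group_policy() on the
compute node:

```python
# Illustrative re-check of the scheduler's placement against the group
# policy, run late on the compute node. Names are hypothetical, not
# Nova's real signatures.

class GroupPolicyViolation(Exception):
    """Raised when a scheduling choice breaks the group's policy."""

def validate_group_policy(policy, group_hosts, chosen_host):
    """Re-validate a placement decision for a server group.

    policy      -- 'affinity' or 'anti-affinity'
    group_hosts -- set of hosts already used by other group members
    chosen_host -- host the scheduler picked for this instance
    """
    if policy == 'anti-affinity' and chosen_host in group_hosts:
        raise GroupPolicyViolation('host %s already in use by the group'
                                   % chosen_host)
    # The proposed addition: also catch the "affinity" race, where two
    # instances booted against an empty group land on different hosts.
    if policy == 'affinity' and group_hosts and chosen_host not in group_hosts:
        raise GroupPolicyViolation('host %s differs from the group\'s host'
                                   % chosen_host)
```

With this in place, the second of two simultaneously booted "affinity"
instances fails validation and is rescheduled instead of silently
violating the policy.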
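The fix proposed in point 2 can be sketched as a filter applied before
the validation step: only members created strictly before the current
instance contribute hosts to check against. Field names ('host',
'created_at') are illustrative, not Nova's exact schema:

```python
# Illustrative helper for the proposed reschedule fix: the oldest
# instance in the group sees an empty set and always passes validation,
# so two simultaneously rescheduled instances can no longer bounce each
# other back indefinitely -- one of them is guaranteed to win.

def hosts_to_check(group_instances, current):
    """Return the hosts of group members created before `current`."""
    return {inst['host'] for inst in group_instances
            if inst['created_at'] < current['created_at']
            and inst['host'] is not None}
```

This breaks the symmetry that causes both instances to be sent back for
rescheduling at the same time.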
** Affects: nova
Importance: Undecided
Status: New
** Tags: compute
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1423648