yahoo-eng-team team mailing list archive
Message #28618
[Bug 1423648] [NEW] race conditions with server group scheduler policies
Public bug reported:
In git commit a79ecbe, Russell Bryant submitted a partial fix for a race
condition when booting an instance as part of a server group with an
"anti-affinity" scheduler policy.
That fix only solves part of the problem, however. There are a number
of issues remaining:
1) It's possible to hit a similar race condition for server groups with
the "affinity" policy. Suppose we create a new group and then create
two instances simultaneously. The scheduler sees an empty group for
each, assigns them to different compute nodes, and the policy is
violated. We should add a check in _validate_instance_group_policy() to
cover the "affinity" case.
2) It's possible for two instances created simultaneously to be
scheduled to conflicting hosts, for both to detect the problem in
_validate_instance_group_policy(), for both to get sent back for
rescheduling, and for both to be assigned to conflicting hosts *again*,
resulting in an error. To fix this I propose that instead of checking
against all other instances in the group, we only check against
instances that were created before the current instance.
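A minimal sketch of the check proposed in point 1, extended to cover the
"affinity" case. The names here (validate_group_policy,
GroupPolicyViolation, group_hosts) are illustrative, not Nova's actual
API; the real logic lives in _validate_instance_group_policy() on the
compute node:

```python
# Illustrative re-check of the scheduler's placement against the group
# policy, run late on the compute node. Names are hypothetical, not
# Nova's real signatures.

class GroupPolicyViolation(Exception):
    """Raised when a scheduling choice breaks the group's policy."""

def validate_group_policy(policy, group_hosts, chosen_host):
    """Re-validate a placement decision for a server group.

    policy      -- 'affinity' or 'anti-affinity'
    group_hosts -- set of hosts already used by other group members
    chosen_host -- host the scheduler picked for this instance
    """
    if policy == 'anti-affinity' and chosen_host in group_hosts:
        raise GroupPolicyViolation('host %s already in use by the group'
                                   % chosen_host)
    # The proposed addition: also catch the "affinity" race, where two
    # instances booted against an empty group land on different hosts.
    if policy == 'affinity' and group_hosts and chosen_host not in group_hosts:
        raise GroupPolicyViolation('host %s differs from the group\'s host'
                                   % chosen_host)
```

With this in place, the second of two simultaneously booted "affinity"
instances fails validation and is rescheduled instead of silently
violating the policy.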
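The fix proposed in point 2 can be sketched as a filter applied before
the validation step: only members created strictly before the current
instance contribute hosts to check against. Field names ('host',
'created_at') are illustrative, not Nova's exact schema:

```python
# Illustrative helper for the proposed reschedule fix: the oldest
# instance in the group sees an empty set and always passes validation,
# so two simultaneously rescheduled instances can no longer bounce each
# other back indefinitely -- one of them is guaranteed to win.

def hosts_to_check(group_instances, current):
    """Return the hosts of group members created before `current`."""
    return {inst['host'] for inst in group_instances
            if inst['created_at'] < current['created_at']
            and inst['host'] is not None}
```

This breaks the symmetry that causes both instances to be sent back for
rescheduling at the same time.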
** Affects: nova
Importance: Undecided
Status: New
** Tags: compute
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1423648