[Bug 2011127] [NEW] Nova scheduler stacks allocations in heterogeneous environments
Public bug reported:
Our OpenStack clouds consist of different hypervisor hardware
configurations, all of which are members of the same cell.
What we have observed is that many of the weighers in Nova will
encourage "stacking" of allocations instead of "spreading". That is to
say, the weighers will preferentially keep assigning greater weights to
the hypervisors with more resources until said hypervisors are
objectively over-provisioned compared to the hypervisors with fewer
resources.
Suppose for example that some of these hypervisors have 1/4th the amount
of RAM and physical CPU cores compared to others. What we observe is
that, assuming all hypervisors start empty, the hypervisors with 1/4th
the amount of RAM will not have a *single* instance assigned to them
even when others can have 1/2 or more of their resources allocated.
We dug into why, and landed upon this commit from 2013 which normalized the weights:
https://github.com/openstack/nova/commit/e5ba8494374a1b049eae257fe05b10c5804049ae
The normalization on the surface seems correct:
"weight = w1_multiplier * norm(w1) + w2_multiplier * norm(w2) + ..."
However, the computed values for w1 by the CPUWeigher and RAMWeigher,
etc. are objectively *not* correct anymore. The commit mentions that
all weighers should fall under one of two cases:
Case 1: Use of a percentage instead of absolute values (for example, % of free RAM).
Case 2: Use of absolute values.
However, if we look at the current implementation, it does neither of these
things. In the case of the RAMWeigher, it returns the free RAM as an
absolute value. The normalization occurs with respect to the hypervisor
which has the most free RAM at the point in time of scheduling -- this
is not % free RAM per hypervisor and it has some hidden implications:
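As a quick illustration before the worked example below (which uses the same
HypA/HypB numbers), here is a minimal sketch of that normalization step. This
is a simplified model of the behaviour described above -- normalizing each
raw value against the current maximum -- not the actual Nova code, and the
host names and sizes are made up:

# Simplified model of the weighing behaviour described above (illustrative only).
def normalize_against_max(values):
    """Normalize raw weigher values against the current maximum across hosts."""
    top = max(values.values())
    return {host: v / top for host, v in values.items()}

# RAMWeigher-style raw value: absolute free RAM, *not* % free per hypervisor.
free_ram = {"HypA": 2.0, "HypB": 10.0}
print(normalize_against_max(free_ram))
# {'HypA': 0.2, 'HypB': 1.0} -> HypB wins as long as it has the most free RAM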
Suppose we take a fictitious example of two hypervisors, one ("HypA")
with 2 units of RAM and one ("HypB") with 10 units of RAM. And we assume
VMs of 0.25 units of RAM are allocated:
Upon the first allocation, we compute these weights:
HypA: 2 units of free RAM, normalized weight = 0.2 (2/10)
HypB: 10 units of free RAM, normalized weight = 1.0 (10/10)
And the second:
HypA: 2 units of free RAM, normalized weight = 0.20512820512820512 (2/9.75)
HypB: 9.75 units of free RAM, normalized weight = 1.0 (9.75/9.75)
And the third:
HypA: 2 units of free RAM, normalized weight = 0.21052631578947367 (2/9.5)
HypB: 9.5 units of free RAM, normalized weight = 1.0 (9.5/9.5)
etc...
Thus the RAMWeigher continues stacking instances on HypB until HypB has
2 units of free RAM remaining, at which point it has 32 instances of
0.25 units of RAM. After this point, it begins spreading across both
hypervisors in lockstep fashion. But up until this point, it stacks.
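To make the stacking point concrete, here is a small, self-contained
simulation of the fictitious example above. It models the weighing as
normalize-against-the-maximum, per the behaviour described; it is not the
actual Nova scheduler code:

# Simulate the HypA/HypB example: repeatedly place 0.25-unit VMs on whichever
# host currently has the highest normalized free-RAM weight (illustrative only).
free_ram = {"HypA": 2.0, "HypB": 10.0}
vm_size = 0.25
placed = {"HypA": 0, "HypB": 0}

while placed["HypA"] == 0:
    top = max(free_ram.values())
    weights = {h: v / top for h, v in free_ram.items()}
    winner = max(weights, key=weights.get)   # ties go to HypA (insertion order)
    free_ram[winner] -= vm_size
    placed[winner] += 1

print(placed)   # {'HypA': 1, 'HypB': 32} -- 32 instances stack before HypA gets one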
This same problem occurs with the CPUWeigher, but it's even more
pernicious in that case because the CPUWeigher is acting on vCPUs with
respect to operator-supplied CPU allocation ratios.
For example: let's suppose an operator configures Nova with
cpu_allocation_ratio = 3.0. In this case, a hypervisor with 2x as many
cores as another will have its cores over-provisioned (that is, more
than 1 vCPU/1 pCPU core allocated) before the other hypervisor gets a
single instance!
This is because the value passed to the normalization function is the number
of free vCPUs, where the total vCPU count is # physical CPU cores *
cpu_allocation_ratio. In this way, stacking occurs on the hypervisor with
twice the CPU cores up until its physical CPU cores are over-provisioned at
1.5 vCPUs per physical CPU core.
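Running the same style of back-of-the-envelope check for the CPU case
(hypothetical host sizes; assuming the weigher feeds absolute free vCPUs into
the same normalize-against-max step as above):

# Hypothetical example: HostSmall has 8 pCPUs, HostBig has 16 pCPUs,
# cpu_allocation_ratio = 3.0, and each instance requests 1 vCPU (illustrative).
ratio = 3.0
pcpus = {"HostSmall": 8, "HostBig": 16}
free_vcpus = {h: c * ratio for h, c in pcpus.items()}   # 24.0 and 48.0 vCPUs
placed = {h: 0 for h in pcpus}

while True:
    top = max(free_vcpus.values())
    weights = {h: v / top for h, v in free_vcpus.items()}
    winner = max(weights, key=weights.get)   # ties go to HostSmall
    if winner == "HostSmall":
        break
    free_vcpus[winner] -= 1
    placed[winner] += 1

print(placed["HostBig"] / pcpus["HostBig"], "vCPUs per pCPU on HostBig "
      "before HostSmall sees its first instance")   # -> 1.5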
The documentation does note this behaviour, referring to it as "even
spreading"... but it certainly does not seem correct.
** Affects: nova
Importance: Undecided
Assignee: Tyler Stachecki (tstachecki)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Tyler Stachecki (tstachecki)
** Changed in: nova
Status: New => In Progress
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2011127