Message #83447
[Bug 1889633] [NEW] Pinned instance with thread policy can consume VCPU
Public bug reported:
In Train, we introduced the concept of the 'PCPU' resource type to track
pinned instance CPU usage. The '[compute] cpu_dedicated_set' option is
used to indicate which host cores should be used by pinned instances
and, once this option was set, nova would start reporting 'PCPU'
inventory in addition to (or entirely instead of, if '[compute]
cpu_shared_set' was unset) 'VCPU'. Requests for pinned instances (via the
'hw:cpu_policy=dedicated' flavor extra spec or equivalent image metadata
property) would result in a query for 'PCPU' inventory rather than
'VCPU', as previously done.
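For illustration, new-style configuration and a pinned flavor might look
something like this (the core ranges and flavor name here are invented
for the example):

    # /etc/nova/nova.conf on the compute host
    [compute]
    cpu_dedicated_set = 2-7
    cpu_shared_set = 0-1

    # flavor requesting pinned CPUs
    $ openstack flavor set my-pinned-flavor --property hw:cpu_policy=dedicated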
We anticipated some upgrade issues with this change, whereby there could
be a period during an upgrade in which some hosts would have the new
configuration, meaning they'd be reporting PCPU, but the remainder would
still be on legacy config and therefore would continue reporting just
VCPU. An instance could be reasonably expected to land on any host, but
since only the hosts with the new configuration were reporting 'PCPU'
inventory and the 'hw:cpu_policy=dedicated' extra spec was resulting in
a request for 'PCPU', the hosts with legacy configuration would never be
consumed.
We worked around this issue by adding support for a fallback placement
query, enabled by default, which would make a second request using
'VCPU' inventory instead of 'PCPU'. The idea behind this was that the
hosts with 'PCPU' inventory would be preferred, meaning we'd only try
the 'VCPU' allocation if the preferred path failed. Crucially, we
anticipated that if a host with new-style configuration was picked up by
this second 'VCPU' query, an instance would never actually be able to
build there. This is because the new-style configuration would be
reflected in the 'numa_topology' blob of the 'ComputeNode' object,
specifically via the 'cpuset' (for cores allocated to 'VCPU') and
'pcpuset' (for cores allocated to 'PCPU') fields. With new-style
configuration, these two fields are set to disjoint sets of cores. If the scheduler
had determined that there wasn't enough 'PCPU' inventory available for
the instance, that would implicitly mean there weren't enough of the
cores listed in the 'pcpuset' field still available.
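Roughly speaking, for a two-vCPU pinned instance the preferred and
fallback placement queries would look something like the following
(resource amounts are illustrative and other request parameters are
omitted):

    # preferred query, against dedicated (pinned) CPU inventory
    GET /allocation_candidates?resources=PCPU:2,MEMORY_MB:2048,DISK_GB:20

    # fallback query, only used if the first returns no candidates
    GET /allocation_candidates?resources=VCPU:2,MEMORY_MB:2048,DISK_GB:20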
Turns out there's a gap in this thinking: thread policies. The 'isolate'
CPU thread policy previously meant "give me a host with no hyperthreads,
else a host with hyperthreads but mark the thread siblings of the cores
used by the instance as reserved". This didn't translate to a new 'PCPU'
world where we needed to know how many cores we were consuming up front
before landing on the host. To work around this, we removed support for
the latter case and instead relied on a trait, 'HW_CPU_HYPERTHREADING',
to indicate whether a host had hyperthread support or not. Using the
'isolate' policy meant that this trait must not be reported by the chosen
host, i.e. the trait was "forbidden" in the placement request. The gap
comes via a combination of this trait
request and the fallback query. If we request the isolate thread policy,
hosts with new-style configuration and sufficient PCPU inventory would
nonetheless be rejected if they reported the 'HW_CPU_HYPERTHREADING'
trait. However, these hosts could get picked up by the fallback query,
and since they still have enough free pinnable cores, the instance would
not fail to build there. This means we end up with a pinned instance on a host using
new-style configuration that is consuming 'VCPU' inventory. Boo.
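To make the gap concrete, with 'hw:cpu_policy=dedicated' and
'hw:cpu_thread_policy=isolate' the two queries would look roughly like
this (amounts illustrative; the '!TRAIT' forbidden syntax requires a
placement microversion that supports forbidden traits):

    # preferred query: a hyperthreaded host with new-style config and
    # plenty of free PCPU is still rejected, due to the forbidden trait
    GET /allocation_candidates?resources=PCPU:2&required=!HW_CPU_HYPERTHREADING

    # fallback query: no PCPU requirement and no forbidden trait, so the
    # same host is accepted and the instance ends up consuming VCPU
    GET /allocation_candidates?resources=VCPU:2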
# Steps to reproduce
1. Using a host with hyperthreading support enabled, configure both
'[compute] cpu_dedicated_set' and '[compute] cpu_shared_set'
2. Boot an instance with the 'hw:cpu_thread_policy=isolate' extra spec,
as sketched below.
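For example (flavor, image and network names are invented, and the host
is assumed to be configured as in step 1):

    $ openstack flavor create pinned-isolate --vcpus 2 --ram 2048 --disk 20
    $ openstack flavor set pinned-isolate \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_thread_policy=isolate
    $ openstack server create --flavor pinned-isolate --image cirros \
        --network private test-instance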
# Expected result
Instance should not boot since the host has hyperthreads.
# Actual result
Instance boots.
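One way to confirm what happened (assuming the osc-placement CLI plugin
is installed) is to inspect the server's allocations, which will show
'VCPU' rather than 'PCPU' being consumed:

    $ openstack resource provider allocation show <server uuid>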
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1889633
Title:
Pinned instance with thread policy can consume VCPU
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1889633/+subscriptions