yahoo-eng-team team mailing list archive
Message #95279
[Bug 2095591] Re: CPUs in exclusive cpusets are used for scheduling vCPUs
nova provides two ways to configure which CPUs can be used by VMs.
you can use the legacy vcpu_pin_set
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.vcpu_pin_set
or the preferred cpu_shared_set and cpu_dedicated_set to specify which
CPUs nova will use
https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_shared_set
https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set
the installer of nova is required to configure those to align with the
resources that are available to nova if only a subset of the host CPUs is
valid.
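As a rough illustration of what an installer could do here, the sketch below derives a [compute] section from the host's CPU list and a list of CPUs reserved for other workloads. The option names cpu_shared_set and cpu_dedicated_set come from the documentation linked above; the helper names (parse_cpu_spec, format_cpu_spec, render_compute_section) are hypothetical and not part of nova, and the parser only handles plain IDs and ranges, not any exclusion syntax nova's own parser may accept.

```python
def parse_cpu_spec(spec):
    """Parse a CPU list such as "0-3,8,10-11" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError("invalid CPU range: %s" % part)
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_spec(cpus):
    """Collapse a set of CPU ids into a compact list such as "0,2-3"."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current range
        else:
            ranges.append([cpu, cpu])  # start a new range
    return ",".join(
        str(lo) if lo == hi else "%d-%d" % (lo, hi) for lo, hi in ranges
    )


def render_compute_section(host_cpus, reserved_cpus, dedicated_cpus=""):
    """Render a [compute] section (hypothetical installer helper).

    CPUs reserved for other workloads are removed from the host set;
    what remains is split into a dedicated set and a shared set.
    """
    host = parse_cpu_spec(host_cpus)
    reserved = parse_cpu_spec(reserved_cpus)
    dedicated = parse_cpu_spec(dedicated_cpus)
    if not (reserved <= host and dedicated <= host):
        raise ValueError("reserved and dedicated CPUs must exist on the host")
    if reserved & dedicated:
        raise ValueError("reserved and dedicated CPUs must not overlap")
    shared = host - reserved - dedicated
    lines = ["[compute]", "cpu_shared_set = " + format_cpu_spec(shared)]
    if dedicated:
        lines.append("cpu_dedicated_set = " + format_cpu_spec(dedicated))
    return "\n".join(lines)


if __name__ == "__main__":
    # Host has CPUs 0-7, CPU 1 is kept away from nova, 4-7 are for pinned guests.
    print(render_compute_section("0-7", "1", "4-7"))
```

The values in the usage line are made up for illustration; a real deployment would take them from its own inventory.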
nova should not be parsing the cgroup file system to try to determine what the allowable set is.
there are many factors that could prevent that from working, such as being deployed in a container, beyond the fact that the cgroup API is not entirely stable.
we can't assume that machine.slice, for one, is the correct slice to
look at, as you could deploy nova in its own slice in the cgroup tree.
if we were to do this, it would be a new feature, not a bug, and would have
to be discussed, including the upgrade impact it would have, before being
implemented.
with that context I'm marking this as opinion.
if you want to discuss this more you could bring it up on the mailing
list, on IRC, or at the PTG as a new feature for next cycle, but I don't
think this is in line with the direction of nova.
** Changed in: nova
Importance: Undecided => Wishlist
** Changed in: nova
Status: In Progress => Opinion
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2095591
Title:
CPUs in exclusive cpusets are used for scheduling vCPUs
Status in OpenStack Compute (nova):
Opinion
Bug description:
Description
===========
Nova presumes that all the host's CPUs can schedule and execute vCPUs.
This assumption is wrong because the operator might want to allocate a
specific subset of the host's available CPUs to execute vCPU code.
Nova uses libvirt when deployed on Linux hosts, all major Linux distributions run systemd, and by default libvirt uses the machine.slice cgroup to spawn virtual machines. Given this, an easy way for an operator to limit the scheduling of vCPUs to a subset of the host's CPUs is to use the cpuset controller for the machine.slice cgroup.
Going a step further, the operator might need to allocate another subset of the host's CPUs to a latency-sensitive application like a software-defined storage solution or a database. The operator sets the cpu_exclusive bit on the custom cpuset to ensure the kernel won't schedule any other process on the CPU subset allocated for the latency-sensitive application.
The above scenario leads to an error when Nova attempts to spawn an
instance because the kernel throws a "Permission denied" error when
libvirt tries to create a child cgroup with a cpuset containing all
host CPUs. This violates the constraint imposed by the cpu_exclusive
bit in the custom cpuset and the kernel returns an error. This is
valid for both cgroup v1 and v2.
A workaround to this problem is setting the cpu_shared_set to be equal
to the cpuset assigned to the machine.slice cgroup. However, using the
workaround is cumbersome when it comes to automated deployment on a
heterogeneous fleet of hosts.
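For automated deployments, the workaround could be scripted roughly as follows. This is only a sketch: the two cgroup paths are assumptions about where machine.slice typically lives under cgroup v2 and v1, the function names are hypothetical, and where the generated snippet must be placed so that nova-compute reads it depends on the deployment.

```python
import configparser
from pathlib import Path

# Assumed default locations of the machine.slice cpuset file; a given host
# may mount cgroups elsewhere or place libvirt guests in another slice.
CANDIDATE_PATHS = (
    "/sys/fs/cgroup/machine.slice/cpuset.cpus",         # cgroup v2 layout
    "/sys/fs/cgroup/cpuset/machine.slice/cpuset.cpus",  # cgroup v1 layout
)


def read_machine_slice_cpuset(paths=CANDIDATE_PATHS):
    """Return the first non-empty cpuset found, or None if there is none."""
    for candidate in paths:
        try:
            value = Path(candidate).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next layout
        if value:
            return value
    return None


def write_cpu_shared_set_snippet(snippet_path, cpuset):
    """Write a config snippet holding only [compute] cpu_shared_set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.add_section("compute")
    parser.set("compute", "cpu_shared_set", cpuset)
    with open(snippet_path, "w") as handle:
        parser.write(handle)


if __name__ == "__main__":
    cpuset = read_machine_slice_cpuset()
    if cpuset is None:
        print("no machine.slice cpuset found, leaving configuration untouched")
    else:
        # Hypothetical output path, chosen only for this example.
        write_cpu_shared_set_snippet("cpu_shared_set.conf", cpuset)
```

Writing a separate snippet rather than rewriting nova.conf in place avoids losing comments or repeated options that a generic INI parser would not preserve.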
A better approach would be for Nova to check if there is a
machine.slice cgroup and if this cgroup has a defined cpuset. If there
is a defined machine.slice cpuset, then Nova should consider the
cpuset defined in the machine.slice cgroup for computing the list of
CPUs that can schedule vCPUs unless cpu_shared_set is defined. If the
machine.slice cpuset is empty or the machine.slice cgroup does not
exist at all, then consider all online CPUs as schedulable.
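The precedence proposed above can be sketched as a small function. This is an illustration of the proposal only, not existing Nova behaviour; the function name and the source labels it returns are hypothetical, and the caller is assumed to have already read the three values (for example, the online CPU list from /sys/devices/system/cpu/online).

```python
def pick_schedulable_cpus(cpu_shared_set, machine_slice_cpuset, online_cpus):
    """Return (source, cpu_list) following the proposed precedence.

    1. An explicitly configured cpu_shared_set always wins.
    2. Otherwise a non-empty machine.slice cpuset is used.
    3. Otherwise all online CPUs are considered schedulable.

    Each argument is a CPU list string such as "0-3,8", or None/empty
    when the corresponding source is not defined.
    """
    candidates = (
        ("cpu_shared_set", cpu_shared_set),
        ("machine.slice", machine_slice_cpuset),
    )
    for source, value in candidates:
        if value and value.strip():
            return source, value.strip()
    return "online", online_cpus.strip()


if __name__ == "__main__":
    # No cpu_shared_set configured, machine.slice restricted to CPUs 2-5.
    print(pick_schedulable_cpus(None, "2-5\n", "0-7\n"))
```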
Reproduction
============
1. Deploy the Nova compute agent on a host with the default configuration.
2. Create a custom cgroup with an exclusive cpuset:
# mkdir -p /sys/fs/cgroup/cpuset/test-group1
# echo "1" > /sys/fs/cgroup/cpuset/test-group1/cpuset.cpus
# echo "1" > /sys/fs/cgroup/cpuset/test-group1/cpuset.cpu_exclusive
3. Spawn an instance on the target host
Expected result
===============
The instance should be spawned successfully.
Actual result
=============
The instance fails to spawn.
Environment
===========
1. OpenStack version: latest upstream
commit 932866d078cdec51ad654aa0626a635e65975b7f (HEAD -> master, origin/master, origin/HEAD)
Merge: 3d21445b73 26d174b65d
Author: Zuul <zuul@xxxxxxxxxxxxxxxxxx>
Date: Wed Jan 22 18:30:38 2025 +0000
Merge "Run nova-next without periodic cache healing"
2. Hypervisor: QEMU/KVM via libvirt
Compiled against library: libvirt 8.0.0
Using library: libvirt 8.0.0
Using API: QEMU 8.0.0
Running hypervisor: QEMU 6.2.0
3. No storage used, booted from an image
4. No networking used
Logs & Config
=============
An error excerpt from the nova-compute process is in the attached files.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2095591/+subscriptions