yahoo-eng-team team mailing list archive
Message #94484
[Bug 2077228] Re: libvirt reports powered down CPUs as being on socket 0 regardless of their real socket
Reviewed: https://review.opendev.org/c/openstack/nova/+/926218
Committed: https://opendev.org/openstack/nova/commit/79d1f06094599249e6e30ebba2488b8b7a10834e
Submitter: "Zuul (22348)"
Branch: master
commit 79d1f06094599249e6e30ebba2488b8b7a10834e
Author: Artom Lifshitz <alifshit@xxxxxxxxxx>
Date: Tue Aug 13 11:29:10 2024 -0400
libvirt: call get_capabilities() with all CPUs online
While we do cache the host's capabilities in self._caps in the
libvirt Host object, if we happen to first call get_capabilities() with
some of our dedicated CPUs offline, libvirt erroneously reports them
as being on socket 0 regardless of their real socket. We would then
cache that topology, thus breaking pretty much all of our NUMA
accounting.
To fix this, this patch makes sure to call get_capabilities()
immediately upon host init, and to power up all our dedicated CPUs
before doing so. That way, we cache their real socket ID.
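The ordering described above can be sketched roughly as follows. This is
a toy model only, not nova's actual code: SketchHost, power_up() and
_query_topology() are hypothetical names made up for illustration, and
the "offline CPUs land on socket 0" behaviour is simulated rather than
obtained from libvirt.

```python
class SketchHost:
    """Toy stand-in for the libvirt Host object (hypothetical, not nova's class)."""

    def __init__(self, real_sockets, dedicated_cpus):
        self.real_sockets = dict(real_sockets)      # cpu id -> real socket id
        self.dedicated_cpus = set(dedicated_cpus)
        # Assumption for the sketch: dedicated CPUs start powered down.
        self.online = set(self.real_sockets) - self.dedicated_cpus
        self._caps = None                           # cached topology

    def _query_topology(self):
        # Simulates the reported misbehaviour: an offline CPU is shown on socket 0.
        return {
            cpu: (sock if cpu in self.online else 0)
            for cpu, sock in self.real_sockets.items()
        }

    def get_capabilities(self):
        # The first call populates the cache; later calls reuse it.
        if self._caps is None:
            self._caps = self._query_topology()
        return self._caps

    def power_up(self, cpus):
        self.online |= set(cpus)

    def init_host(self):
        # Order matters: bring the dedicated CPUs online, then fill the cache.
        self.power_up(self.dedicated_cpus)
        self.get_capabilities()


sockets = {0: 0, 1: 0, 2: 1, 3: 1}

# Without the fix: the cache is filled while CPUs 2 and 3 are offline.
stale = SketchHost(sockets, dedicated_cpus={2, 3})
stale.get_capabilities()
stale.power_up({2, 3})
stale_caps = stale.get_capabilities()   # still the cached, wrong topology

# With the fix: init_host() powers the CPUs up before caching.
fixed = SketchHost(sockets, dedicated_cpus={2, 3})
fixed.init_host()
fixed_caps = fixed.get_capabilities()
```

In the sketch, stale_caps keeps CPUs 2 and 3 on socket 0 even after they
are powered up, because the cache is never refreshed, while fixed_caps
records them on socket 1.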
For testing, because we don't really want to implement a libvirt bug
in our Python libvirt fixture, we make do with a simple unit test
that asserts that init_host() has powered on the correct CPUs.
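A unit test in that spirit might look something like the sketch below.
It is not the test from the patch: ToyDriver and its methods are
hypothetical stand-ins, and only the standard library's unittest.mock is
used. The idea is to replace the power-up and capabilities calls with
mocks, then check which CPUs were powered on and in what order the two
calls happened.

```python
import unittest
from unittest import mock


class ToyDriver:
    """Hypothetical driver; the real nova driver is far more involved."""

    def __init__(self, dedicated_cpus):
        self.dedicated_cpus = set(dedicated_cpus)

    def power_up(self, cpus):
        raise NotImplementedError("would touch the real host")

    def get_capabilities(self):
        raise NotImplementedError("would ask libvirt")

    def init_host(self):
        self.power_up(self.dedicated_cpus)
        self.get_capabilities()


class InitHostTest(unittest.TestCase):
    def test_init_host_powers_up_dedicated_cpus_first(self):
        driver = ToyDriver(dedicated_cpus={2, 3})
        # A single parent mock records calls to both children in order.
        manager = mock.Mock()
        driver.power_up = manager.power_up
        driver.get_capabilities = manager.get_capabilities

        driver.init_host()

        manager.power_up.assert_called_once_with({2, 3})
        self.assertEqual(
            manager.mock_calls,
            [mock.call.power_up({2, 3}), mock.call.get_capabilities()],
        )
```

Nothing here exercises the libvirt misreporting itself; as in the commit
message, the test only pins down that the right CPUs are powered on
before the capabilities are fetched.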
Closes-bug: 2077228
Change-Id: I9a2a7614313297f11a55d99fb94916d3583a9504
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2077228
Title:
libvirt reports powered down CPUs as being on socket 0 regardless of
their real socket
Status in OpenStack Compute (nova):
Fix Released
Bug description:
This is more of a libvirt (or maybe even lower down in the kernel)
bug, but the consequence of $topic's reporting is that if libvirt CPU
power management is enabled, we mess up our NUMA accounting because we
have the wrong socket for some/all of our dedicated CPUs, depending on
whether they were online or not when we called get_capabilities().
Initially found by internal Red Hat testing, and reported here:
https://issues.redhat.com/browse/OSPRH-8712
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2077228/+subscriptions