
yahoo-eng-team team mailing list archive

[Bug 2077228] Re: libvirt reports powered down CPUs as being on socket 0 regardless of their real socket

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/926218
Committed: https://opendev.org/openstack/nova/commit/79d1f06094599249e6e30ebba2488b8b7a10834e
Submitter: "Zuul (22348)"
Branch:    master

commit 79d1f06094599249e6e30ebba2488b8b7a10834e
Author: Artom Lifshitz <alifshit@xxxxxxxxxx>
Date:   Tue Aug 13 11:29:10 2024 -0400

    libvirt: call get_capabilities() with all CPUs online
    
    While we do cache the host's capabilities in self._caps in the
    libvirt Host object, if we happen to first call get_capabilities() with
    some of our dedicated CPUs offline, libvirt erroneously reports them
    as being on socket 0 regardless of their real socket. We would then
    cache that topology, thus breaking pretty much all of our NUMA
    accounting.
    
    To fix this, this patch makes sure to call get_capabilities()
    immediately upon host init, and to power up all our dedicated CPUs
    before doing so. That way, we cache their real socket ID.
    
    For testing, because we don't really want to implement a libvirt bug
    in our Python libvirt fixture, we make do with a simple unit test
    that asserts that init_host() has powered on the correct CPUs.
    
    Closes-bug: 2077228
    Change-Id: I9a2a7614313297f11a55d99fb94916d3583a9504
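
The ordering problem described in the commit message can be sketched roughly as follows. This is not Nova's code: FakeHost, power_up() and the standalone init_host() below are hypothetical stand-ins, and the "offline CPUs land on socket 0" behaviour is simulated only to show why the first call to get_capabilities() matters once its result is cached.

```python
class FakeHost:
    """Hypothetical stand-in for a host wrapper; not Nova's real Host class."""

    def __init__(self, real_sockets, online):
        self._real_sockets = dict(real_sockets)  # CPU id -> true socket id
        self._online = set(online)               # CPUs currently powered on
        self._caps = None                        # cached topology

    def power_up(self, cpus):
        """Mark the given CPUs as online."""
        self._online |= set(cpus)

    def get_capabilities(self):
        """Return the CPU -> socket map, computing it only on the first call."""
        if self._caps is None:
            # Simulated misreporting: offline CPUs are placed on socket 0.
            self._caps = {
                cpu: (sock if cpu in self._online else 0)
                for cpu, sock in self._real_sockets.items()
            }
        return self._caps


def init_host(host, dedicated_cpus):
    """Power up the dedicated CPUs first, then populate the cache."""
    host.power_up(dedicated_cpus)
    return host.get_capabilities()


REAL = {0: 0, 1: 0, 2: 1, 3: 1}

# Cache filled while CPUs 2 and 3 are offline: the wrong topology sticks,
# even after the CPUs are later powered up.
stale = FakeHost(REAL, online={0, 1})
stale.get_capabilities()
stale.power_up({2, 3})
assert stale.get_capabilities()[2] == 0

# Cache filled after powering up the dedicated CPUs: real sockets are kept.
fresh = FakeHost(REAL, online={0, 1})
assert init_host(fresh, {2, 3}) == REAL
```

The point of the sketch is only the call order: because the result is cached, whatever state the CPUs are in at the first call determines the topology seen from then on.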


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2077228

Title:
  libvirt reports powered down CPUs as being on socket 0 regardless of
  their real socket

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This is more of a libvirt (or maybe even lower down in the kernel)
  bug, but the consequence of $topic's reporting is that if libvirt CPU
  power management is enabled, we mess up our NUMA accounting because we
  have the wrong socket for some/all of our dedicated CPUs, depending on
  whether they were online or not when we called get_capabilities().

  Initially found by internal Red Hat testing, and reported here:
  https://issues.redhat.com/browse/OSPRH-8712
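
  As a rough illustration of how a wrong socket ID skews per-socket accounting, the sketch below groups CPUs by their reported socket. The CPU-to-socket maps and the cpus_per_socket() helper are made up for the example and do not reflect Nova's actual data structures.

```python
from collections import defaultdict


def cpus_per_socket(cpu_to_socket):
    """Group CPU ids by the socket id they are reported on."""
    by_socket = defaultdict(set)
    for cpu, socket in cpu_to_socket.items():
        by_socket[socket].add(cpu)
    return dict(by_socket)


# Hypothetical two-socket host with CPUs 2 and 3 on socket 1.
real = {0: 0, 1: 0, 2: 1, 3: 1}
# What gets reported if CPUs 2 and 3 were offline at query time.
reported = {0: 0, 1: 0, 2: 0, 3: 0}

real_view = cpus_per_socket(real)
reported_view = cpus_per_socket(reported)

# Socket 1 disappears from the reported view and socket 0 looks
# twice as large as it really is.
assert 1 in real_view and 1 not in reported_view
assert len(reported_view[0]) == 2 * len(real_view[0])
```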

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2077228/+subscriptions


