yahoo-eng-team team mailing list archive

[Bug 1751073] Re: [Regression] Nova's 'enabled_perf_events' feature will be broken with Linux Kernel 4.14+

 

Reviewed:  https://review.openstack.org/565242
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fc4794acc6b13afade1bb72a1ae9f574707d2f0d
Submitter: Zuul
Branch:    master

commit fc4794acc6b13afade1bb72a1ae9f574707d2f0d
Author: Kashyap Chamarthy <kchamart@xxxxxxxxxx>
Date:   Tue May 8 10:52:17 2018 +0200

    libvirt: Deprecate support for monitoring Intel CMT `perf` events
    
    Upstream Linux kernel has deleted[*] the `perf` framework integration
    with Intel CMT (Cache Monitoring Technology; or "CQM" in Linux kernel
    parlance), because the feature was broken by design -- an
    incompatibility between Linux's `perf` infrastructure and Intel CMT
    hardware support.  It was removed in upstream kernel version v4.14; but
    bear in mind that downstream Linux distributions with lower kernel
    versions than 4.14 have backported the said change.
    
    Nova supports monitoring of the above mentioned Intel CMT events
    (namely: 'cmt', 'mbm_local', and 'mbm_total') via the configuration
    attribute `[libvirt]/enabled_perf_events`. Given that the underlying
    Linux kernel infrastructure for Intel CMT is removed, we should remove
    support for it in Nova too.  Otherwise enabling them in Nova, and
    updating to a Linux kernel 4.14 (or above) will result in instances
    failing to boot.
    
    To that end, deprecate support for the three Intel CMT events in the
    "Rocky" release, with the intention to remove support for them in the
    upcoming "Stein" release.  Note that we cannot deprecate / remove the
    `enabled_perf_events` config attribute altogether -- since there are
    other[+] `perf` events besides Intel CMT.  Whether anyone is using those
    other events with Nova is a good question, to which we don't have an
    equally good answer, if at all.
    
    [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c39a0e2
    [+] https://libvirt.org/formatdomain.html#elementsPerf
    
    Closes-Bug: #1751073
    Change-Id: I7e77f87650d966d605807c7be184e670259a81c1
    Signed-off-by: Kashyap Chamarthy <kchamart@xxxxxxxxxx>


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1751073

Title:
  [Regression] Nova's 'enabled_perf_events' feature will be broken with
  Linux Kernel 4.14+

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Upstream Linux kernel has removed[*] 'perf cqm' (Cache
  Quality-of-Service Monitoring) support, starting with the following
  kernel version:

      [linux]$> git tag --contains c39a0e2 
      v4.14

  Impact for OpenStack / Nova
  ---------------------------

  Quoting the summary from Dan Berrangé from a downstream bug (with some
  edits, references and formatting):

    - Libvirt supports enabling perf event reporting per guest using <perf
      ../> XML in guest XML
      https://libvirt.org/formatdomain.html#elementsPerf

    - OpenStack has the ability to enable this support via the
      "enabled_perf_events" setting in the [libvirt] section of
      /etc/nova/nova.conf

    - Although libvirt supports many events, OpenStack only supports the
      'cmt', 'mbmt' and 'mbml' perf events

    - Upstream Linux kernel decided the perf framework integration with
      'cmt', 'mbmt' and 'mbml' events was broken by design and entirely
      deleted it[*]

    - Upstream kernel has provided a new approach to 'cmt', 'mbmt' and
      'mbml' info reporting that is *not* using perf framework

    - There's unlikely to be any way for libvirt to make this 
      functionality magically re-appear, given the kernel changes. The new
      approach is completely incompatible with what was done before.
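
To illustrate the first point in the list above, here is a minimal sketch of the kind of <perf> element that ends up in the guest XML when these events are enabled. The element and attribute names follow the libvirt domain XML documentation linked above; the helper name `build_perf_element` is hypothetical and is not Nova's actual code.

```python
import xml.etree.ElementTree as ET


def build_perf_element(events):
    """Return a libvirt-style <perf> element as a string.

    Hypothetical helper for illustration only; Nova builds its guest
    XML through its own config classes, not through this function.
    """
    perf = ET.Element("perf")
    for name in events:
        # One <event name='...' enabled='yes'/> child per perf event.
        ET.SubElement(perf, "event", {"name": name, "enabled": "yes"})
    return ET.tostring(perf, encoding="unicode")


if __name__ == "__main__":
    # The three Intel CMT events named in this bug report.
    print(build_perf_element(["cmt", "mbmt", "mbml"]))
```

On a kernel that carries commit c39a0e2, a guest defined with these three events is the case that fails to start, per the description in this bug.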

  IOW, if someone has previously set "enabled_perf_events" in
  /etc/nova/nova.conf, they will be unable to start any guest once they
  upgrade to a kernel that includes, or has backported, the commit
  c39a0e2 ("x86/perf/cqm: Wipe out perf based cqm")[*].
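
Before such a kernel upgrade, an operator may want to check whether a compute node is affected. The following is a rough sketch, assuming the configuration can be read with Python's standard `configparser` (Nova itself uses oslo.config, so edge cases may differ). The function name `find_cmt_events` and the sample configuration are hypothetical.

```python
import configparser

# The three Intel CMT perf events named in this bug report.
CMT_EVENTS = {"cmt", "mbmt", "mbml"}


def find_cmt_events(conf_text):
    """Return the Intel CMT events listed in [libvirt]/enabled_perf_events.

    Assumption: the file is plain INI that configparser can parse.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("libvirt", "enabled_perf_events", fallback="")
    configured = [item.strip() for item in raw.split(",") if item.strip()]
    return sorted(set(configured) & CMT_EVENTS)


if __name__ == "__main__":
    sample = "[libvirt]\nenabled_perf_events = cmt, mbml\n"
    affected = find_cmt_events(sample)
    if affected:
        print("Intel CMT perf events enabled:", ", ".join(affected))
```

Checking the configuration rather than the running kernel version is deliberate: as noted above, kernels older than 4.14 may carry the backported commit, so a version comparison alone would not be reliable.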

  
  [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c39a0e2

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1751073/+subscriptions

