yahoo-eng-team team mailing list archive

[Bug 1821938] Re: No nova hypervisor can be enabled on workers with QAT devices

Reviewed:  https://review.openstack.org/649409
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed
Submitter: Zuul
Branch:    master

commit e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date:   Tue Apr 2 18:27:24 2019 +0100

    Libvirt: gracefully handle non-nic VFs
    
    As part of adding support for bandwidth-based scheduling,
    change I038867c4094d79ae4a20615ab9c9f9e38fcc2e0a introduced
    automatic discovery of parent netdev names for PCIe
    virtual functions.
    
    Nova's PCI passthrough support was originally developed for
    Intel QAT devices and other generic PCI devices. Support for
    Neutron-based SR-IOV NICs was added later.
    
    The PCI-SIG SR-IOV specification, while most often used by NIC
    vendors to virtualise a NIC in hardware, was designed for devices
    of any PCIe class. Support for Intel's QAT devices and other
    accelerators, such as AMD's SR-IOV based vGPU, has therefore
    regressed with the introduction of the new parent_ifname lookup code.
    
    This change simply catches the exception that would be raised
    when pci_utils.get_ifname_by_pci_address is called on generic
    VFs, allowing a graceful fallback to the previous behaviour.
    
    Change-Id: Ib3811f828246311d90b0e3ba71c162c03fb8fe5a
    Closes-Bug: #1821938
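
The fallback described in the commit message can be sketched roughly as follows. This is an illustrative, self-contained approximation, not the code merged into nova: the exception class and lookup function are local stand-ins named after the ones in the traceback, and `describe_vf`, `_NIC_VF_PARENTS`, the NIC address and the interface name `ens1f0` are hypothetical.

```python
class PciDeviceNotFoundById(Exception):
    """Local stand-in for nova's exception of the same name."""

    def __init__(self, id):
        super().__init__("PCI device %s not found" % id)


# Hypothetical data: VFs whose parent PF exposes a network interface.
_NIC_VF_PARENTS = {"0000:05:10.1": "ens1f0"}


def get_ifname_by_pci_address(pci_addr, pf_interface=False):
    """Local stand-in for pci_utils.get_ifname_by_pci_address.

    Raises PciDeviceNotFoundById when no netdev exists for the device,
    which is the situation for non-NIC VFs such as QAT.
    """
    try:
        return _NIC_VF_PARENTS[pci_addr]
    except KeyError:
        raise PciDeviceNotFoundById(id=pci_addr)


def describe_vf(pci_addr, parent_addr):
    """Build a VF description, tolerating VFs that have no parent netdev."""
    device = {"dev_type": "type-VF", "parent_addr": parent_addr}
    try:
        device["parent_ifname"] = get_ifname_by_pci_address(
            pci_addr, pf_interface=True)
    except PciDeviceNotFoundById:
        # Expected for generic (non-NIC) VFs: leave parent_ifname unset
        # and report the device as before.
        pass
    return device
```

With this shape, a QAT VF such as 0000:09:02.3 is still reported, just without a `parent_ifname` entry, while a NIC VF keeps the extra field.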


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821938

Title:
  No nova hypervisor can be enabled on workers with QAT devices

Status in OpenStack Compute (nova):
  Fix Released
Status in StarlingX:
  In Progress

Bug description:
  Brief Description
  -----------------
  Unable to enable a host as a nova hypervisor if the host has QAT devices (C62x or DH895XCC) configured, because a PCI device cannot be found.

  Severity
  --------
  Major

  Steps to Reproduce
  ------------------
  - Install and configure a system where worker nodes have QAT devices configured, e.g.:
  [wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list compute-0
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
  | name             | address      | class id | vendor id | device id | class name                | vendor name                     | device name                            | numa_node | enabled |
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
  | pci_0000_09_00_0 | 0000:09:00.0 | 0b4000   | 8086      | 0435      | Co-processor              | Intel Corporation               | DH895XCC Series QAT                    | 0         | True    |
  | pci_0000_0c_00_0 | 0000:0c:00.0 | 030000   | 102b      | 0522      | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0         | True    |
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+

  compute-0:~$ lspci | grep QAT
  09:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  09:01.0 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
  09:01.1 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
  ...

  - Check the output of nova hypervisor-list

  Expected Behavior
  ------------------
  - Nova hypervisors exist on system

  Actual Behavior
  ----------------
  [wrsroot@controller-0 ~(keystone_admin)]$ nova hypervisor-list
  +----+---------------------+-------+--------+
  | ID | Hypervisor hostname | State | Status |
  +----+---------------------+-------+--------+
  +----+---------------------+-------+--------+

  Reproducibility
  ---------------
  Reproducible

  System Configuration
  --------------------
  Any system type with QAT devices configured on a worker node

  Branch/Pull Time/Commit
  -----------------------
  stx master as of 2019-03-18

  Last Pass
  --------------
  On the f/stein branch in early February

  Timestamp/Logs
  --------------
  # nova-compute pods are spewing errors so they can't register themselves properly as hypervisors:
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager [req-4f652d4c-da7e-4516-9baa-915265c3fdda - - - - -] Error updating resources for node compute-0.: PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager Traceback (most recent call last):
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 7956, in _update_available_resource_for_node
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     startup=startup)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 727, in update_available_resource
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7098, in get_available_resource
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     self._get_pci_passthrough_devices()
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6102, in _get_pci_passthrough_devices
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     pci_info.append(self._get_pcidev_info(name))
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6062, in _get_pcidev_info
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     device.update(_get_device_type(cfgdev, address))
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6021, in _get_device_type
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     pci_address, pf_interface=True),
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python2.7/site-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager     raise exception.PciDeviceNotFoundById(id=pci_addr)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager
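
The traceback ends in a netdev lookup for a VF that has no network interface. As a rough diagnostic sketch, one can check whether a VF's parent physical function exposes any netdev in sysfs. The helper name `parent_netdev_names` is hypothetical, and the `physfn/net` layout under `/sys/bus/pci/devices` is assumed from the standard Linux sysfs PCI tree; it is not taken from this report.

```python
import os


def parent_netdev_names(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the netdev names of a VF's parent PF, or [] if it has none.

    Assumes the usual sysfs layout: <root>/<vf address>/physfn/net/<ifname>.
    A non-NIC device such as a QAT VF has no "net" directory there.
    """
    net_dir = os.path.join(sysfs_root, pci_addr, "physfn", "net")
    try:
        return sorted(os.listdir(net_dir))
    except OSError:
        # Missing device, missing physfn link, or no netdev directory.
        return []
```

On an affected worker, an empty result for 0000:09:02.3 would be consistent with the PciDeviceNotFoundById error in the log above.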