
yahoo-eng-team team mailing list archive

[Bug 1821938] Re: No nova hypervisor can be enabled on workers with QAT devices

 

** Also affects: nova
   Importance: Undecided
       Status: New

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
     Assignee: (unassigned) => sean mooney (sean-k-mooney)

** Changed in: nova
       Status: New => In Progress

** Tags added: stein-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821938

Title:
  No nova hypervisor can be enabled on workers with QAT devices

Status in OpenStack Compute (nova):
  In Progress
Status in StarlingX:
  Triaged

Bug description:
  Brief Description
  -----------------
  A host cannot be enabled as a nova hypervisor if it has QAT devices (C62x or DH895XCC) configured, because nova-compute fails with a "PCI device not found" error.

  Severity
  --------
  Major

  
  Steps to Reproduce
  ------------------
  - Install and configure a system whose worker nodes have QAT devices configured, e.g.:
  [wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list compute-0
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
  | name             | address      | class id | vendor id | device id | class name                | vendor name                     | device name                            | numa_node | enabled |
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
  | pci_0000_09_00_0 | 0000:09:00.0 | 0b4000   | 8086      | 0435      | Co-processor              | Intel Corporation               | DH895XCC Series QAT                    | 0         | True    |
  | pci_0000_0c_00_0 | 0000:0c:00.0 | 030000   | 102b      | 0522      | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0         | True    |
  +------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+

  compute-0:~$ lspci | grep QAT
  09:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
  09:01.0 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
  09:01.1 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
  ...

  - Check the output of nova hypervisor-list

  Expected Behavior
  ------------------
  - Nova hypervisors are listed for the worker nodes

  Actual Behavior
  ----------------
  [wrsroot@controller-0 ~(keystone_admin)]$ nova hypervisor-list
  +----+---------------------+-------+--------+
  | ID | Hypervisor hostname | State | Status |
  +----+---------------------+-------+--------+
  +----+---------------------+-------+--------+
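
  When reproducing this, the empty listing above can be detected automatically. The following is a minimal sketch, assuming the CLI output is captured as a string; parse_cli_table and the populated sample row are hypothetical and only illustrate the table layout shown in this report.

```python
def parse_cli_table(output):
    """Return the data rows of a '+---+' bordered CLI table as lists of cells."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and any surrounding text
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows[1:]  # the first "|" line is the header row


# Output as shown in this report: header only, no hypervisors registered.
EMPTY_LISTING = """\
+----+---------------------+-------+--------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------+-------+--------+
+----+---------------------+-------+--------+
"""

# Hypothetical healthy output, invented here purely for comparison.
HEALTHY_LISTING = """\
+----+---------------------+-------+---------+
| ID | Hypervisor hostname | State | Status  |
+----+---------------------+-------+---------+
| 1  | compute-0           | up    | enabled |
+----+---------------------+-------+---------+
"""

if __name__ == "__main__":
    for label, text in (("reported", EMPTY_LISTING), ("healthy", HEALTHY_LISTING)):
        print(label, "hypervisor rows:", len(parse_cli_table(text)))
```

  An empty result from parse_cli_table corresponds to the failure described in this report.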

  
  Reproducibility
  ---------------
  Reproducible

  System Configuration
  --------------------
  Any system type with QAT devices configured on worker node

  Branch/Pull Time/Commit
  -----------------------
  master as of 2019-03-18

  Last Pass
  --------------
  On the f/stein branch in early February

  Timestamp/Logs
  --------------
  # The nova-compute pods log the following error repeatedly, so they cannot register themselves as hypervisors:
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager [req-4f652d4c-da7e-4516-9baa-915265c3fdda - - - - -] Error updating resources for node compute-0.: PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager Traceback (most recent call last):
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 7956, in _update_available_resource_for_node
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager startup=startup)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 727, in update_available_resource
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7098, in get_available_resource
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager self._get_pci_passthrough_devices()
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6102, in _get_pci_passthrough_devices
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6062, in _get_pcidev_info
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager device.update(_get_device_type(cfgdev, address))
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6021, in _get_device_type
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_address, pf_interface=True),
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=pci_addr)
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
  2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager
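
  The traceback ends in get_ifname_by_pci_address, a lookup of a network interface name for a PCI address. The sketch below illustrates one plausible failure mode, under two assumptions that this report does not confirm: that the lookup reads a "net" directory under the device's sysfs entry, and that QAT functions, being co-processors rather than NICs, expose no such directory. The names ifname_for_pci_address and PciDeviceNotFound are hypothetical stand-ins, not nova's code.

```python
import os


class PciDeviceNotFound(Exception):
    """Hypothetical stand-in for the PciDeviceNotFoundById error in the log."""


def ifname_for_pci_address(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the first network interface name exposed by a PCI function.

    Assumption: interface names appear as entries of <sysfs_root>/<addr>/net.
    A function without that directory (any non-network device) is reported
    as "not found", even though the PCI device itself exists.
    """
    net_dir = os.path.join(sysfs_root, pci_addr, "net")
    try:
        names = sorted(os.listdir(net_dir))
    except OSError:
        raise PciDeviceNotFound("PCI device %s not found" % pci_addr) from None
    if not names:
        raise PciDeviceNotFound("PCI device %s not found" % pci_addr)
    return names[0]


def has_netdev(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Guard that lets a caller skip the lookup for non-network functions."""
    return os.path.isdir(os.path.join(sysfs_root, pci_addr, "net"))


if __name__ == "__main__":
    addr = "0000:09:02.3"  # address taken from the error message above
    if has_netdev(addr):
        print(addr, "->", ifname_for_pci_address(addr))
    else:
        print(addr, "has no network interface; skipping the name lookup")
```

  Under these assumptions, checking for a network interface before the lookup would avoid raising for every QAT function; whether this matches the actual fix is not stated in this report.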

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821938/+subscriptions