yahoo-eng-team team mailing list archive
Message #77832
[Bug 1821938] Re: No nova hypervisor can be enabled on workers with QAT devices
Reviewed: https://review.openstack.org/649409
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed
Submitter: Zuul
Branch: master
commit e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date: Tue Apr 2 18:27:24 2019 +0100
Libvirt: gracefully handle non-nic VFs
As part of adding support for bandwidth based scheduling
I038867c4094d79ae4a20615ab9c9f9e38fcc2e0a introduced
automatic discovery of parent netdev names for PCIe
virtual functions.
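For context, on Linux this kind of parent netdev discovery usually goes through sysfs. A VF's physfn symlink points at its physical function, and a network PF lists its interface name under net/. The sketch below is a minimal illustration under that assumption and is not nova's code. The sysfs_root parameter and the local PciDeviceNotFoundById class are stand-ins added so the function can run against a fake directory tree.

```python
import os


class PciDeviceNotFoundById(Exception):
    """Local stand-in for nova.exception.PciDeviceNotFoundById (assumption)."""

    def __init__(self, pci_addr):
        super().__init__("PCI device %s not found" % pci_addr)
        self.pci_addr = pci_addr


def get_ifname_by_pci_address(pci_addr, pf_interface=False,
                              sysfs_root="/sys/bus/pci/devices"):
    """Return the netdev name of pci_addr, or of its parent PF.

    Hypothetical sketch: sysfs_root is an assumption added for testing.
    With pf_interface=True the lookup follows the VF's physfn link to
    the physical function and reads that function's net/ directory.
    """
    if pf_interface:
        net_dir = os.path.join(sysfs_root, pci_addr, "physfn", "net")
    else:
        net_dir = os.path.join(sysfs_root, pci_addr, "net")
    try:
        # A bound network function typically has one entry, e.g. "ens1f0".
        return os.listdir(net_dir)[0]
    except (OSError, IndexError):
        # Missing or empty net/ directory: non-NIC functions such as QAT
        # VFs end up here, which is the failure seen in this bug.
        raise PciDeviceNotFoundById(pci_addr)
```

In a real system the lookup for the QAT VF 0000:09:02.3 from the log below would fail at the net/ step, because its physical function 0000:09:00.0 is a co-processor and has no network interface.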
Nova's PCI passthrough support was originally developed for
Intel QAT devices and other generic PCI devices. Support for
Neutron-based SR-IOV NICs was added later.
The PCI-SIG SR-IOV specification, while most often used by NIC
vendors to virtualise a NIC in hardware, was designed for devices
of any PCIe class. Support for Intel's QAT devices and other
accelerators, such as AMD's SR-IOV-based vGPU, therefore regressed
when the new parent_ifname lookup code was introduced.
This change simply catches the exception that is raised when
pci_utils.get_ifname_by_pci_address is called on generic VFs,
allowing a graceful fallback to the previous behaviour.
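The following is a minimal sketch of the graceful fallback described above. It is not the actual nova patch. It assumes a simplified device dict, and build_vf_info and fake_lookup are hypothetical helpers. The NIC addresses are illustrative, and the QAT addresses come from this report. The only behaviour the sketch demonstrates is that a missing parent netdev no longer aborts the resource update: the VF is still reported, just without a parent_ifname key.

```python
class PciDeviceNotFoundById(Exception):
    """Local stand-in for nova.exception.PciDeviceNotFoundById (assumption)."""


def build_vf_info(address, parent_addr, lookup_ifname):
    """Describe a VF, adding parent_ifname only when one can be found.

    lookup_ifname is any callable that returns the parent netdev name
    for a VF address or raises PciDeviceNotFoundById (hypothetical
    signature, standing in for pci_utils.get_ifname_by_pci_address).
    """
    device = {
        "address": address,
        "dev_type": "type-VF",
        "parent_addr": parent_addr,
    }
    try:
        device["parent_ifname"] = lookup_ifname(address)
    except PciDeviceNotFoundById:
        # Generic VFs (e.g. QAT) have no parent netdev: fall back to the
        # earlier behaviour and report the VF without parent_ifname.
        pass
    return device


def fake_lookup(address):
    """Toy lookup table: only the NIC VF has a parent netdev."""
    netdevs = {"0000:05:10.0": "ens1f0"}
    if address not in netdevs:
        raise PciDeviceNotFoundById(address)
    return netdevs[address]


nic_vf = build_vf_info("0000:05:10.0", "0000:05:00.0", fake_lookup)
qat_vf = build_vf_info("0000:09:02.3", "0000:09:00.0", fake_lookup)
print(nic_vf["parent_ifname"])      # ens1f0
print("parent_ifname" in qat_vf)    # False
```

Catching the specific exception, rather than checking device types up front, keeps the NIC path unchanged and makes every other device class fall back to the behaviour it had before the regression.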
Change-Id: Ib3811f828246311d90b0e3ba71c162c03fb8fe5a
Closes-Bug: #1821938
** Changed in: nova
Status: In Progress => Fix Released
--
https://bugs.launchpad.net/bugs/1821938
Title:
No nova hypervisor can be enabled on workers with QAT devices
Status in OpenStack Compute (nova):
Fix Released
Status in StarlingX:
In Progress
Bug description:
Brief Description
-----------------
A host cannot be enabled as a nova hypervisor if it has QAT devices (C62x or DH895XCC) configured, because nova-compute fails with a "PCI device not found" error.
Severity
--------
Major
Steps to Reproduce
------------------
- Install and configure a system where worker nodes have QAT devices configured, for example:
[wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list compute-0
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_09_00_0 | 0000:09:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0c_00_0 | 0000:0c:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
compute-0:~$ lspci | grep QAT
09:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
09:01.0 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
09:01.1 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
...
- Check the output of nova hypervisor-list.
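The class id column in the device list above is the 24-bit PCI class code. Its first byte is the base class, so 0b4000 is a co-processor (base class 0x0b, Processor; subclass 0x40, Co-processor), while an Ethernet NIC reports 0x0200xx. The sketch below uses that byte to predict whether a device should expose a netdev at all. It is an illustrative heuristic only, not something nova or StarlingX does, and the function names and the small lookup table are assumptions.

```python
# Illustrative heuristic, not nova code.
# PCI base class codes (first byte of the 24-bit class code), per the
# PCI-SIG class code assignments; only the entries used here are listed.
BASE_CLASS_NAMES = {
    0x02: "Network controller",
    0x03: "Display controller",
    0x0B: "Processor",
}


def parse_class_id(class_id):
    """Split a 6-hex-digit class id such as '0b4000' into its parts."""
    value = int(class_id, 16)
    base = (value >> 16) & 0xFF
    subclass = (value >> 8) & 0xFF
    prog_if = value & 0xFF
    return base, subclass, prog_if


def expects_netdev(class_id):
    """Only network-class functions are expected to expose a netdev."""
    base, _, _ = parse_class_id(class_id)
    return base == 0x02


# Class ids taken from the system host-device-list output above.
for name, class_id in [("DH895XCC Series QAT", "0b4000"),
                       ("MGA G200e", "030000")]:
    base, sub, _ = parse_class_id(class_id)
    print(name, BASE_CLASS_NAMES.get(base, "other"), hex(sub),
          expects_netdev(class_id))
```

A class check like this is only a guess. A network function bound to a non-netdev driver (for example vfio-pci) has no net/ entry either, which is one reason catching the lookup failure, as the merged fix does, is the more robust approach.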
Expected Behavior
------------------
- Nova hypervisors are listed for the worker nodes.
Actual Behavior
----------------
[wrsroot@controller-0 ~(keystone_admin)]$ nova hypervisor-list
+----+---------------------+-------+--------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------+-------+--------+
+----+---------------------+-------+--------+
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Any system type with QAT devices configured on worker nodes
Branch/Pull Time/Commit
-----------------------
stx master as of 2019-03-18
Last Pass
---------
On the f/stein branch in early February
Timestamp/Logs
--------------
# nova-compute pods are logging errors, so they cannot register themselves as hypervisors:
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager [req-4f652d4c-da7e-4516-9baa-915265c3fdda - - - - -] Error updating resources for node compute-0.: PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager Traceback (most recent call last):
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 7956, in _update_available_resource_for_node
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager startup=startup)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 727, in update_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7098, in get_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager self._get_pci_passthrough_devices()
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6102, in _get_pci_passthrough_devices
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6062, in _get_pcidev_info
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager device.update(_get_device_type(cfgdev, address))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6021, in _get_device_type
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_address, pf_interface=True),
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=pci_addr)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager