yahoo-eng-team team mailing list archive
Message #77805
[Bug 1821938] Re: No nova hypervisor can be enabled on workers with QAT devices
** Also affects: nova
Importance: Undecided
Status: New
** Changed in: nova
Importance: Undecided => High
** Changed in: nova
Assignee: (unassigned) => sean mooney (sean-k-mooney)
** Changed in: nova
Status: New => In Progress
** Tags added: stein-rc-potential
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821938
Title:
No nova hypervisor can be enabled on workers with QAT devices
Status in OpenStack Compute (nova):
In Progress
Status in StarlingX:
Triaged
Bug description:
Brief Description
-----------------
A host cannot be enabled as a nova hypervisor if it has QAT devices (C62x or DH895XCC) configured, because nova-compute fails to find a PCI device.
Severity
--------
Major
Steps to Reproduce
------------------
- Install and configure a system whose worker nodes have QAT devices configured, e.g.:
[wrsroot@controller-0 ~(keystone_admin)]$ system host-device-list compute-0
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name             | address      | class id | vendor id | device id | class name                | vendor name                     | device name                            | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_09_00_0 | 0000:09:00.0 | 0b4000   | 8086      | 0435      | Co-processor              | Intel Corporation               | DH895XCC Series QAT                    | 0         | True    |
| pci_0000_0c_00_0 | 0000:0c:00.0 | 030000   | 102b      | 0522      | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0         | True    |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
compute-0:~$ lspci | grep QAT
09:00.0 Co-processor: Intel Corporation DH895XCC Series QAT
09:01.0 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
09:01.1 Co-processor: Intel Corporation DH895XCC Series QAT Virtual Function
...
- Check the output of nova hypervisor-list.
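The class id column in the device table separates the QAT physical function (0b4000: PCI base class 0x0b, processor, subclass 0x40, co-processor) from network controllers (base class 0x02). As a diagnostic aid, the following minimal Python sketch lists the virtual functions whose parent PF is not a network controller. It assumes the standard Linux sysfs layout, where each PCI function under /sys/bus/pci/devices has a class file and each SR-IOV VF has a physfn symlink to its PF; the helpers read_class and non_network_vfs are hypothetical and not part of nova or StarlingX.

```python
import os
from typing import Dict

# Standard Linux sysfs location of PCI functions (assumes a Linux host).
SYSFS_PCI = "/sys/bus/pci/devices"
# PCI base class 0x02 is "network controller"; a QAT PF reports 0x0b (processor).
NETWORK_BASE_CLASS = 0x02


def read_class(dev_dir: str) -> int:
    """Return the 24-bit PCI class code from <dev_dir>/class, e.g. 0x0b4000."""
    with open(os.path.join(dev_dir, "class")) as f:
        # sysfs stores the value as "0x0b4000"; int(..., 16) accepts the 0x prefix.
        return int(f.read().strip(), 16)


def non_network_vfs(root: str = SYSFS_PCI) -> Dict[str, str]:
    """Map each VF address to its PF address when the PF is not a NIC.

    Hypothetical helper: SR-IOV VFs are recognised by the 'physfn' symlink
    that the kernel creates in each VF's sysfs directory.
    """
    result = {}
    for addr in sorted(os.listdir(root)):
        physfn = os.path.join(root, addr, "physfn")
        if not os.path.islink(physfn):
            continue  # a PF, or a function without SR-IOV
        # Resolve the symlink and keep only the PF's PCI address.
        pf_addr = os.path.basename(os.path.realpath(physfn))
        if read_class(os.path.join(root, pf_addr)) >> 16 != NETWORK_BASE_CLASS:
            result[addr] = pf_addr
    return result
```

On the worker in this report, non_network_vfs() would be expected to map the QAT VFs (0000:09:01.0, 0000:09:01.1, ...) to their PF 0000:09:00.0, while VFs of ordinary NICs are left out.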
Expected Behavior
------------------
- Nova hypervisors exist on the system.
Actual Behavior
----------------
[wrsroot@controller-0 ~(keystone_admin)]$ nova hypervisor-list
+----+---------------------+-------+--------+
| ID | Hypervisor hostname | State | Status |
+----+---------------------+-------+--------+
+----+---------------------+-------+--------+
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Any system type with QAT devices configured on a worker node
Branch/Pull Time/Commit
-----------------------
master as of 2019-03-18
Last Pass
--------------
On the f/stein branch in early February
Timestamp/Logs
--------------
# The nova-compute pods log the following error repeatedly, so they cannot register themselves as hypervisors:
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager [req-4f652d4c-da7e-4516-9baa-915265c3fdda - - - - -] Error updating resources for node compute-0.: PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager Traceback (most recent call last):
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 7956, in _update_available_resource_for_node
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager startup=startup)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 727, in update_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager resources = self.driver.get_available_resource(nodename)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7098, in get_available_resource
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager self._get_pci_passthrough_devices()
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6102, in _get_pci_passthrough_devices
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_info.append(self._get_pcidev_info(name))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6062, in _get_pcidev_info
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager device.update(_get_device_type(cfgdev, address))
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 6021, in _get_device_type
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager pci_address, pf_interface=True),
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager File "/var/lib/openstack/lib/python2.7/site-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=pci_addr)
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device 0000:09:02.3 not found
2019-03-25 18:46:49,899.899 62394 ERROR nova.compute.manager
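In the traceback, _get_device_type calls get_ifname_by_pci_address(pci_address, pf_interface=True) for the QAT VF 0000:09:02.3, and the lookup fails with PciDeviceNotFoundById. The minimal Python sketch below reproduces that failure mode under one stated assumption: that the parent PF's interface name is read from the VF's physfn/net directory in sysfs. Because the QAT PF is a co-processor rather than a NIC, that directory does not exist. The names pf_ifname, pf_ifname_or_none and PciDeviceNotFound are hypothetical stand-ins rather than nova's API, and the tolerant variant shows one possible way to skip such VFs, not the fix that was actually merged.

```python
import os
from typing import Optional


class PciDeviceNotFound(Exception):
    """Hypothetical stand-in for nova's PciDeviceNotFoundById."""


def pf_ifname(root: str, vf_addr: str) -> str:
    """Return the network interface name of a VF's parent PF.

    Assumes the sysfs layout <root>/<vf_addr>/physfn/net/<ifname>. A QAT PF is
    a co-processor, not a NIC, so it has no 'net' directory and this raises.
    """
    net_dir = os.path.join(root, vf_addr, "physfn", "net")
    try:
        return sorted(os.listdir(net_dir))[0]
    except (OSError, IndexError):
        # Missing directory (OSError) or empty directory (IndexError).
        raise PciDeviceNotFound("PCI device %s not found" % vf_addr) from None


def pf_ifname_or_none(root: str, vf_addr: str) -> Optional[str]:
    """Tolerant variant: return None instead of raising for non-network VFs."""
    try:
        return pf_ifname(root, vf_addr)
    except PciDeviceNotFound:
        return None
```

With a tree where 0000:09:02.3/physfn points at the QAT PF 0000:09:00.0, pf_ifname raises just as the traceback does, while a VF of a real NIC resolves to its PF's interface name.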