yahoo-eng-team team mailing list archive
Message #85122
[Bug 1915255] [NEW] [Victoria] nova-compute won't start on aarch64 - raises PciDeviceNotFoundById
Public bug reported:
Description
===========
When deploying OpenStack Victoria on Ubuntu 20.04 (Focal) on
arm64/aarch64, nova-compute 22.0.1 fails to start with the following
error in nova-compute.log:
----------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/pci/utils.py", line 156, in get_ifname_by_pci_address
    dev_info = os.listdir(dev_path)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/bus/pci/devices/0002:01:00.1/physfn/net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 9823, in _update_available_resource_for_node
    self.rt.update_available_resource(context, nodename,
  File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 880, in update_available_resource
    resources = self.driver.get_available_resource(nodename)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8473, in get_available_resource
    data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7223, in _get_pci_passthrough_devices
    pci_info = [self._get_pcidev_info(name, dev, net_devs) for name, dev
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7223, in <listcomp>
    pci_info = [self._get_pcidev_info(name, dev, net_devs) for name, dev
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7199, in _get_pcidev_info
    device.update(_get_device_type(cfgdev, address, dev, net_devs))
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7154, in _get_device_type
    parent_ifname = pci_utils.get_ifname_by_pci_address(
  File "/usr/lib/python3/dist-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
    raise exception.PciDeviceNotFoundById(id=pci_addr)
nova.exception.PciDeviceNotFoundById: PCI device 0002:01:00.1 not found
----------
This results in an empty `openstack hypervisor list`.
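The two-part traceback above is Python's implicit exception chaining: the FileNotFoundError from `os.listdir` is caught and a new PciDeviceNotFoundById is raised from inside the `except` block. The sketch below reproduces only that pattern. `PciDeviceNotFoundById` here is a simplified stand-in and `lookup_parent_ifname` is a hypothetical helper, not nova's actual code.

```python
import os


class PciDeviceNotFoundById(Exception):
    """Simplified stand-in for nova.exception.PciDeviceNotFoundById (hypothetical)."""

    def __init__(self, id):  # keyword name mirrors the call in the traceback
        super().__init__(f"PCI device {id} not found")


def lookup_parent_ifname(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Hypothetical illustration of the lookup pattern seen in the traceback."""
    dev_path = os.path.join(sysfs_root, pci_addr, "physfn", "net")
    try:
        return os.listdir(dev_path)[0]
    except (OSError, IndexError):
        # Raising here without 'from' stores the original error in
        # __context__, which is why Python prints "During handling of the
        # above exception, another exception occurred".
        raise PciDeviceNotFoundById(id=pci_addr)
```

Calling `lookup_parent_ifname("0002:01:00.1")` on a host where `physfn/net` is missing raises PciDeviceNotFoundById, and the original FileNotFoundError is still reachable through the exception's `__context__` attribute.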
This does not happen with OpenStack Ussuri (nova-compute 21.1.0), and we
haven't seen it on other architectures (yet?). The failing code was
introduced between Ussuri and Victoria [0], so the first version
containing it is 22.0.0.
$ lspci | grep 0002:01:00.1
0002:01:00.1 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)
Indeed, /sys/bus/pci/devices/0002:01:00.1/physfn/ doesn't contain a `net`
directory, but I'm not sure whether that is really a problem, or whether
nova-compute should just catch the exception and move on.
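If nova-compute were to "catch the exception and move on", one option would be a lookup that reports "no parent interface" instead of raising. The sketch below is a hypothetical helper (`get_pf_ifname_or_none` is an invented name), not a proposed or existing nova patch; how the caller should then treat such a VF is left open.

```python
import os
from typing import Optional


def get_pf_ifname_or_none(pci_addr: str,
                          sysfs_root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Hypothetical tolerant lookup of a VF's parent-PF netdev name.

    Returns None instead of raising when the PF has no 'net' directory,
    as observed for 0002:01:00.1 on this host.
    """
    net_dir = os.path.join(sysfs_root, pci_addr, "physfn", "net")
    try:
        entries = os.listdir(net_dir)
    except FileNotFoundError:
        # No netdev bound to the PF: signal "unknown" so the caller can continue
        return None
    # Pick deterministically if the directory lists more than one netdev
    return sorted(entries)[0] if entries else None
```

The trade-off is that `None` must be handled explicitly by every caller, whereas the current behaviour aborts the whole PCI device scan on the first VF whose PF has no netdev.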
A similar issue in the past [1] suggests that this might be specific to
the Cavium ThunderX NIC.
Related issue: [2]
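To see whether other VFs on a host are in the same state as 0002:01:00.1, one can scan sysfs for devices that have a `physfn` link but whose parent PF exposes no `net` directory. This is a diagnostic sketch with a hypothetical function name; it assumes the standard sysfs layout in which a VF's directory contains a `physfn` link to its PF.

```python
import glob
import os
from typing import List


def vfs_with_netdevless_pf(sysfs_root: str = "/sys/bus/pci/devices") -> List[str]:
    """List PCI addresses of VFs whose parent PF exposes no 'net' directory.

    A device is treated as a VF if its sysfs directory contains 'physfn'.
    """
    affected = []
    for physfn in sorted(glob.glob(os.path.join(sysfs_root, "*", "physfn"))):
        vf_addr = os.path.basename(os.path.dirname(physfn))
        # Same path nova-compute fails to list in the traceback
        if not os.path.isdir(os.path.join(physfn, "net")):
            affected.append(vf_addr)
    return affected
```

On the host in this report, `vfs_with_netdevless_pf()` would be expected to include `0002:01:00.1`.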
Steps to reproduce
==================
Install and run nova >= 22.0.0 on an aarch64 machine (ideally with a
Cavium ThunderX NIC). I use Juju [3] to deploy an entire OpenStack
Victoria setup to a lab:
$ git clone https://github.com/openstack-charmers/openstack-bundles
$ cd openstack-bundles/development/openstack-base-focal-victoria/
$ juju deploy ./bundle.yaml
Expected result
===============
`openstack hypervisor list` shows at least one hypervisor.
nova-compute.log doesn't contain nova.exception.PciDeviceNotFoundById
Actual result
=============
`openstack hypervisor list` doesn't show any hypervisor.
nova-compute.log contains nova.exception.PciDeviceNotFoundById
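The expected and actual results above can be checked mechanically. This sketch assumes the hypervisor list was captured with the client's JSON formatter (`openstack hypervisor list -f json`, which prints a JSON array) and that the log text was read from nova-compute.log; the function names are hypothetical.

```python
import json


def hypervisor_count(list_json: str) -> int:
    """Count entries in the output of `openstack hypervisor list -f json`."""
    return len(json.loads(list_json))


def log_has_pci_error(log_text: str) -> bool:
    """True if nova-compute.log text contains the exception from this report."""
    return "nova.exception.PciDeviceNotFoundById" in log_text


def bug_reproduced(list_json: str, log_text: str) -> bool:
    # "Actual result": no hypervisor listed and the exception present in the log
    return hypervisor_count(list_json) == 0 and log_has_pci_error(log_text)
```

The JSON text can be obtained, for example, with `subprocess.run([...], capture_output=True, text=True).stdout`; keeping the checks as pure functions makes them easy to reuse in a deployment test.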
Environment
===========
$ dpkg -l | grep nova
ii nova-api-metadata 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - metadata API frontend
ii nova-common 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-compute 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python3-nova 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.2.1-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
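Since the failing code first shipped in 22.0.0, a host can be flagged as potentially affected from its package version. The sketch below is a deliberate simplification (hypothetical function name): it compares only the leading X.Y.Z of the upstream version and is not a full Debian version comparison, for which `dpkg --compare-versions` should be used.

```python
import re


def nova_pkg_is_affected(deb_version: str, first_affected=(22, 0, 0)) -> bool:
    """Rough check whether a nova package version is >= 22.0.0.

    Assumption: only the leading X.Y.Z of the upstream version matters;
    epoch and Debian revision are stripped, not compared.
    """
    upstream = deb_version.split(":", 1)[-1]  # drop epoch, e.g. "2:"
    upstream = upstream.rsplit("-", 1)[0]     # drop revision, e.g. "-0ubuntu1~cloud0"
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", upstream)
    if match is None:
        raise ValueError(f"unrecognised version: {deb_version!r}")
    return tuple(int(part) for part in match.groups()) >= first_affected
```

For the packages listed above, `nova_pkg_is_affected("2:22.0.1-0ubuntu1~cloud0")` returns `True`.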
# cat /etc/nova/nova-compute.conf
[DEFAULT]
compute_driver=libvirt.LibvirtDriver
[libvirt]
virt_type=kvm
$ dpkg -l | grep libvirt
ii libvirt-clients 6.0.0-0ubuntu8.5 arm64 Programs for the libvirt library
ii libvirt-daemon 6.0.0-0ubuntu8.5 arm64 Virtualization daemon
ii libvirt-daemon-driver-qemu 6.0.0-0ubuntu8.5 arm64 Virtualization daemon QEMU connection driver
ii libvirt-daemon-driver-storage-rbd 6.0.0-0ubuntu8.5 arm64 Virtualization daemon RBD storage driver
ii libvirt-daemon-system 6.0.0-0ubuntu8.5 arm64 Libvirt daemon configuration files
ii libvirt-daemon-system-systemd 6.0.0-0ubuntu8.5 arm64 Libvirt daemon configuration files (systemd)
ii libvirt0:arm64 6.0.0-0ubuntu8.5 arm64 library for interfacing with different virtualization systems
ii nova-compute-libvirt 2:22.0.1-0ubuntu1~cloud0 all OpenStack Compute - compute node libvirt support
ii python3-libvirt 6.1.0-1 arm64 libvirt Python 3 bindings
This probably isn't relevant, but for completeness:
* Ceph 15.2.7 for storage
* Neutron with OVN
Logs & Configs
==============
sosreport attached.
[0] https://opendev.org/openstack/nova/commit/efc27ff84c3
[1] https://bugs.launchpad.net/charm-nova-compute/+bug/1771662
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1724999
[3] https://jaas.ai/openstack-base
** Affects: nova
Importance: Undecided
Status: New
** Attachment added: "sosreport-node-egede-2021-02-10-lzrvidh.tar.xz"
https://bugs.launchpad.net/bugs/1915255/+attachment/5462220/+files/sosreport-node-egede-2021-02-10-lzrvidh.tar.xz
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1915255
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1915255/+subscriptions