yahoo-eng-team team mailing list archive
Message #86454
[Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails
This bug was fixed in the package nova - 2:17.0.13-0ubuntu2
---------------
nova (2:17.0.13-0ubuntu2) bionic; urgency=medium
* d/control: Update VCS paths for move to lp:~ubuntu-openstack-dev.
* d/p/1892361-update-pci-stat-pools.patch: Cherry pick upstream fix
for SRIOV instances (LP: #1892361).
-- Chris MacNaughton <chris.macnaughton@xxxxxxxxxx> Thu, 08 Oct 2020 12:20:17 +0000
** Changed in: nova (Ubuntu Bionic)
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892361
Title:
SRIOV instance gets type-PF interface, libvirt kvm fails
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive queens series:
Fix Committed
Status in Ubuntu Cloud Archive rocky series:
Fix Committed
Status in Ubuntu Cloud Archive stein series:
Fix Committed
Status in Ubuntu Cloud Archive train series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in Ubuntu Cloud Archive victoria series:
Fix Released
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
New
Status in OpenStack Compute (nova) rocky series:
New
Status in OpenStack Compute (nova) stein series:
Fix Committed
Status in OpenStack Compute (nova) train series:
Fix Released
Status in OpenStack Compute (nova) ussuri series:
Fix Released
Status in OpenStack Compute (nova) victoria series:
Fix Released
Status in nova package in Ubuntu:
Fix Released
Status in nova source package in Bionic:
Fix Released
Status in nova source package in Focal:
Fix Released
Status in nova source package in Groovy:
Fix Released
Status in nova source package in Hirsute:
Fix Released
Bug description:
When spawning an SR-IOV enabled instance on a newly deployed host,
nova attempts to spawn it with a type-PF PCI device. This fails with
the stack trace below.
After restarting the neutron-sriov-agent and nova-compute services on
the compute node and spawning an SR-IOV instance again, a type-VF PCI
device is selected, and instance spawning succeeds.
Stack trace:
2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] yield resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] block_device_info=block_device_info)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure=True)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] destroy_disks_on_failure)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5620, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] post_xml_callback=post_xml_callback)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5555, in _create_domain
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] guest.launch(pause=pause)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 144, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self._encoded_xml, errors='ignore')
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/guest.py", line 139, in launch
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] return self._domain.createWithFlags(flags)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = execute(f, *args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] six.reraise(c, e, tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] rv = meth(*args, **kwargs)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1092, in createWithFlags
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]
2020-08-20 08:29:09.599 7624 INFO nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Terminating instance
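To check whether other instances on a host hit the same failure, the log can be scanned for the libvirt message in the trace above. The following is only a rough sketch: the function name is made up for illustration, the error text is copied from the trace, and the log path in the usage part is an assumption about a typical Ubuntu install.

```python
import re
import sys

# Message text copied from the libvirtError in the trace above.
PF_ERROR = ("Interface type hostdev is currently supported on "
            "SR-IOV Virtual Functions only")
# nova prefixes per-instance log lines with "[instance: <uuid>]".
INSTANCE_RE = re.compile(r"\[instance: ([0-9a-f-]{36})\]")


def instances_hit_by_pf_error(log_lines):
    """Return UUIDs of instances whose log lines carry the PF error.

    UUIDs are returned once each, in the order they first appear.
    """
    seen = []
    for line in log_lines:
        if PF_ERROR not in line:
            continue
        match = INSTANCE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/nova/nova-compute.log"
    with open(path, errors="replace") as handle:
        for uuid in instances_hit_by_pf_error(handle):
            print(uuid)
```

The scan relies on the error message and the instance tag being on the same log line, which holds for the lines shown in the trace.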
To reproduce, bring up an instance with an SR-IOV port on a freshly
deployed compute:
+ openstack port create -f value -c id --network testinstance_net --vnic-type=direct --binding-profile type=dict --binding-profile physical_network=physnet2 testinstance_net-port
+ openstack server create --flavor ce6da933-adc3-4e5f-a688-63b037705729 --image a3580f59-a6c6-41f6-85fa-2fc7277492a1 --nic port-id=547cd89a-3f91-4646-84d9-c9559b497526 --availability-zone nova:foo-compute-host testinstance_vanilla_66016d81-bc32-4def-a7b3-a3a164ca5164
Observe that a PF is selected for the SR-IOV NIC.
From nova-compute.log:
<interface type='hostdev' managed='yes'>
<mac address='98:03:9b:61:22:e9'/>
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
</source>
<vlan>
<tag id='48'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...
2020-08-20 08:29:09.056 7624 DEBUG nova.virt.libvirt.vif [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b]
vif_type=hw_veb ...
vif={"profile":
{"pci_slot": "0000:d8:00.1", "physical_network": "physnet2", "pci_vendor_info": "15b3:1015"},
"ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [],
"address": "192.168.0.5"}], "version": 4, "meta": {"dhcp_server": "192.168.0.2"}, "dns": [], "routes": [], "cidr": "192.168.0.0/29",
"gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.0.1"}}], "meta": {"injected": false, "tenant_id": "dd99e7950a5b46b5b924ccd1720b6257",
"physical_network": "physnet2", "mtu": 9000},
"id": "60b3001e-21c1-4947-8996-314449f614c060b3001e-21c1-4947-8996-314449f614c0", "label": "net_20Aug-1"}, "devname": "tapf3953098-98", "vnic_type": "direct", "qbh_params": null, "meta": {},
"details": {"port_filter": false, "vlan": "48"}, "address": "98:03:9b:61:22:e9", "active": false, "type": "hw_veb", "id": "f3953098-98f7-4dd1-8b31-11f51a5a760f", "qbg_params": null}
virt_type=kvm get_config /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:572
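The source address in the interface XML and the pci_slot in the vif profile refer to the same host device in two notations. As a small sketch (the helper name is hypothetical and not part of nova), the XML form can be converted to the domain:bus:slot.function form so the two can be compared directly:

```python
import xml.etree.ElementTree as ET


def hostdev_source_address(interface_xml):
    """Return the host PCI address (dddd:bb:ss.f) of a hostdev <interface>.

    Only the <source><address/> child is read; the guest-side <address/>
    that sits directly under <interface> is ignored.
    """
    root = ET.fromstring(interface_xml)
    addr = root.find("./source/address")
    if addr is None or addr.get("type") != "pci":
        raise ValueError("no PCI source address in interface XML")
    # libvirt writes each field as a hex string such as '0xd8'.
    domain, bus, slot, function = (
        int(addr.get(name), 16)
        for name in ("domain", "bus", "slot", "function")
    )
    return "%04x:%02x:%02x.%x" % (domain, bus, slot, function)


# Interface definition copied from the nova-compute.log excerpt above.
FAILING_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='98:03:9b:61:22:e9'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xd8' slot='0x00' function='0x1'/>
  </source>
  <vlan>
    <tag id='48'/>
  </vlan>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
"""

print(hostdev_source_address(FAILING_XML))  # prints 0000:d8:00.1
```

The result matches the "pci_slot": "0000:d8:00.1" value in the vif profile, so the XML and the port binding agree on the device that was handed to the guest.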
Device is a PF:
# lspci | grep d8:00.1
d8:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
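Besides the lspci description, the kernel exposes the PF/VF distinction in sysfs: a VF directory has a physfn link to its parent, and an SR-IOV capable PF has an sriov_totalvfs attribute. The sketch below uses those two markers; the function name is hypothetical, the return labels are borrowed from the dev_type values in this report, and this is a kernel-level check, not nova's own classification logic.

```python
import os


def classify_pci_device(address, sysfs_root="/sys/bus/pci/devices"):
    """Classify a PCI address as 'type-VF', 'type-PF' or 'type-PCI'.

    The labels mirror nova's dev_type strings for readability only.
    """
    dev_dir = os.path.join(sysfs_root, address)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(dev_dir)
    # A virtual function links back to its physical function.
    if os.path.lexists(os.path.join(dev_dir, "physfn")):
        return "type-VF"
    # A physical function with SR-IOV capability exposes sriov_totalvfs.
    if os.path.lexists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "type-PF"
    return "type-PCI"


if __name__ == "__main__":
    # Addresses taken from this report; they exist only on the affected host.
    for addr in ("0000:d8:00.1", "0000:d8:05.1"):
        try:
            print(addr, classify_pci_device(addr))
        except FileNotFoundError:
            print(addr, "not present on this host")
```

The sysfs_root parameter exists so the check can be pointed at a copied or mocked directory tree when the compute host itself is not at hand.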
Also, the nova pci_devices table has its dev_type correctly listed:
mysql> select compute_nodes.host, pci_devices.created_at, compute_node_id, address, dev_type, status, pci_devices.dev_id from pci_devices join compute_nodes ON (compute_nodes.id = pci_devices.compute_node_id) where compute_nodes.host = 'foo-compute-host' and pci_devices.dev_type = 'type-PF';
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| host | created_at | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 95 | 0000:d8:00.1 | type-PF | available | pci_0000_d8_00_1 |
+------------------+---------------------+-----------------+--------------+----------+-----------+------------------+
Restarting services:
# systemctl status neutron-sriov-agent.service
# systemctl restart neutron-sriov-agent.service
Spawning an instance again, it gets allocated a type-VF port (and
spawning succeeds):
<interface type='hostdev' managed='yes'>
<mac address='fa:16:3e:34:d2:99'/>
<source>
<address type='pci' domain='0x0000' bus='0xd8' slot='0x05' function='0x1'/>
</source>
<vlan>
<tag id='4'/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
# lspci | grep d8:05.1
d8:05.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
After spawning an instance, the PF gets marked as "unavailable" in the
nova db:
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| host | created_at | updated_at | instance_uuid | compute_node_id | address | dev_type | status | dev_id |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:45:07 | NULL | 95 | 0000:19:00.1 | type-PF | available | pci_0000_19_00_1 |
| foo-compute-host | 2020-08-12 17:10:19 | 2020-08-20 11:46:30 | NULL | 95 | 0000:d8:00.1 | type-PF | unavailable | pci_0000_d8_00_1 |
+------------------+---------------------+---------------------+---------------+-----------------+--------------+----------+-------------+------------------+
Software versions:
# dpkg -l | grep nova-common
ii nova-common 2:17.0.12-0ubuntu1 all OpenStack Compute - common files
# dpkg -l | grep libvirt0
ii libvirt0:amd64 4.0.0-1ubuntu8.17 amd64 library for interfacing with different virtualization systems
# lsb_release -r
Release: 18.04
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[Impact]
Spawning an SR-IOV instance fails on a newly deployed compute.
Nova attempts to spawn a PCI device of type type-PCI instead of type-VF.
This occurred in an OpenStack Queens deployment.
[Test case]
1. Reproduce the issue by following the steps in comment #3:
https://bugs.launchpad.net/nova/+bug/1892361/comments/3
2. Install the package with the fixed code.
3. Confirm the bug has been fixed:
Repeat the steps in comment #3 and check that the instance with the SR-IOV port is created successfully.
[Where problems could occur]
Upstream CI ran all the functional test cases that trigger this scenario.
Installing the new package will restart the nova-compute service.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892361/+subscriptions