yahoo-eng-team team mailing list archive
Message #85984
[Bug 1851545] Re: Port update exception on nova unshelve for instance with PCI devices (part 2)
Reviewed: https://review.opendev.org/c/openstack/nova/+/784168
Committed: https://opendev.org/openstack/nova/commit/00f1d4757e503bb9807d7a8d7035c061a97db983
Submitter: "Zuul (22348)"
Branch: master
commit 00f1d4757e503bb9807d7a8d7035c061a97db983
Author: Artom Lifshitz <alifshit@xxxxxxxxxx>
Date: Wed Mar 31 16:57:35 2021 -0400
Update SRIOV port pci_slot when unshelving
There are a few things we need to do to make that work:
* Always set the PCIRequest's requester_id. Previously, this was only
done for resource requests. The requester_id is the port UUID, so we
can use that to correlate which port to update with which pci_slot
(in the case of multiple SRIOV ports per instance).
This has the side effect of making the fix work only for instances
created *after* this patch has been applied. It's not ideal, but
there does not appear to be a better way.
* Call setup_networks_on_host() within the instance_claim context.
This means the instance's pci_devices are updated when we call it,
allowing us to get the pci_slot information from them.
With the two previous changes in place, we can figure out the port's
new pci_slot in _update_port_binding_for_instance().
Closes: bug 1851545
Change-Id: Icfa8c1d6e84eab758af6223a2870078685584aaa
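The correlation described in the commit message can be sketched roughly as follows. This is a minimal illustration with hypothetical data shapes and a made-up helper name, not nova's actual objects or API: each PCI request carries requester_id equal to the port UUID, so once the instance claim has assigned concrete PCI devices, each port's binding profile can be refreshed with its new pci_slot.

```python
# Hypothetical sketch (not nova's real code): map each SRIOV port to the
# PCI address newly claimed for its request, then refresh pci_slot in the
# port's binding profile.

def refresh_pci_slots(pci_requests, claimed_devices, port_profiles):
    """pci_requests: {request_id: requester_id (port UUID)}
    claimed_devices: {request_id: PCI address claimed for that request}
    port_profiles: {port UUID: binding profile dict}
    Returns port_profiles with pci_slot updated.
    """
    # Join requests to claimed devices, keyed by port UUID.
    port_to_slot = {
        port_uuid: claimed_devices[req_id]
        for req_id, port_uuid in pci_requests.items()
        if req_id in claimed_devices
    }
    for port_uuid, profile in port_profiles.items():
        if port_uuid in port_to_slot:
            profile["pci_slot"] = port_to_slot[port_uuid]
    return port_profiles
```

Without the requester_id set on the request (as was the case for instances created before the patch), this join cannot be made, which is why the fix only applies to instances created after the patch.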
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1851545
Title:
Port update exception on nova unshelve for instance with PCI devices
(part 2)
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
When unshelving an instance with PCI devices, if another instance is already using the PCI device(s) that the unshelved instance was originally scheduled with, we get an exception.
Steps to reproduce
==================
- Create instance with SR-IOV
- Shelve instance
- Unshelve instance on a compute node with the same PCI device(s) already in use
Expected result
===============
Nova should recalculate the PCI mapping and use new PCI device(s).
Actual result
=============
Nova compute fails with this traceback [a].
This analysis was made while testing on Newton, but the same problem
exists in supported upstream releases, at least up to Queens.
- When the failure occurs, we see "Updating port 991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes {'binding:host_id': 'xxx'}", which brings us here [1].
- Looking just below that [2], we see that the PCI devices are never recalculated and the binding profile is not updated with the new devices on unshelve, because this only happens in the case of a migration.
- That brings us back to this commit [3] and this upstream bug [4].
- I would assume that if we remove the "migration is not None" test, we will fail with this bug [4], because we get the pci_mapping from a migration object.
Now I'm not sure how to generate the pci_mapping without a migration
object/context.
[1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
[2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
[3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
[4] https://bugs.launchpad.net/nova/+bug/1677621/
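The behaviour described in [2] can be sketched as follows. This is a simplified illustration with a hypothetical helper, not nova's exact code: the pci_slot in the port's binding profile is only remapped when a migration object is present, so on unshelve (where migration is None) the stale slot survives, and libvirt later fails with "PCI device ... is in use".

```python
# Hypothetical sketch of the problematic pattern (not nova's real code):
# the binding profile is only updated on the migration path.

def update_port_profile(profile, migration, pci_mapping):
    """pci_mapping: {old PCI address: newly claimed PCI address}."""
    old_slot = profile.get("pci_slot")
    if migration is not None and old_slot in pci_mapping:
        # Migration path: remap the old device to the new one.
        profile["pci_slot"] = pci_mapping[old_slot]
    # Unshelve path: migration is None, so the stale pci_slot is kept.
    return profile
```

With migration set to None, the profile keeps pointing at the PCI device the instance was originally scheduled with, even if another instance now holds it.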
Logs & Configs
==============
[a]
~~~
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] Traceback (most recent call last):
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] block_device_info=block_device_info)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2742, in spawn
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] destroy_disks_on_failure=True)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5121, in _create_domain_and_network
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] destroy_disks_on_failure)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] self.force_reraise()
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5093, in _create_domain_and_network
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] post_xml_callback=post_xml_callback)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5011, in _create_domain
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] guest.launch(pause=pause)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] self._encoded_xml, errors='ignore')
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] self.force_reraise()
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] return self._domain.createWithFlags(flags)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] rv = execute(f, *args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] six.reraise(c, e, tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] rv = meth(*args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] libvirtError: Requested operation is not valid: PCI device 0000:5d:17.6 is in use by driver QEMU, domain instance-000024b0
~~~
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1851545/+subscriptions