
yahoo-eng-team team mailing list archive

[Bug 1851545] Re: Port update exception on nova unshelve for instance with PCI devices (part 2)

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/784168
Committed: https://opendev.org/openstack/nova/commit/00f1d4757e503bb9807d7a8d7035c061a97db983
Submitter: "Zuul (22348)"
Branch:    master

commit 00f1d4757e503bb9807d7a8d7035c061a97db983
Author: Artom Lifshitz <alifshit@xxxxxxxxxx>
Date:   Wed Mar 31 16:57:35 2021 -0400

    Update SRIOV port pci_slot when unshelving
    
    There are a few things we need to do to make that work:
    
    * Always set the PCIRequest's requester_id. Previously, this was only
      done for resource requests. The requester_id is the port UUID, so we
      can use that to correlate which port to update with which pci_slot
      (in the case of multiple SRIOV ports per instance).
    
      This has the side effect of making the fix work only for instances
      created *after* this patch has been applied. It's not ideal, but
      there does not appear to be a better way.
    
    * Call setup_networks_on_host() within the instance_claim context.
      This means the instance's pci_devices are updated when we call it,
      allowing us to get the pci_slot information from them.
    
    With the two previous changes in place, we can figure out the port's
    new pci_slot in _update_port_binding_for_instance().
    
    Closes: bug 1851545
    Change-Id: Icfa8c1d6e84eab758af6223a2870078685584aaa


** Changed in: nova
       Status: In Progress => Fix Released
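The two changes in the commit message above can be sketched as follows. This is a simplified, hypothetical illustration, not nova's actual code: the dict-based `pci_requests`/`pci_devices` structures stand in for nova's InstancePCIRequest and PciDevice objects. Once every request carries the port UUID as its requester_id, the devices freshly claimed during unshelve can be correlated back to the ports that need their pci_slot updated.

```python
def build_port_pci_updates(pci_requests, pci_devices):
    """Map each SR-IOV port UUID to the address of the PCI device that
    satisfied its request, using the request's requester_id (the port
    UUID) as the correlation key."""
    updates = {}
    for request in pci_requests:
        port_uuid = request["requester_id"]
        if port_uuid is None:
            # Instances created before the fix never had requester_id
            # set, which is why the fix only works for new instances.
            continue
        for dev in pci_devices:
            if dev["request_id"] == request["request_id"]:
                # dev["address"] is the newly claimed pci_slot,
                # e.g. '0000:5d:17.7'
                updates[port_uuid] = dev["address"]
    return updates
```

With this mapping in hand, `_update_port_binding_for_instance()` can rewrite each port's binding profile with the new pci_slot instead of the stale one recorded before the shelve.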

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1851545

Title:
  Port update exception on nova unshelve for instance with PCI devices
  (part 2)

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  When unshelving an instance with PCI devices while another instance is already using the PCI device(s) the unshelved instance was originally scheduled with, we get an exception.

  Steps to reproduce
  ==================
  - Create instance with SR-IOV
  - Shelve instance
  - Unshelve instance on a compute node with the same PCI device(s) already in use

  Expected result
  ===============
  The PCI mapping should be recalculated so that the instance uses new, free PCI device(s).

  Actual result
  =============
  Nova compute fails with this traceback [a].

  This analysis was done while testing with Newton, but the problem is
  the same in supported upstream releases, at least up to Queens.

  - When the failure occurs, we see "Updating port 991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes {'binding:host_id': 'xxx'}", which brings us here [1]
  - Looking below [2], we see that the PCI devices are never recalculated and the profile is never updated with new devices on unshelve, because this only happens in the case of a migration.
  - That brings us back to this commit [3] and this upstream bug [4]
  - I would assume that if we remove the "migration is not None" test, we will fail with this bug [4], because we get the pci_mapping from a migration object

  Now I'm not sure how to generate the pci_mapping without a migration
  object/context.

  [1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
  [2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
  [3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
  [4] https://bugs.launchpad.net/nova/+bug/1677621/
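  The behaviour described at [2] can be illustrated with a minimal
  sketch (hypothetical names and dict-based structures, not nova's real
  `_update_port_binding_for_instance()` code): the pci_slot in the
  port's binding profile is only rewritten when a migration object
  supplies the old-to-new device mapping, so the unshelve path, which
  has no migration, passes the stale address through unchanged.

```python
def updated_binding_profile(profile, migration, pci_mapping):
    """Return the port's binding profile for a host move.

    pci_mapping maps old device addresses to newly claimed ones,
    e.g. {'0000:5d:17.6': '0000:5d:18.0'}.
    """
    if migration is None:
        # Unshelve path: no migration object, so the profile (and its
        # stale pci_slot) passes through untouched -- the bug.
        return dict(profile)
    new_profile = dict(profile)
    # Migration path: swap the old device address for the new one.
    new_profile["pci_slot"] = pci_mapping[profile["pci_slot"]]
    return new_profile
```

  The libvirt error in [a] is the downstream symptom: the stale
  pci_slot still points at a device already attached to another domain.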

  
  Logs & Configs
  ==============

  [a]
  ~~~
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] Traceback (most recent call last):
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     block_device_info=block_device_info)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2742, in spawn
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure=True)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5121, in _create_domain_and_network
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5093, in _create_domain_and_network
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     post_xml_callback=post_xml_callback)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5011, in _create_domain
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     guest.launch(pause=pause)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self._encoded_xml, errors='ignore')
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     return self._domain.createWithFlags(flags)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = execute(f, *args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(c, e, tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = meth(*args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] libvirtError: Requested operation is not valid: PCI device 0000:5d:17.6 is in use by driver QEMU, domain instance-000024b0
  ~~~

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1851545/+subscriptions

