
yahoo-eng-team team mailing list archive

[Bug 1851545] [NEW] Port update exception on nova unshelve for instance with PCI devices (part 2)

 

Public bug reported:

Description
===========
When we unshelve an instance with PCI devices while another instance is already using the PCI device(s) that the unshelved instance was originally scheduled with, we get an exception.

Steps to reproduce
==================
- Create an instance with an SR-IOV port
- Shelve the instance
- Unshelve the instance on a compute node where the same PCI device(s) are already in use

Expected result
===============
Nova should recalculate the PCI mapping and use new PCI device(s).

Actual result
=============
nova-compute fails with the traceback in [a].

This analysis was done while testing with Newton, but the same problem
exists in supported upstream releases, at least up to Queens.

- When the failure occurs, we see "Updating port 991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes {'binding:host_id': 'xxx'}", which leads to this code [1].
- Looking further down [2], the PCI devices are never recalculated and the port's binding profile is not updated with the new devices on unshelve, because that only happens during a migration.
- That leads back to commit [3] and upstream bug [4].
- I assume that if we remove the "migration is not None" check, we will hit bug [4] again, because the pci_mapping is obtained from a migration object.
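The control flow described in the bullets above can be modelled roughly as follows. This is a simplified, hypothetical sketch and not nova's actual code. Ports are plain dicts, and `update_port_binding`, `SRIOV_VNIC_TYPES` and the `pci_mapping` shape are illustrative stand-ins. It shows why an unshelve, which has no migration, only rewrites `binding:host_id` and leaves a stale `pci_slot` in `binding:profile`.

```python
# Hypothetical, simplified model of the port-update logic discussed in this
# report; NOT nova's code. Ports are plain dicts and all names are illustrative.
SRIOV_VNIC_TYPES = {"direct", "macvtap", "direct-physical"}


def update_port_binding(port, host, migration=None, pci_mapping=None):
    """Return the attribute updates that would be sent to Neutron for one port."""
    updates = {}
    if port.get("binding:host_id") != host:
        updates["binding:host_id"] = host

    # The guard in question: PCI details are only recalculated for migrations,
    # so an unshelve (migration is None) skips this block entirely.
    if port.get("binding:vnic_type") in SRIOV_VNIC_TYPES and migration is not None:
        profile = dict(port.get("binding:profile", {}))  # copy; don't mutate input
        old_slot = profile.get("pci_slot")
        new_dev = (pci_mapping or {}).get(old_slot)
        if new_dev is None:
            raise ValueError("Unable to correlate PCI slot %s" % old_slot)
        profile.update(new_dev)
        updates["binding:profile"] = profile
    return updates


port = {
    "id": "991cbd39-47f7-4cab-bf65-0c19a920a718",
    "binding:host_id": "old-host",
    "binding:vnic_type": "direct",
    "binding:profile": {"pci_slot": "0000:5d:17.6"},
}

# Unshelve path: only the host changes; the stale pci_slot is left in place.
print(update_port_binding(port, "new-host"))
# {'binding:host_id': 'new-host'}

# Migration path with a made-up mapping: pci_slot is rewritten.
mapping = {"0000:5d:17.6": {"pci_slot": "0000:5d:17.7"}}
print(update_port_binding(port, "new-host", migration=object(), pci_mapping=mapping))
# {'binding:host_id': 'new-host', 'binding:profile': {'pci_slot': '0000:5d:17.7'}}
```

In this model, dropping the `migration is not None` guard alone would not help. Without a migration there is no `pci_mapping` to consult, so the lookup fails, which matches the concern in the last bullet.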

I'm not sure how to generate the pci_mapping without a migration
object or context.
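Below is one possible shape for such a mapping, sketched under assumptions. The idea is to pair each device the instance held before shelving with the device newly claimed for it, matching them by PCI request ID. `PciDev`, `build_pci_mapping` and the request IDs are hypothetical. Where the old and new device lists would come from during unshelve is exactly the open question above, so both inputs are assumed here.

```python
# Hypothetical sketch: derive an old->new PCI mapping without a migration
# object by pairing devices that share a PCI request ID. Names are illustrative.
from typing import Dict, Iterable, NamedTuple


class PciDev(NamedTuple):
    address: str     # PCI address, e.g. "0000:5d:17.6"
    request_id: str  # ID of the PCI request the device was claimed for


def build_pci_mapping(old_devs: Iterable[PciDev],
                      new_devs: Iterable[PciDev]) -> Dict[str, PciDev]:
    """Map each old PCI address to the new device claimed for the same request."""
    old_by_request = {dev.request_id: dev for dev in old_devs}
    mapping = {}
    for new in new_devs:
        old = old_by_request.get(new.request_id)
        if old is not None:  # devices with no matching request are skipped
            mapping[old.address] = new
    return mapping


old = [PciDev("0000:5d:17.6", "req-1")]  # device recorded before shelving
new = [PciDev("0000:5d:17.7", "req-1")]  # free device claimed on unshelve (made up)
print(build_pci_mapping(old, new))
```

Keying on request ID rather than on address is the design choice here. On unshelve the new device will generally have a different address, so the address alone cannot pair old and new devices.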

[1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
[2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
[3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
[4] https://bugs.launchpad.net/nova/+bug/1677621/


Logs & Configs
==============

[a]
~~~
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] Traceback (most recent call last):
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     block_device_info=block_device_info)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2742, in spawn
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure=True)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5121, in _create_domain_and_network
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5093, in _create_domain_and_network
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     post_xml_callback=post_xml_callback)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5011, in _create_domain
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     guest.launch(pause=pause)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self._encoded_xml, errors='ignore')
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     return self._domain.createWithFlags(flags)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     result = proxy_call(self._autowrap, f, *args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = execute(f, *args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(c, e, tb)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = meth(*args, **kwargs)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] libvirtError: Requested operation is not valid: PCI device 0000:5d:17.6 is in use by driver QEMU, domain instance-000024b0
~~~
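The last line of the traceback names both the contended PCI address and the libvirt domain that holds it (`instance-000024b0`). When triaging similar failures, a small hypothetical helper like the one below can extract those fields. It also checks whether the contended address is the `pci_slot` still recorded in the port's `binding:profile`. The regular expression is inferred from this single log line and is an assumption about the message format, not something taken from libvirt documentation.

```python
import re

# Assumed message format, inferred from the one libvirt error in this report.
IN_USE_RE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
    r" is in use by driver (?P<driver>[^,\s]+), domain (?P<domain>\S+)"
)


def parse_pci_in_use(message):
    """Return (address, driver, domain) from a 'PCI device ... in use' error, or None."""
    match = IN_USE_RE.search(message)
    if match is None:
        return None
    return match.group("address"), match.group("driver"), match.group("domain")


def is_stale_pci_slot(message, port_profile):
    """True if the contended device is the one still recorded in the port's profile."""
    parsed = parse_pci_in_use(message)
    return parsed is not None and parsed[0] == port_profile.get("pci_slot")


msg = ("Requested operation is not valid: PCI device 0000:5d:17.6 is in use "
       "by driver QEMU, domain instance-000024b0")
print(parse_pci_in_use(msg))  # ('0000:5d:17.6', 'QEMU', 'instance-000024b0')
print(is_stale_pci_slot(msg, {"pci_slot": "0000:5d:17.6"}))  # True
```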

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1851545

Title:
  Port update exception on nova unshelve for instance with PCI devices
  (part 2)

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1851545/+subscriptions

