yahoo-eng-team team mailing list archive

[Bug 2003253] Re: Nova fails to SRIOV VM with error "libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-....."

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2003253

Title:
  Nova fails to SRIOV VM with error "libvirtError: Requested operation
  is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain
  instance-....."

Status in OpenStack Compute (nova):
  Expired

Bug description:
  On two SRIOV computes with Mellanox ConnectX-5 NICs, we can create SRIOV VMs with no problems.
  When we create several of these SRIOV VMs and start live-migrating them, at some point we hit the error below:

  2023-01-17 08:09:04.413 7 INFO nova.virt.libvirt.driver [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] Attaching vif 26ab618c-186b-402e-b8d1-0c0f9e57d8cf to instance 37
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] [instance: dc84de60-274b-4694-b73b-9aa237d9561b] attaching network adapter failed.: libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] Traceback (most recent call last):
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2139, in attach_interface
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     guest.attach_device(cfg, persistent=True, live=live)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 305, in attach_device
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     self._domain.attachDeviceFlags(device_xml, flags=flags)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 190, in doit
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 148, in proxy_call
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     rv = execute(f, *args, **kwargs)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 129, in execute
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     six.reraise(c, e, tb)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     rv = meth(*args, **kwargs)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 605, in attachDeviceFlags
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e
  2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] 
  2023-01-17 08:09:04.437 7 ERROR nova.compute.manager [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] [instance: dc84de60-274b-4694-b73b-9aa237d9561b] Unexpected error during post live migration at destination host.: InterfaceAttachFailed: Failed to attach network adapter device to dc84de60-274b-4694-b73b-9aa237d9561b
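
  To find which virtual function and which libvirt domain collide, the
  libvirtError text in the log above can be parsed. The following is a
  minimal sketch, not part of Nova: the function name parse_pci_in_use
  and the regular expression are assumptions based only on the message
  format quoted in this report.

```python
import re

# Assumed message format, taken from the log line quoted in this report:
#   "PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e"
PCI_IN_USE = re.compile(
    r"PCI device "
    r"(?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver (?P<driver>[^,\s]+), domain (?P<domain>\S+)"
)


def parse_pci_in_use(line):
    """Return (pci_address, driver, domain) from a log line, or None.

    Hypothetical helper for log triage; it only understands the one
    message format shown above.
    """
    match = PCI_IN_USE.search(line)
    if match is None:
        return None
    return match.group("address"), match.group("driver"), match.group("domain")


if __name__ == "__main__":
    sample = (
        "libvirtError: Requested operation is not valid: PCI device "
        "0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e"
    )
    print(parse_pci_in_use(sample))
```

  Running this over the nova-compute log on the destination host should
  show which existing domain holds the virtual function that the
  migrated instance tried to attach.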

  What seems to be happening is that, on two different SRIOV computes,
  virtual functions with the same PCI address are in use by two VMs.
  When we migrate one of the VMs to the second compute, where a virtual
  function with the same PCI address is already in use, we run into the
  PCI device conflict above.

  The end result is pretty bad. On the target compute:
  1. The VM that was originally running there is still running.
  2. The migrated VM's libvirt domain is running, but the NIC backed by the virtual function is not attached, so the VM has no connectivity.

  
  In general, the problem seems to be that Nova does not check that PCI devices (SRIOV virtual functions) are unique across all SRIOV-capable computes, so more than one VM can get a PCI device with the same, conflicting address.
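
  The condition described above can be illustrated with a small
  pre-migration check. This is only a sketch of the idea and not how
  Nova tracks PCI devices: find_vf_conflicts is a hypothetical helper,
  and apart from 0000:5e:05.6 and instance-0000002e, which come from
  the log, the addresses and domain names in the example are made up.

```python
def find_vf_conflicts(migrating_vfs, destination_in_use):
    """Return the VF addresses that would collide on the destination.

    migrating_vfs: iterable of PCI addresses held by the VM being migrated.
    destination_in_use: dict mapping PCI address -> domain that currently
        holds that virtual function on the destination host.
    The result maps each conflicting address to the domain that holds it.
    """
    return {
        address: destination_in_use[address]
        for address in migrating_vfs
        if address in destination_in_use
    }


if __name__ == "__main__":
    # VF held by the migrating VM on the source host (address from the log).
    source_vm_vfs = ["0000:5e:05.6"]
    # VFs already attached on the destination host; the second entry is
    # a made-up example.
    destination = {
        "0000:5e:05.6": "instance-0000002e",
        "0000:5e:05.7": "instance-00000030",
    }
    print(find_vf_conflicts(source_vm_vfs, destination))
```

  An empty result would mean that none of the migrating VM's virtual
  function addresses are already attached to a domain on the
  destination.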

  There is another bug with a similar problem,
  https://bugs.launchpad.net/nova/+bug/1633120, but it seems to be a
  different problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2003253/+subscriptions


