yahoo-eng-team team mailing list archive
Message #91060
[Bug 2003253] [NEW] Nova fails to SRIOV VM with error "libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-....."
Public bug reported:
On two SRIOV computes with Mellanox ConnectX-5 NICs, we can create SRIOV VMs with no problems.
When we create several of these SRIOV VMs and start live-migrating them, at some point we hit the error below:
2023-01-17 08:09:04.413 7 INFO nova.virt.libvirt.driver [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] Attaching vif 26ab618c-186b-402e-b8d1-0c0f9e57d8cf to instance 37
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] [instance: dc84de60-274b-4694-b73b-9aa237d9561b] attaching network adapter failed.: libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] Traceback (most recent call last):
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2139, in attach_interface
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] guest.attach_device(cfg, persistent=True, live=live)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 305, in attach_device
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] self._domain.attachDeviceFlags(device_xml, flags=flags)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 190, in doit
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] result = proxy_call(self._autowrap, f, *args, **kwargs)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 148, in proxy_call
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] rv = execute(f, *args, **kwargs)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 129, in execute
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] six.reraise(c, e, tb)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] rv = meth(*args, **kwargs)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 605, in attachDeviceFlags
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b] libvirtError: Requested operation is not valid: PCI device 0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e
2023-01-17 08:09:04.433 7 ERROR nova.virt.libvirt.driver [instance: dc84de60-274b-4694-b73b-9aa237d9561b]
2023-01-17 08:09:04.437 7 ERROR nova.compute.manager [req-f128d0fc-fab7-43e0-b5c3-7d039ed3122c 7280f3f5a7cd430f9ab5310b3e8acb27 6e24c3394ab14ec2823d991ff3bd4371 - default default] [instance: dc84de60-274b-4694-b73b-9aa237d9561b] Unexpected error during post live migration at destination host.: InterfaceAttachFailed:
Failed to attach network adapter device to dc84de60-274b-4694-b73b-9aa237d9561b
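For triage, the conflicting PCI address and the domain that holds it can be pulled out of the libvirt message in the log above. This is only a minimal sketch: the pattern is written against the exact wording quoted in this report, and the helper name parse_pci_in_use is hypothetical, not part of Nova or libvirt.

```python
import re

# Assumption: the message has the wording quoted in this bug report.
# Other libvirt versions may phrase it differently.
PCI_IN_USE = re.compile(
    r"PCI device (?P<address>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is in use by driver (?P<driver>[^,\s]+), domain (?P<domain>\S+)"
)


def parse_pci_in_use(message):
    """Return address, driver and domain from the message, or None if it does not match."""
    match = PCI_IN_USE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = (
        "libvirtError: Requested operation is not valid: PCI device "
        "0000:5e:05.6 is in use by driver QEMU, domain instance-0000002e"
    )
    print(parse_pci_in_use(line))
```

The returned domain name is the libvirt domain already holding the virtual function on the destination compute.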
What seems to be happening is that on two different SRIOV computes,
virtual functions with the same PCI address are in use by two VMs. When
we migrate one of the VMs to the second compute, where the same PCI
virtual function is already in use, we run into the PCI device conflict above.
The end result is pretty bad. On the target compute:
1. The VM that was originally running there is still running.
2. The libvirt domain of the migrated VM is running, but the NIC based on the virtual function is not attached, so the VM has no connectivity.
In general, the problem seems to be that Nova does not check that PCI devices (SRIOV virtual functions) are unique across all SRIOV-capable computes, so more than one VM can get PCI devices with the same, conflicting address.
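The condition described above can be illustrated with a small check. This is a sketch of the reported symptom only, not of Nova's PCI tracking code: the function name find_vf_conflicts and the per-host mappings are hypothetical, and the label "migrating-vm" is a placeholder. Only the PCI address and the domain name instance-0000002e come from the log.

```python
def find_vf_conflicts(source_vfs, dest_vfs):
    """Return PCI addresses assigned on both hosts.

    Both arguments map a VF PCI address to the name of the VM using it
    on that host. The result maps each shared address to a
    (source_vm, dest_vm) pair.
    """
    shared = sorted(source_vfs.keys() & dest_vfs.keys())
    return {addr: (source_vfs[addr], dest_vfs[addr]) for addr in shared}


if __name__ == "__main__":
    # Hypothetical inventory: "migrating-vm" is a placeholder label.
    source = {"0000:5e:05.6": "migrating-vm"}
    dest = {"0000:5e:05.6": "instance-0000002e"}
    for addr, (src_vm, dst_vm) in find_vf_conflicts(source, dest).items():
        print(f"{addr}: used by {src_vm} on source and {dst_vm} on destination")
```

If the migrated VM keeps its source VF address on the destination, any address returned by this check would trigger the "in use by driver QEMU" error shown in the log.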
There is another bug with a similar problem,
https://bugs.launchpad.net/nova/+bug/1633120, but it seems to be
a different problem.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2003253
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2003253/+subscriptions