yahoo-eng-team team mailing list archive
Message #71915
[Bug 1758086] [NEW] nvidia driver limits to one single GPU per guest
Public bug reported:
If you provide a flavor with "resources:VGPU=2" (or more) and have
compute nodes using NVIDIA cards (i.e. PCI devices with a 16-bit vendor
ID of "10de"), QEMU throws an exception because the NVIDIA driver does
not support more than one IOMMU group per guest.
libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-03-22T13:14:39.272301Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/c949168d-d04d-4e74-925a-c38f3be11df5,bus=pci.0,addr=0x5: vfio warning: c949168d-d04d-4e74-925a-c38f3be11df5: Could not enable error recovery for the device
2018-03-22T13:14:39.273759Z qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f508c6d0-f859-4fa2-8976-94940e917709,bus=pci.0,addr=0x6: vfio error: f508c6d0-f859-4fa2-8976-94940e917709: error getting device from group 1: Operation not permitted
Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use
Given that limitation, Nova should cap the maximum unit (max_unit) of
VGPU resources per allocation depending on the PCI device vendor ID.
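
A minimal sketch of what such a cap could look like, not Nova's actual
implementation. The function names and the NVIDIA_VENDOR_ID constant are
hypothetical; only the inventory field names (total, reserved, min_unit,
max_unit, step_size, allocation_ratio) follow the Placement inventory
format. It assumes the vendor ID is available as a hex string and that
the host exposes at least one vGPU.

```python
# Hypothetical illustration: cap VGPU max_unit by PCI vendor ID.
NVIDIA_VENDOR_ID = "10de"  # 16-bit PCI vendor ID quoted in this bug


def vgpu_max_unit(vendor_id: str, total: int) -> int:
    """Return the largest number of vGPUs one allocation may request."""
    vendor = vendor_id.lower()
    if vendor.startswith("0x"):
        vendor = vendor[2:]
    # The NVIDIA driver accepts only one mediated device per guest,
    # so a single allocation must not ask for more than one VGPU.
    cap = 1 if vendor == NVIDIA_VENDOR_ID else total
    return min(cap, total)


def build_vgpu_inventory(vendor_id: str, total: int) -> dict:
    """Build a Placement-style inventory dict for the VGPU resource class."""
    return {
        "VGPU": {
            "total": total,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": vgpu_max_unit(vendor_id, total),
            "step_size": 1,
            "allocation_ratio": 1.0,
        }
    }
```

With max_unit set to 1 for such hosts, a flavor asking for
"resources:VGPU=2" would be rejected at scheduling time instead of
failing when QEMU starts the guest.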
** Affects: nova
Importance: Low
Assignee: Sylvain Bauza (sylvain-bauza)
Status: Triaged
** Tags: placement vgpu
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1758086
Title:
nvidia driver limits to one single GPU per guest
Status in OpenStack Compute (nova):
Triaged
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1758086/+subscriptions