yahoo-eng-team team mailing list archive

[Bug 1758086] [NEW] nvidia driver limits to one single GPU per guest

 

Public bug reported:

If you provide a flavor with "resources:VGPU=2" (or more) and have
compute nodes using NVIDIA cards (i.e. PCI devices with the 16-bit
vendor ID "10de"), QEMU throws an exception because the NVIDIA driver
does not support more than one IOMMU group per guest.

libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-03-22T13:14:39.272301Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/c949168d-d04d-4e74-925a-c38f3be11df5,bus=pci.0,addr=0x5: vfio warning: c949168d-d04d-4e74-925a-c38f3be11df5: Could not enable error recovery for the device
2018-03-22T13:14:39.273759Z qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f508c6d0-f859-4fa2-8976-94940e917709,bus=pci.0,addr=0x6: vfio error: f508c6d0-f859-4fa2-8976-94940e917709: error getting device from group 1: Operation not permitted
Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use

Given that limitation, Nova should limit the maximum unit of resources
per allocation depending on the PCI device vendor ID.
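
A minimal sketch of what such a cap could look like, assuming the limit
is expressed through the max_unit field of the VGPU inventory reported
to placement. The function names and the NVIDIA_VENDOR_ID constant are
hypothetical and are not taken from the Nova code base; only the
inventory field names follow the placement inventory format.

```python
# Hypothetical constant: the PCI vendor ID quoted in this bug report.
NVIDIA_VENDOR_ID = "10de"


def vgpu_max_unit(vendor_id, total):
    """Return the largest number of VGPUs a single allocation may take.

    Assumption: NVIDIA devices are capped at 1 because of the driver
    limitation described above; other vendors may use the full total.
    """
    if vendor_id.lower() == NVIDIA_VENDOR_ID:
        return 1
    return total


def build_vgpu_inventory(vendor_id, total):
    """Build a placement-style inventory dict for the VGPU resource class."""
    return {
        "total": total,
        "reserved": 0,
        "min_unit": 1,
        "max_unit": vgpu_max_unit(vendor_id, total),
        "step_size": 1,
        "allocation_ratio": 1.0,
    }
```

With a cap like this, a flavor asking for "resources:VGPU=2" would not
be matched against an NVIDIA-backed resource provider at scheduling
time, instead of failing later when QEMU starts the guest.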

** Affects: nova
     Importance: Low
     Assignee: Sylvain Bauza (sylvain-bauza)
         Status: Triaged


** Tags: placement vgpu

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1758086

Title:
  nvidia driver limits to one single GPU per guest

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  If you provide a flavor with "resources:VGPU=2" (or more) and have
  compute nodes using NVIDIA cards (i.e. PCI devices with the 16-bit
  vendor ID "10de"), QEMU throws an exception because the NVIDIA driver
  does not support more than one IOMMU group per guest.

  libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-03-22T13:14:39.272301Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/c949168d-d04d-4e74-925a-c38f3be11df5,bus=pci.0,addr=0x5: vfio warning: c949168d-d04d-4e74-925a-c38f3be11df5: Could not enable error recovery for the device
  2018-03-22T13:14:39.273759Z qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f508c6d0-f859-4fa2-8976-94940e917709,bus=pci.0,addr=0x6: vfio error: f508c6d0-f859-4fa2-8976-94940e917709: error getting device from group 1: Operation not permitted
  Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use

  Given that limitation, Nova should limit the maximum unit of
  resources per allocation depending on the PCI device vendor ID.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1758086/+subscriptions