yahoo-eng-team team mailing list archive

[Bug 1628168] [NEW] Can't assign system with multiple GPUs to different VMs

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Kevin <kvasko@xxxxxxxxx>
Date: Tue, 27 Sep 2016 15:35:23 -0000
Reply-to: Bug 1628168 <1628168@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Public bug reported:

I have an OpenStack Mitaka deployment that was done by Fuel (9.0).

I have a system with 8 GPUs in a single box. We are trying to allow VMs
to request access to GPU resources via this box.

I know that with PCI passthrough a device can only be assigned to
a single VM (i.e. 1 device <-> 1 VM). However, this box has 8 GPUs (8
separate devices), so I want to support (1 GPU -> 1 VM) * 8,
(2 GPUs -> 1 VM) * 4, (4 GPUs -> 1 VM) * 2, or (8 GPUs -> 1 VM) * 1.
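
To spell out the combinations listed above, here is a small Python sketch
that enumerates every even split of the 8 GPUs. The function name
even_splits is made up for this illustration and has nothing to do with
Nova itself.

```python
# Hypothetical helper: list the ways a number of identical GPUs can be
# split evenly among VMs that each receive the same number of GPUs.
TOTAL_GPUS = 8


def even_splits(total):
    """Return (gpus_per_vm, vm_count) pairs that use every GPU exactly once."""
    return [
        (per_vm, total // per_vm)
        for per_vm in range(1, total + 1)
        if total % per_vm == 0
    ]


for per_vm, vm_count in even_splits(TOTAL_GPUS):
    print("%d GPU(s) per VM x %d VM(s)" % (per_vm, vm_count))
```

For 8 GPUs this yields the four layouts named above: 1x8, 2x4, 4x2 and 8x1.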

I have successfully gotten 1 GPU <-> 1 VM working; however, when I
create another VM with a GPU I get a "not enough hosts" error.

This is what I have done so far.

/etc/nova/nova.conf

Add:
 pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]
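
As I understand it, the whitelist entry selects devices by vendor and
product ID. The Python sketch below illustrates that kind of matching. It
is not Nova's implementation: is_whitelisted, the device dictionaries and
the PCI addresses are all invented for the example.

```python
import json

# Whitelist value copied from the nova.conf line above.
WHITELIST = json.loads('[{"vendor_id": "10de", "product_id": "17c2"}]')


def is_whitelisted(device, whitelist=WHITELIST):
    """Return True if the device matches every key of some whitelist entry."""
    return any(
        all(device.get(key) == value for key, value in entry.items())
        for entry in whitelist
    )


# Hypothetical devices; the addresses and the second device are made up.
gpu = {"vendor_id": "10de", "product_id": "17c2", "address": "0000:04:00.0"}
other = {"vendor_id": "8086", "product_id": "10fb", "address": "0000:01:00.0"}

print(is_whitelisted(gpu))    # True
print(is_whitelisted(other))  # False
```

Since all 8 GPUs share the same vendor and product ID, a single entry like
this should cover every one of them.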

sudo gedit /etc/modules and add:
 pci_stub
 vfio
 vfio_iommu_type1
 vfio_pci
 kvm
 kvm_intel

sudo vi /etc/default/grub and set:
 GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"

//BLACKLIST

sudo gedit /etc/initramfs-tools/modules and add:
 pci_stub ids=10de:17c2

Then run:
 sudo update-initramfs -u
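
A quick way to sanity-check the kernel parameters from the GRUB line above
is to parse them into key/value pairs. The sketch below does this on the
string from this report; parse_cmdline is a hypothetical helper, and on a
live host the same function could be fed the contents of /proc/cmdline.

```python
# Kernel parameters copied from GRUB_CMDLINE_LINUX_DEFAULT above.
GRUB_LINE = "quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


params = parse_cmdline(GRUB_LINE)
iommu_on = params.get("intel_iommu") == "on"
print("intel_iommu enabled:", iommu_on)
```

This only confirms that the flag is present in the string. It does not
prove that the IOMMU is actually active on the host.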

On Controller Node:

Edit nova.conf.

Add an alias specifically for the GPU you want to use:

pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}

Also add:

scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
 scheduler_available_filters=nova.scheduler.filters.all_filters
 scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
 scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter

#: source openrc
 nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
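
To make explicit what I expect the "titanx:1" value to mean (alias name,
then the number of devices requested), here is a Python sketch. It is only
an illustration and not Nova's scheduler code: parse_alias_spec and
can_satisfy are made-up names, and the comma-separated form for several
aliases is an assumption on my part.

```python
def parse_alias_spec(spec):
    """Parse 'name:count[,name:count...]' into a dict of name -> count."""
    requests = {}
    for part in spec.split(","):
        name, _, count = part.strip().partition(":")
        requests[name] = int(count)
    return requests


def can_satisfy(requests, free_by_alias):
    """True if enough free devices exist for every requested alias."""
    return all(
        free_by_alias.get(name, 0) >= count
        for name, count in requests.items()
    )


request = parse_alias_spec("titanx:1")
print(request)                              # {'titanx': 1}
print(can_satisfy(request, {"titanx": 7}))  # True
```

With 7 free devices matching the alias, a request for 1 should be
satisfiable, which is why the scheduling failure surprises me.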

Actual Results:
When I create a second VM with the same flavor, it fails with the message below. (Creating the first VM works, and a GPU is assigned to it.)

Message: No valid host was found. There are not enough hosts available.
 Code: 500
 File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances
   context, request_spec, filter_properties)
 File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances
   hosts = self.scheduler_client.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped
   return func(*args, **kwargs)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
   return self.queryclient.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
   return getattr(self.instance, __name)(*args, **kwargs)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations
   return self.scheduler_rpcapi.select_destinations(context, spec_obj)
 File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations
   return cctxt.call(ctxt, 'select_destinations', **msg_args)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
   retry=self.retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
   timeout=timeout, retry=retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
   retry=retry)
 File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send
   raise result

Running SELECT * FROM pci_devices; on the nova database I get the
following

http://imgur.com/a/voGki

As you can see, 7 devices are shown as available.
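
Since the query result is only linked as a screenshot, here is a Python
sketch of the tally I am describing. The rows are invented placeholders
that mirror the situation (1 GPU assigned to the first VM, 7 available).
They are not the actual table contents, and the addresses are made up.

```python
from collections import Counter

# Placeholder rows standing in for the pci_devices query result.
# Only the status column matters for this tally.
rows = [
    {"address": "gpu%d" % index,
     "status": "allocated" if index == 0 else "available"}
    for index in range(8)
]

status_counts = Counter(row["status"] for row in rows)
print(status_counts["available"])  # 7
print(status_counts["allocated"])  # 1
```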

Expected Results:

Another VM created with 1 more GPU used from the system.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1628168

Title:
  Can't assign system with multiple GPUs to different VMs

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1628168/+subscriptions