yahoo-eng-team team mailing list archive
Message #57109
[Bug 1628168] [NEW] Can't assign system with multiple GPUs to different VMs
Public bug reported:
I have an OpenStack Mitaka deployment that was done by Fuel (9.0).
I have a system with 8 GPUs in a single box. We are trying to allow VMs
to request access to GPU resources via this box.
I know that with PCI Passthrough you can only have a device assigned to
a single VM (e.g. 1 device <-> 1 VM). However, this box has 8 GPUs (8
separate devices). So I want support (1GPU -> 1VM) * 8, or (2GPU -> 1VM)
* 4, (4GPU -> 1VM) * 2, or (8GPU -> 1VM) * 1.
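To make the combinations above concrete, here is a minimal sketch that enumerates the even splits of 8 devices across VMs. The helper name `even_splits` is hypothetical and only illustrates the arithmetic; it is not part of nova.

```python
# Hypothetical illustration only: list the even GPU-per-VM splits
# for a host that exposes 8 passthrough devices.
TOTAL_GPUS = 8


def even_splits(total):
    """Return (gpus_per_vm, vm_count) pairs that use every device exactly once."""
    return [(per_vm, total // per_vm)
            for per_vm in range(1, total + 1)
            if total % per_vm == 0]


splits = even_splits(TOTAL_GPUS)
for per_vm, vms in splits:
    print("({}GPU -> 1VM) * {}".format(per_vm, vms))
```

Each pair corresponds to one flavor definition (for example an alias count of 1, 2, 4, or 8) that the host should be able to serve until its devices are exhausted.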
I have successfully been able to get the system to have 1 GPU <-> 1 VM,
however when I go to create another VM with a GPU I get "not enough
hosts found".
This is what I have done so far.
/etc/nova/nova.conf
Add:
pci_passthrough_whitelist = [{"vendor_id": "10de", "product_id": "17c2"}]
sudo gedit /etc/modules and add:
pci_stub
vfio
vfio_iommu_type1
vfio_pci
kvm
kvm_intel
sudo vi /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1"
//BLACKLIST
sudo gedit /etc/initramfs-tools/modules
pci_stub ids=10de:17c2
sudo update-initramfs -u
On Controller Node:
Edit nova.conf
Add an alias for the specific GPU you want to use:
pci_alias={"vendor_id":"10de", "product_id":"17c2", "name":"titanx"}
Add
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_available_filters=nova.scheduler.filters.pci_passthrough_filter.PciPassthroughFilter
scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,PciPassthroughFilter
source openrc
nova flavor-key g1.xlarge set "pci_passthrough:alias"="titanx:1"
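For reference, this is roughly how I understand the "titanx:1" request to be matched against a host's free devices. It is a simplified sketch under my own assumptions, not nova's actual PciPassthroughFilter code; the function names and the shape of the device records are hypothetical.

```python
# Simplified, hypothetical model of alias-based PCI matching.
# This is NOT nova's implementation, only an illustration of the idea.

def parse_alias_spec(spec):
    """Parse 'name:count[,name:count...]' into a {name: count} dict."""
    requests = {}
    for item in spec.split(","):
        name, _, count = item.strip().partition(":")
        requests[name] = requests.get(name, 0) + int(count)
    return requests


def host_can_satisfy(requests, aliases, free_devices):
    """Return True if enough free devices match every requested alias."""
    for name, count in requests.items():
        alias = aliases[name]
        matching = [d for d in free_devices
                    if d["vendor_id"] == alias["vendor_id"]
                    and d["product_id"] == alias["product_id"]]
        if len(matching) < count:
            return False
    return True


# Alias taken from the pci_alias line above; 7 free devices as in my setup.
aliases = {"titanx": {"vendor_id": "10de", "product_id": "17c2"}}
free = [{"vendor_id": "10de", "product_id": "17c2"} for _ in range(7)]

print(host_can_satisfy(parse_alias_spec("titanx:1"), aliases, free))
```

Under this model a second "titanx:1" request should still pass with 7 free devices, which is why the scheduler failure looks wrong to me.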
Actual Results:
When I go to create my second VM with the same flavor it errors out with this message. (If I create 1 VM it works and a GPU is assigned to that machine).
Message: No valid host was found. There are not enough hosts available.
Code: 500
  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 392, in build_instances
    context, request_spec, filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 436, in _schedule_instances
    hosts = self.scheduler_client.select_destinations(context, spec_obj)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/utils.py", line 372, in wrapped
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations
    return self.queryclient.select_destinations(context, spec_obj)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/__init__.py", line 37, in __run_method
    return getattr(self.instance, __name)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/client/query.py", line 32, in select_destinations
    return self.scheduler_rpcapi.select_destinations(context, spec_obj)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/rpcapi.py", line 121, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    retry=self.retry)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 91, in _send
    timeout=timeout, retry=retry)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 512, in send
    retry=retry)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _send
    raise result
Running SELECT * FROM pci_devices; on the nova database I get the
following
http://imgur.com/a/voGki
As you can see it shows 7 are available.
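Since the screenshot may not stay online, here is a sketch of the kind of check I ran, counting devices per status. It uses an in-memory SQLite table as a stand-in for nova's pci_devices table; the reduced column set, the PCI addresses, and the instance UUID are placeholders I made up for illustration, not values from my database.

```python
import sqlite3

# In-memory stand-in for the nova pci_devices table (simplified columns,
# placeholder addresses). The real table lives in the nova MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices ("
    "address TEXT, vendor_id TEXT, product_id TEXT, "
    "status TEXT, instance_uuid TEXT)"
)

# 8 devices: one already allocated to the first VM, seven still available.
rows = []
for bus in range(4, 12):
    allocated = (bus == 4)
    rows.append((
        "0000:{:02x}:00.0".format(bus),        # placeholder PCI address
        "10de",
        "17c2",
        "allocated" if allocated else "available",
        "placeholder-uuid" if allocated else None,
    ))
conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?, ?)", rows)

# Count devices per status for the whitelisted vendor/product pair.
counts = dict(conn.execute(
    "SELECT status, COUNT(*) FROM pci_devices "
    "WHERE vendor_id = ? AND product_id = ? GROUP BY status",
    ("10de", "17c2"),
).fetchall())
print(counts)
```

The same SELECT ... GROUP BY status query can be run directly against the nova database to confirm how many devices the compute node reports as available.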
Expected Results:
Another VM created with 1 more GPU used from the system.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1628168