openstack team mailing list archive - Message #16123
Re: What is the most commonly used Hypervisor and toolset combination?
Boris-Michel Deschenes wrote:
> John,
>
> Sorry for my late response..
>
> It would be great to collaborate. Like I said, I prefer to keep the libvirt layer as it works great with openstack and many other tools (collectd, virt-manager, etc.); the virsh tool is also very useful for us.
>
> You say:
> -----------
> We have GPU passthrough working with NVIDIA GPUs in Xen 4.1.2, if I recall correctly. We don't yet have a stable Xen + Libvirt installation working, but we're looking at it. Perhaps it would be worth collaborating since it sounds like this could be a win for both of us.
> -----------
> I have Jim Fehlig in CC since this could be of interest to him.
>
> We managed to get GPU passthrough of NVIDIA cards working with Xen 4.1.2, but ONLY with xenapi (actually the whole XCP toolstack). With libvirt/Xen 4.1.2, and even libvirt/Xen 4.1.3, I only manage to pass through Radeon GPUs. The reason could be:
>
> 1. The inability to pass the gfx_passthru parameter through libvirt (IIRC this parameter passes the PCI device through as the primary VGA card rather than as a secondary one).
> 2. Bad FLR (Function Level Reset) support, or some other low-level PCI issue, on the NVIDIA boards.
>
I've noticed this issue with some Broadcom multifunction NICs. No FLR,
so it falls back to secondary bus reset, which is problematic if another
function is being used by a different VM.
> 3. something else entirely.
>
> Anyway, like I said, this GPU passthrough of nvidia worked well with XCP using xenapi but not with libvirt/Xen
>
Hmm, would be nice to get that fixed. To date, I haven't tried GPU
passthrough with Xen so I'm not familiar with the issues.
> Now, as for the libvirt/Xen setup we have, I don't know if I would call it stable, but it does the job as a POC cloud and is actually used by real people with real GPU needs (for example, developing on OpenCL 1.2). The main thing is that it integrates seamlessly with openstack (because of libvirt): with instance_type_extra_specs, you can add a couple of these "special" nodes to an existing plain KVM cloud and they will receive the instances requesting GPUs without any problem.
>
> the setup:
> (this only refers to compute nodes, as controller nodes are unmodified)
>
> 1. Install CentOS 6.2 and make your own project Zeus (turning a CentOS install into a Xen host): http://www.howtoforge.com/virtualization-with-xen-on-centos-6.2-x86_64-paravirtualization-and-hardware-virtualization (first page only; skip the bridge setup, as openstack-nova-compute does this at startup). You end up with a Xen hypervisor with libvirt; the libvirt patch is actually a single-line config change IIRC. Pretty straightforward.
>
> 2. Install openstack-nova from EPEL (so all this refers only to ESSEX, openstack 2012.1)
>
> 3. configure the compute node accordingly (libvirt_type=xen)
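A minimal sketch of that compute-node config change, assuming Essex-era flag names (the rest of nova.conf is unchanged):

    # /etc/nova/nova.conf (compute node) -- minimal sketch, Essex-era flags
    connection_type=libvirt
    libvirt_type=xen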
>
> That's the first part. At this point, you can spawn a VM and attach a GPU manually with:
>
> virsh nodedev-dettach pci_0000_02_00_01
> (edit the VM's nova libvirt.xml to add a PCI node-device definition like this: http://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/chap-Virtualization-PCI_passthrough.html ; a minimal sketch follows below)
> virsh define libvirt.xml
> virsh start instance-0000000x
>
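For reference, the PCI definition added to libvirt.xml is a <hostdev> element; a minimal sketch matching the pci_0000_02_00_01 device above (managed='no' because the device was already detached by hand; the address values are illustrative):

    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <!-- 0000:02:00.1, i.e. the pci_0000_02_00_01 node device above -->
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </source>
    </hostdev>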
> Now, this is all manual and we wish to automate it in openstack, so this is what I've done: I can currently launch VMs in my cloud and the passthrough occurs without any intervention.
>
> These files were modified from an original essex installation to make this possible:
>
> (on the controller)
> create a g1.small instance_type with {'free_gpus': '1'} as instance_type_extra_specs
> select the compute_filter filter to enforce extra_specs in scheduling (the filter's host_passes function is also slightly modified so that it reads key>=value instead of key=value: free_gpus>=1 is good, it does not need to be strictly equal to 1)
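For what it's worth, that key>=value tweak boils down to something like the following (a simplified sketch, not the actual Essex filter code; the helper name is assumed):

    # Simplified sketch of the relaxed extra_specs check in the scheduler filter.
    # 'capabilities' is what the compute node reports (e.g. {'free_gpus': 2}),
    # 'extra_specs' comes from the instance type (e.g. {'free_gpus': '1'}).
    def _satisfies_extra_specs(capabilities, extra_specs):
        for key, req in extra_specs.items():
            cap = capabilities.get(key)
            if cap is None:
                return False
            try:
                # key >= value instead of key == value, so free_gpus >= 1 passes
                if int(cap) < int(req):
                    return False
            except (TypeError, ValueError):
                # fall back to an exact match for non-numeric specs
                if str(cap) != str(req):
                    return False
        return True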
>
I think this has already been done for you in Folsom via the
ComputeCapabilitiesFilter and Jinwoo Suh's addition of
instance_type_extra_specs operators. See commit 90f77d71.
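If I recall correctly, with that Folsom filter the operator goes into the extra_specs value itself, so the same requirement should be expressible without patching host_passes (key name taken from your setup; worth double-checking the exact syntax against the Folsom code):

    # Folsom-style extra_specs: the value carries the comparison operator,
    # which ComputeCapabilitiesFilter applies itself.
    extra_specs = {'free_gpus': '>= 1'}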
> (on the compute node)
> nova/virt/libvirt/gpu.py
> a new file that contains functions like detach_all_gpus and get_free_gpus; simple stuff using virsh and lspci (a rough sketch follows below)
Have you considered pushing this upstream?
> nova/virt/libvirt/connection.py
> calls gpu.detach_all_gpus on startup (virsh nodedev-dettach)
> builds the VM libvirt.xml as normal but also adds the pci nodedev definition
> advertises a free_gpus capability so that the scheduler gets it through host_state calls
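To make the description above concrete, here is a rough sketch of what a gpu.py-style helper built on lspci and virsh can look like (an illustration only, not the original file; the lspci class filter and the bookkeeping of assigned devices are assumptions):

    # Sketch of a gpu.py-style helper module (illustrative, not the original).
    import subprocess

    def _list_gpu_pci_addresses():
        """Return PCI addresses (e.g. '0000:02:00.0') of VGA-class devices."""
        out = subprocess.check_output(['lspci', '-D', '-d', '::0300'])
        return [line.split()[0] for line in out.decode().splitlines() if line]

    def detach_all_gpus():
        """nodedev-dettach every GPU so libvirt can hand it to a guest."""
        for addr in _list_gpu_pci_addresses():
            nodedev = 'pci_' + addr.replace(':', '_').replace('.', '_')
            subprocess.check_call(['virsh', 'nodedev-dettach', nodedev])

    def get_free_gpus(assigned):
        """Return the GPUs not currently assigned to a running instance."""
        return [a for a in _list_gpu_pci_addresses() if a not in assigned]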
>
> that's about it, with that we get:
>
> 1. compute nodes that detach all GPUs on startup
> 2. compute nodes that advertise the number of free GPUs to the scheduler (sketched below)
> 3. compute nodes that are able to build the VM's libvirt.xml with a valid, free GPU definition when a VM is launched
> 4. controller that runs a scheduler that knows where to send VMs (free_gpus >= 1)
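Point 2 presumably amounts to folding the GPU count into whatever the driver already reports to the scheduler; a minimal sketch of the idea (method and helper names are assumptions, not the Essex connection.py verbatim):

    # Sketch: include the free-GPU count in the stats the compute driver
    # reports, so the scheduler sees 'free_gpus' on the host state.
    def get_host_stats(self, refresh=False):
        stats = self._collect_base_host_stats(refresh)   # assumed existing helper
        stats['free_gpus'] = len(gpu.get_free_gpus(self._assigned_gpus))
        return stats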
>
> It does the trick for now. With a Radeon 6950 I get 100% success: I spawn a VM and in 20 seconds I get a Windows 7 guest with a real GPU available through RDC.
>
> I'll try to figure out what the problem is regarding NVIDIA passthrough. If I do, I'll be sure to inform Jim Fehlig so that we can work this into libvirt.
>
Yes, please do.
> All this is in openstack Essex (2012.1), so I will probably never send the code upstream, as most of this has changed in Folsom (for example, the extra_specs handling is already different in Folsom), but if you want to have a look, let me know.
>
As mentioned above, I think that one has already been done for you.
Seems you just need to work on getting your nova/virt/libvirt/gpu.py
addition upstream.
Regards,
Jim