yahoo-eng-team team mailing list archive

[Bug 1958458] [NEW] Multiple GPU card bind to multiple vms

 

Public bug reported:

I am running Wallaby and have a compute node with two GPU cards. My
requirement is to create vm1 bound to GPU-1 and vm2 bound to GPU-2,
but I am getting an error.

[root@GPUN06 /]# lspci -nn | grep -i nv
5e:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)
d8:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] [10de:1df6] (rev a1)

[root@GPUN06 /]# cat /etc/modprobe.d/gpu-vfio.conf
options vfio-pci ids=10de:1df6

[root@GPUN06 /]# cat /etc/modules-load.d/vfio-pci.conf
vfio-pci
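
As a sanity check on the host side, here is a minimal sketch that lists every PCI address matching the 10de:1df6 ID and reports the driver bound to each. The function names gpu_addrs and bound_driver are made up for illustration, and the sysfs path assumes PCI domain 0000, which matches the short addresses lspci printed above.

```shell
#!/usr/bin/env bash

# Read `lspci -nn` style output on stdin and print the PCI address
# (first field) of every device whose [vendor:product] token matches $1.
gpu_addrs() {
    local id="$1"
    grep -F "[${id}]" | awk '{print $1}'
}

# Print the kernel driver currently bound to a PCI device, or nothing
# if no driver is bound. Assumption: the device lives in domain 0000.
bound_driver() {
    local link="/sys/bus/pci/devices/0000:$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink -f "$link")"
    fi
}

# Example use on the compute node (left commented out on purpose):
# lspci -nn | gpu_addrs 10de:1df6 | while read -r addr; do
#     echo "$addr $(bound_driver "$addr")"
# done
```

If both 5e:00.0 and d8:00.0 report vfio-pci, the host binding is as intended and the problem is more likely in how the devices are assigned to guests.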


Nova API

[PCI]
alias: { "vendor_id":"10de", "product_id":"1df6", "device_type":"type-PCI", "name":"tesla-v100" }


# Flavor 
openstack flavor create --vcpus 4 --ram 8192 --disk 40 --property "pci_passthrough:alias"="tesla-v100:1" --property gpu-node=true g1.small


I can successfully spin up the first GPU VM, which binds to a single
GPU card, but when I create the second VM I get the following error in
libvirt:

error : virDomainDefDuplicateHostdevInfoValidate:1082 : XML error:
Hostdev already exists in the domain configuration

It looks like libvirt or nova does not recognize that a second GPU
card is available.
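
To narrow this down, one option is to check whether the guest XML really lists the same host PCI address twice. The following is only a diagnostic sketch: duplicate_hostdevs is a hypothetical helper, and it assumes the pretty-printed layout of `virsh dumpxml`, with one element per line and the host address nested inside <source> within <hostdev>.

```shell
#!/usr/bin/env bash

# Read libvirt domain XML on stdin and print each host PCI address that
# appears in more than one <hostdev><source> block. Prints nothing when
# every passed-through device is distinct.
duplicate_hostdevs() {
    awk '
        /<hostdev/    { in_hostdev = 1 }
        /<\/hostdev>/ { in_hostdev = 0; in_source = 0 }
        in_hostdev && /<source>/ { in_source = 1 }
        /<\/source>/  { in_source = 0 }
        in_hostdev && in_source && /<address/ {
            # Strip indentation so identical addresses compare equal.
            sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); print
        }
    ' | sort | uniq -d
}

# Example use (the domain name is a placeholder):
# virsh dumpxml <domain> | duplicate_hostdevs
```

Any output means the domain definition references the same host device more than once, which is the condition the libvirt error describes.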


If I set "pci_passthrough:alias"="tesla-v100:2" in the flavor, I am
able to bind both GPU cards to a single VM.

libvirt version: 7.6.0
OpenStack version: Wallaby
Distro: CentOS Stream 8

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1958458

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1958458/+subscriptions
