yahoo-eng-team team mailing list archive

[Bug 2008883] Re: VGPU - I can only use one MIG profile in Nova

 

** Also affects: nova (Ubuntu)
   Importance: Undecided
       Status: New

** No longer affects: nova (Ubuntu)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008883

Title:
  VGPU - I can only use one MIG profile in Nova

Status in OpenStack Compute (nova):
  New

Bug description:
  I have one NVIDIA A100 card and can only use one MIG profile.

  I divided the card into four MIG instances using three profiles (2x
  A100-1-5C, 1x A100-2-10C, 1x A100-3-20C), as shown below.

  +-----------------------------------------------------------------------------+
  | MIG devices:                                                                |
  +------------------+----------------------+-----------+-----------------------+
  | GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
  |      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
  |                  |                      |        ECC|                       |
  |==================+======================+===========+=======================|
  |  0    2   0   0  |     10MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
  |                  |      0MiB / 32767MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0    3   0   1  |      6MiB /  9984MiB | 28      0 |  2   0    1    0    0 |
  |                  |      0MiB / 16383MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0    9   0   2  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
  |                  |      0MiB /  8191MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+
  |  0   10   0   3  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
  |                  |      0MiB /  8191MiB |           |                       |
  +------------------+----------------------+-----------+-----------------------+

  
  My Nova configuration (/etc/nova/nova.conf):

  [devices]
  enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
  [vgpu_nvidia-474]
  device_addresses = 0000:61:00.4,0000:61:01.0
  [vgpu_nvidia-475]
  device_addresses = 0000:61:01.7
  [vgpu_nvidia-476]
  device_addresses = 0000:61:00.6

  # openstack resource provider list
  +--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
  | uuid                                 | name                                         | generation | root_provider_uuid                   | parent_provider_uuid                 |
  +--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
  | a0269b89-d43d-4042-a64e-3c832f0bb23f | gpu-a01.example.os-tests.com                  |        104 | a0269b89-d43d-4042-a64e-3c832f0bb23f | None                                 |
  | f2e5a4e0-479e-4ee3-b504-36371ded49f5 | gpu-a01.example.os-tests.com_pci_0000_61_01_4 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | a513c661-6dd2-4462-b719-9fbf7b70c409 | gpu-a01.example.os-tests.com_pci_0000_61_01_2 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 9124d3a8-00fb-475e-a0f8-892ccf5d255e | gpu-a01.example.os-tests.com_pci_0000_61_00_7 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 5f443da2-3c75-45c6-9d8a-05ca8a487802 | gpu-a01.example.os-tests.com_pci_0000_61_02_1 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 20da8814-c5f0-4575-a785-579e9abdbb1d | gpu-a01.example.os-tests.com_pci_0000_61_01_3 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 37014baa-fba6-4f14-8be8-17084b3aad36 | gpu-a01.example.os-tests.com_pci_0000_61_01_1 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | a9e2f509-fb03-45a0-9ff0-7c50143c1a9c | gpu-a01.example.os-tests.com_pci_0000_61_02_2 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 202462de-7d01-45c7-b197-1f2ca5c9c7ae | gpu-a01.example.os-tests.com_pci_0000_61_02_3 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0 | gpu-a01.example.os-tests.com_pci_0000_61_01_7 |         43 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 315b5205-26ac-4ec6-b5d2-623cafc18f39 | gpu-a01.example.os-tests.com_pci_0000_61_01_6 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 208f395d-d9c7-4108-8f83-e48cbea0b637 | gpu-a01.example.os-tests.com_pci_0000_61_02_0 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 7d5abf99-3c42-4c62-ba33-15682c6cfc5b | gpu-a01.example.os-tests.com_pci_0000_61_00_4 |         18 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 315b2c35-401c-4165-add8-2b025961b9a0 | gpu-a01.example.os-tests.com_pci_0000_61_01_5 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | c43be42e-e564-46be-9025-4f00a1f7454e | gpu-a01.example.os-tests.com_pci_0000_61_00_5 |          1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 3cd4dbc7-2c2a-448d-a041-27c8fd685950 | gpu-a01.example.os-tests.com_pci_0000_61_01_0 |         14 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  | 58fbbedb-9845-4397-bd20-f559ba68daee | gpu-a01.example.os-tests.com_pci_0000_61_00_6 |         27 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
  +--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
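
The configured PCI addresses can be cross-checked against the resource provider names in the listing above. The following is a minimal sketch, assuming the child provider name is the hostname followed by `_pci_` and the PCI address with `:` and `.` replaced by `_`, which is what the listing suggests. The helper names `provider_name` and `expected_providers` are hypothetical, and the config text is copied from the report.

```python
import configparser

# Config text as given in the report.
NOVA_CONF = """
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0
[vgpu_nvidia-475]
device_addresses = 0000:61:01.7
[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
"""


def provider_name(host, pci_address):
    # Assumed naming rule: 0000:61:00.4 -> <host>_pci_0000_61_00_4
    return f"{host}_pci_" + pci_address.replace(":", "_").replace(".", "_")


def expected_providers(conf_text, host):
    """Map each expected resource provider name to its configured vGPU type."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    enabled = parser["devices"]["enabled_vgpu_types"].split(",")
    mapping = {}
    for vgpu_type in (t.strip() for t in enabled):
        section = f"vgpu_{vgpu_type}"
        if not parser.has_section(section):
            continue  # type enabled without a device_addresses group
        for address in parser[section]["device_addresses"].split(","):
            mapping[provider_name(host, address.strip())] = vgpu_type
    return mapping


providers = expected_providers(NOVA_CONF, "gpu-a01.example.os-tests.com")
for name, vgpu_type in sorted(providers.items()):
    print(name, vgpu_type)
```

The four names this produces end in `_pci_0000_61_00_4`, `_pci_0000_61_00_6`, `_pci_0000_61_01_0` and `_pci_0000_61_01_7`, which are also the only child providers in the listing whose generation is greater than 1.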

  Traits and flavors created:

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_1
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_1 3cd4dbc7-2c2a-448d-a041-27c8fd685950
  openstack flavor create --private  --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-1  --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_1=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_2
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_2 7d5abf99-3c42-4c62-ba33-15682c6cfc5b
  openstack flavor create --private  --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-2  --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_2=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_3
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_3 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0
  openstack flavor create --private  --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-3  --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_3=required

  openstack --os-placement-api-version 1.6 trait create CUSTOM_N_4
  openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_4 58fbbedb-9845-4397-bd20-f559ba68daee
  openstack flavor create --private  --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-4  --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_4=required
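
The four blocks above follow one pattern: create a custom trait, set it on one resource provider, and create a flavor that requires it. As a hedged sketch, the same command lines can be generated from a small table. The flags are only those that appear in the report. The `ASSIGNMENTS` table and the `commands_for` helper are hypothetical names introduced for illustration, and the script only prints the commands rather than running them.

```python
import shlex

# trait -> (flavor name, resource provider UUID), copied from the report.
ASSIGNMENTS = {
    "CUSTOM_N_1": ("vgpu-1", "3cd4dbc7-2c2a-448d-a041-27c8fd685950"),
    "CUSTOM_N_2": ("vgpu-2", "7d5abf99-3c42-4c62-ba33-15682c6cfc5b"),
    "CUSTOM_N_3": ("vgpu-3", "5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0"),
    "CUSTOM_N_4": ("vgpu-4", "58fbbedb-9845-4397-bd20-f559ba68daee"),
}


def commands_for(trait, flavor, provider_uuid, project="vgpu"):
    """Return the three argv lists used in the report for one trait/flavor pair."""
    placement = ["openstack", "--os-placement-api-version", "1.6"]
    return [
        placement + ["trait", "create", trait],
        placement + ["resource", "provider", "trait", "set",
                     "--trait", trait, provider_uuid],
        ["openstack", "flavor", "create", "--private",
         "--description", "vgpu-test",
         "--ram", str(8 * 1024), "--disk", "0", "--vcpus", "8", flavor,
         "--project", project,
         "--property", "resources:VGPU=1",
         "--property", f"trait:{trait}=required"],
    ]


for trait, (flavor, uuid) in ASSIGNMENTS.items():
    for argv in commands_for(trait, flavor, uuid):
        print(shlex.join(argv))
```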


  gpu-a01:~/nvidia-dev-ctl# ls /sys/class/mdev_bus/*/mdev_supported_types
  '/sys/class/mdev_bus/0000:61:00.4/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.5/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.6/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:00.7/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.0/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.1/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.2/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.3/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.4/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.5/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.6/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:01.7/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.0/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.1/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.2/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706

  '/sys/class/mdev_bus/0000:61:02.3/mdev_supported_types':
  nvidia-468  nvidia-469  nvidia-470  nvidia-471  nvidia-472  nvidia-473  nvidia-474  nvidia-475  nvidia-476  nvidia-477  nvidia-478  nvidia-706
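
The listing above shows that every virtual function advertises the same set of types, so it does not reveal which type a given function can still create. The kernel's mdev sysfs interface also exposes an `available_instances` file under each type directory. The following is a minimal sketch that reads it for the three configured types. The function name `mdev_availability` is hypothetical, and how the NVIDIA driver populates these counts for MIG-backed types is an assumption not covered by the report.

```python
from pathlib import Path


def mdev_availability(types, root="/sys/class/mdev_bus"):
    """Return {pci_address: {vgpu_type: available_instances}} for the given types."""
    report = {}
    for device in sorted(Path(root).iterdir()):
        per_type = {}
        for vgpu_type in types:
            attr = device / "mdev_supported_types" / vgpu_type / "available_instances"
            if attr.is_file():
                per_type[vgpu_type] = int(attr.read_text().strip())
        report[device.name] = per_type
    return report


if __name__ == "__main__":
    # Only meaningful on a host that actually has mdev-capable devices.
    if Path("/sys/class/mdev_bus").is_dir():
        wanted = ["nvidia-474", "nvidia-475", "nvidia-476"]
        for address, counts in mdev_availability(wanted).items():
            print(address, counts)
```

Comparing the counts for 0000:61:00.4, 0000:61:00.6, 0000:61:01.0 and 0000:61:01.7 with the `device_addresses` entries in nova.conf would show whether each configured function can still create the type assigned to it.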


  
  Problem:
  ===========
  I can create only two instances, both with A100-1-5C (nvidia-474). The nvidia-475 and nvidia-476 types are ignored and I cannot use them.

  If I edit /etc/nova/nova.conf and replace
  enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
  with
  enabled_vgpu_types = nvidia-475,nvidia-476
  I can use only one type, A100-2-10C (nvidia-475).


  Packages:
  ============

  Version: Ussuri

  gpu-a01:~# lsb_release -a
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description:    Ubuntu 20.04.4 LTS
  Release:        20.04
  Codename:       focal

  gpu-a01:~# dpkg -l | grep nova
  ii  nova-common                           2:21.2.4-0ubuntu1                                    all          OpenStack Compute - common files
  ii  nova-compute                          2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node base
  ii  nova-compute-kvm                      2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node (KVM)
  ii  nova-compute-libvirt                  2:21.2.4-0ubuntu1                                    all          OpenStack Compute - compute node libvirt support
  ii  python3-nova                          2:21.2.4-0ubuntu1                                    all          OpenStack Compute Python 3 libraries
  ii  python3-novaclient                    2:17.0.0-0ubuntu1                                    all          client library for OpenStack Compute API - 3.x

  gpu-a01:~# uname -a
  Linux compgpu-a01 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2008883/+subscriptions


