yahoo-eng-team team mailing list archive
Message #91437
[Bug 2008883] Re: VGPU - I can only use one MIG profile in Nova
** Also affects: nova (Ubuntu)
Importance: Undecided
Status: New
** No longer affects: nova (Ubuntu)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008883
Title:
VGPU - I can only use one MIG profile in Nova
Status in OpenStack Compute (nova):
New
Bug description:
I have one NVIDIA A100 card, and Nova lets me use only one MIG profile on it.
I divided the card into four MIG instances using three profiles: 2x A100-1-5C,
1x A100-2-10C, and 1x A100-3-20C, as shown below.
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     10MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    3   0   1  |       6MiB / 9984MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   2  |    4739MiB / 4864MiB | 14      0 |  1   0    0    0    0 |
|                  |       0MiB / 8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   3  |    4739MiB / 4864MiB | 14      0 |  1   0    0    0    0 |
|                  |       0MiB / 8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
My Nova configuration (/etc/nova/nova.conf):
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0
[vgpu_nvidia-475]
device_addresses = 0000:61:01.7
[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
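To make the intended layout explicit, here is a minimal Python sketch that parses a nova.conf-style fragment and builds a map from PCI address to vGPU type. The option names ([devices] enabled_vgpu_types and [vgpu_<type>] device_addresses) are taken from the configuration above; the function names and the consistency checks are illustrative assumptions, not part of Nova.

```python
import configparser

# Configuration fragment copied from the report above.
NOVA_CONF_FRAGMENT = """
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476

[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0

[vgpu_nvidia-475]
device_addresses = 0000:61:01.7

[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
"""


def split_csv(value):
    """Split a comma-separated option value into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def map_addresses_to_types(conf_text):
    """Return (address -> vGPU type, list of problems) for a config fragment.

    Hypothetical helper: it only mirrors the layout shown in the report.
    """
    # Use "=" as the only delimiter so the colons in PCI addresses are safe,
    # and disable interpolation so "%" in a real nova.conf does not break it.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)

    enabled = split_csv(
        parser.get("devices", "enabled_vgpu_types", fallback=""))
    mapping = {}
    problems = []
    for vgpu_type in enabled:
        section = "vgpu_" + vgpu_type
        if not parser.has_section(section):
            problems.append("no [%s] group for enabled type" % section)
            continue
        addresses = split_csv(
            parser.get(section, "device_addresses", fallback=""))
        for address in addresses:
            if address in mapping:
                problems.append("%s listed for both %s and %s"
                                % (address, mapping[address], vgpu_type))
            else:
                mapping[address] = vgpu_type
    return mapping, problems


if __name__ == "__main__":
    mapping, problems = map_addresses_to_types(NOVA_CONF_FRAGMENT)
    for address in sorted(mapping):
        print(address, mapping[address])
    for problem in problems:
        print("problem:", problem)
```

For the fragment above this yields four addresses (two for nvidia-474, one each for nvidia-475 and nvidia-476) and no problems, so the configuration file itself looks internally consistent.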
# openstack resource provider list
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid | name | generation | root_provider_uuid | parent_provider_uuid |
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
| a0269b89-d43d-4042-a64e-3c832f0bb23f | gpu-a01.example.os-tests.com | 104 | a0269b89-d43d-4042-a64e-3c832f0bb23f | None |
| f2e5a4e0-479e-4ee3-b504-36371ded49f5 | gpu-a01.example.os-tests.com_pci_0000_61_01_4 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| a513c661-6dd2-4462-b719-9fbf7b70c409 | gpu-a01.example.os-tests.com_pci_0000_61_01_2 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 9124d3a8-00fb-475e-a0f8-892ccf5d255e | gpu-a01.example.os-tests.com_pci_0000_61_00_7 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 5f443da2-3c75-45c6-9d8a-05ca8a487802 | gpu-a01.example.os-tests.com_pci_0000_61_02_1 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 20da8814-c5f0-4575-a785-579e9abdbb1d | gpu-a01.example.os-tests.com_pci_0000_61_01_3 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 37014baa-fba6-4f14-8be8-17084b3aad36 | gpu-a01.example.os-tests.com_pci_0000_61_01_1 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| a9e2f509-fb03-45a0-9ff0-7c50143c1a9c | gpu-a01.example.os-tests.com_pci_0000_61_02_2 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 202462de-7d01-45c7-b197-1f2ca5c9c7ae | gpu-a01.example.os-tests.com_pci_0000_61_02_3 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0 | gpu-a01.example.os-tests.com_pci_0000_61_01_7 | 43 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 315b5205-26ac-4ec6-b5d2-623cafc18f39 | gpu-a01.example.os-tests.com_pci_0000_61_01_6 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 208f395d-d9c7-4108-8f83-e48cbea0b637 | gpu-a01.example.os-tests.com_pci_0000_61_02_0 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 7d5abf99-3c42-4c62-ba33-15682c6cfc5b | gpu-a01.example.os-tests.com_pci_0000_61_00_4 | 18 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 315b2c35-401c-4165-add8-2b025961b9a0 | gpu-a01.example.os-tests.com_pci_0000_61_01_5 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| c43be42e-e564-46be-9025-4f00a1f7454e | gpu-a01.example.os-tests.com_pci_0000_61_00_5 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 3cd4dbc7-2c2a-448d-a041-27c8fd685950 | gpu-a01.example.os-tests.com_pci_0000_61_01_0 | 14 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 58fbbedb-9845-4397-bd20-f559ba68daee | gpu-a01.example.os-tests.com_pci_0000_61_00_6 | 27 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
Traits and flavors created:
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_1
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_1 3cd4dbc7-2c2a-448d-a041-27c8fd685950
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-1 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_1=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_2
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_2 7d5abf99-3c42-4c62-ba33-15682c6cfc5b
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-2 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_2=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_3
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_3 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-3 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_3=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_4
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_4 58fbbedb-9845-4397-bd20-f559ba68daee
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-4 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_4=required
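To double-check which vGPU type each trait is expected to select, the provider names can be turned back into PCI addresses and looked up in the configuration. The Python sketch below does this with data copied from the provider list, the trait commands and nova.conf above. It assumes the child provider names end in _pci_<domain>_<bus>_<slot>_<function>, as they appear in the table; the helper names are hypothetical.

```python
import re

# vGPU type per PCI address, copied from the nova.conf fragment in the report.
CONFIGURED_TYPES = {
    "0000:61:00.4": "nvidia-474",
    "0000:61:01.0": "nvidia-474",
    "0000:61:01.7": "nvidia-475",
    "0000:61:00.6": "nvidia-476",
}

# (trait, provider name) pairs, read off the trait commands and provider list.
TRAIT_PROVIDERS = [
    ("CUSTOM_N_1", "gpu-a01.example.os-tests.com_pci_0000_61_01_0"),
    ("CUSTOM_N_2", "gpu-a01.example.os-tests.com_pci_0000_61_00_4"),
    ("CUSTOM_N_3", "gpu-a01.example.os-tests.com_pci_0000_61_01_7"),
    ("CUSTOM_N_4", "gpu-a01.example.os-tests.com_pci_0000_61_00_6"),
]

# Assumed naming pattern for child providers, as seen in the table above.
_PCI_SUFFIX = re.compile(
    r"_pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$")


def provider_name_to_pci_address(name):
    """Convert '..._pci_0000_61_01_0' to '0000:61:01.0', or return None."""
    match = _PCI_SUFFIX.search(name)
    if match is None:
        return None
    domain, bus, slot, function = match.groups()
    return "%s:%s:%s.%s" % (domain, bus, slot, function)


def trait_to_type(trait_providers, configured_types):
    """Map each trait to (PCI address, configured vGPU type or None)."""
    result = {}
    for trait, provider_name in trait_providers:
        address = provider_name_to_pci_address(provider_name)
        result[trait] = (address, configured_types.get(address))
    return result


if __name__ == "__main__":
    for trait, (address, vgpu_type) in sorted(
            trait_to_type(TRAIT_PROVIDERS, CONFIGURED_TYPES).items()):
        print(trait, address, vgpu_type)
```

With the data from this report, CUSTOM_N_1 and CUSTOM_N_2 resolve to nvidia-474, CUSTOM_N_3 to nvidia-475 and CUSTOM_N_4 to nvidia-476, which matches the intent of the four flavors.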
gpu-a01:~/nvidia-dev-ctl# ls /sys/class/mdev_bus/*/mdev_supported_types
'/sys/class/mdev_bus/0000:61:00.4/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.5/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.6/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.7/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.0/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.1/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.2/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.3/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.4/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.5/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.6/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.7/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.0/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.1/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.2/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.3/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
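The listing above shows which types each virtual function advertises, but not whether an instance of a given type can still be created there. The kernel's mdev sysfs interface exposes that in an available_instances file under each type directory. The following Python sketch reads it for the address and type pairs from nova.conf; the helper name and the sysfs_root parameter are hypothetical and exist only to keep the example testable outside the GPU host.

```python
from pathlib import Path

# Address and type pairs copied from the nova.conf fragment in the report.
CONFIGURED = [
    ("0000:61:00.4", "nvidia-474"),
    ("0000:61:01.0", "nvidia-474"),
    ("0000:61:01.7", "nvidia-475"),
    ("0000:61:00.6", "nvidia-476"),
]


def available_instances(address, vgpu_type, sysfs_root="/sys/class/mdev_bus"):
    """Return the available_instances count for a type, or None if unreadable.

    Reads <sysfs_root>/<address>/mdev_supported_types/<type>/available_instances.
    """
    path = (Path(sysfs_root) / address / "mdev_supported_types"
            / vgpu_type / "available_instances")
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing device, missing type, or unexpected file content.
        return None


if __name__ == "__main__":
    for address, vgpu_type in CONFIGURED:
        count = available_instances(address, vgpu_type)
        print(address, vgpu_type, "available_instances =", count)
```

Run on the compute node, this prints one line per configured pair. A value of 0 or None for a pair would be worth including in the bug report, although it does not by itself explain the behavior described here.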
Problem:
===========
I can create only two instances, both with A100-1-5C (nvidia-474). The types nvidia-475 and nvidia-476 are ignored and I cannot use them.
If I edit /etc/nova/nova.conf and replace
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
with
enabled_vgpu_types = nvidia-475,nvidia-476
then I can use only the A100-2-10C (nvidia-475) type.
Packages:
============
Version: Ussuri
gpu-a01:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
gpu-a01:~# dpkg -l | grep nova
ii nova-common 2:21.2.4-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node libvirt support
ii python3-nova 2:21.2.4-0ubuntu1 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.0.0-0ubuntu1 all client library for OpenStack Compute API - 3.x
gpu-a01:~# uname -a
Linux compgpu-a01 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2008883/+subscriptions