yahoo-eng-team team mailing list archive
Message #91438
[Bug 2008883] Re: VGPU - I can only use one MIG profile in Nova
Nova does not impose any limits on the MIG profiles or mdev types that
can be used.
This is a hardware limitation of the NVIDIA GPUs, not a Nova limitation.
You are using an A100, which does support using more than one type, but only
in specific combinations of types, which NVIDIA has documented in their product docs.
If you find that you cannot use the selected type, then this is likely the
result of using an unsupported combination.
If they have documented that this should work, you should file a bug with
NVIDIA support.
You can find their documentation here:
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#a100-profiles
I would suggest starting with one of the example topologies before trying
your own, to ensure it works.
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#create-gi
The Nova community only provides enablement of the mdev-based vGPU feature;
we do not provide support for configuring the NVIDIA hardware and software.
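As a rough illustration of the hardware constraint described above, the sketch below checks whether a requested set of MIG profiles fits within the slice totals of an A100 40GB. The per-profile slice costs and the totals (7 compute slices, 8 memory slices) are assumptions taken from a reading of the NVIDIA MIG user guide linked above, and the function name is hypothetical. Passing this check is only a necessary condition: the guide also restricts where each profile may be placed, which this sketch does not model.

```python
# Assumed (compute slices, memory slices) per A100 40GB MIG profile.
# Verify these against the NVIDIA MIG user guide before relying on them.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7  # assumption for A100
MAX_MEMORY_SLICES = 8   # assumption for A100 40GB


def fits_slice_budget(profiles):
    """Return True if the profiles fit the total slice budget.

    This does not check placement rules, so True does not guarantee
    that the combination is actually supported by the hardware.
    """
    compute = sum(A100_40GB_PROFILES[p][0] for p in profiles)
    memory = sum(A100_40GB_PROFILES[p][1] for p in profiles)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


# The layout from the bug description: 1x 3g.20gb, 1x 2g.10gb, 2x 1g.5gb.
reported_layout = ["3g.20gb", "2g.10gb", "1g.5gb", "1g.5gb"]
print(fits_slice_budget(reported_layout))
```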
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008883
Title:
VGPU - I can only use one MIG profile in Nova
Status in OpenStack Compute (nova):
Invalid
Bug description:
I have one Nvidia A100 card and can only use one MIG profile.
I divided the card into 4 MIG instances using 3 profiles: 2x A100-1-5C, 1x
A100-2-10C, 1x A100-3-20C, as below.
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     10MiB / 20096MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    3   0   1  |      6MiB /  9984MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 16383MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   2  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   3  |   4739MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
My nova configuration: /etc/nova/nova.conf
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0
[vgpu_nvidia-475]
device_addresses = 0000:61:01.7
[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
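A configuration like the one above can be sanity-checked before restarting nova-compute. The following is a minimal sketch, not part of Nova: it uses Python's standard configparser to confirm that every type in enabled_vgpu_types has a matching [vgpu_<type>] section and that no PCI address is assigned to more than one type. The function name check_vgpu_config is hypothetical, and the sketch assumes the file has no option lines before the first section header.

```python
import configparser


def check_vgpu_config(conf_text):
    """Return a list of problems found in the [devices] and [vgpu_*] sections."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)

    problems = []
    raw = parser.get("devices", "enabled_vgpu_types", fallback="")
    enabled = [t.strip() for t in raw.split(",") if t.strip()]

    seen = {}  # PCI address -> vGPU type that listed it first
    for vgpu_type in enabled:
        section = "vgpu_" + vgpu_type
        if not parser.has_section(section):
            problems.append(f"{vgpu_type}: missing [{section}] section")
            continue
        addrs = parser.get(section, "device_addresses", fallback="")
        for addr in (a.strip() for a in addrs.split(",") if a.strip()):
            if addr in seen:
                problems.append(
                    f"{addr}: listed for both {seen[addr]} and {vgpu_type}"
                )
            else:
                seen[addr] = vgpu_type
    return problems


# The configuration from the bug description.
SAMPLE = """
[devices]
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
[vgpu_nvidia-474]
device_addresses = 0000:61:00.4,0000:61:01.0
[vgpu_nvidia-475]
device_addresses = 0000:61:01.7
[vgpu_nvidia-476]
device_addresses = 0000:61:00.6
"""
print(check_vgpu_config(SAMPLE))
```

An empty list only means the file is internally consistent; it says nothing about whether the hardware accepts the combination.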
# openstack resource provider list
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid | name | generation | root_provider_uuid | parent_provider_uuid |
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
| a0269b89-d43d-4042-a64e-3c832f0bb23f | gpu-a01.example.os-tests.com | 104 | a0269b89-d43d-4042-a64e-3c832f0bb23f | None |
| f2e5a4e0-479e-4ee3-b504-36371ded49f5 | gpu-a01.example.os-tests.com_pci_0000_61_01_4 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| a513c661-6dd2-4462-b719-9fbf7b70c409 | gpu-a01.example.os-tests.com_pci_0000_61_01_2 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 9124d3a8-00fb-475e-a0f8-892ccf5d255e | gpu-a01.example.os-tests.com_pci_0000_61_00_7 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 5f443da2-3c75-45c6-9d8a-05ca8a487802 | gpu-a01.example.os-tests.com_pci_0000_61_02_1 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 20da8814-c5f0-4575-a785-579e9abdbb1d | gpu-a01.example.os-tests.com_pci_0000_61_01_3 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 37014baa-fba6-4f14-8be8-17084b3aad36 | gpu-a01.example.os-tests.com_pci_0000_61_01_1 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| a9e2f509-fb03-45a0-9ff0-7c50143c1a9c | gpu-a01.example.os-tests.com_pci_0000_61_02_2 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 202462de-7d01-45c7-b197-1f2ca5c9c7ae | gpu-a01.example.os-tests.com_pci_0000_61_02_3 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0 | gpu-a01.example.os-tests.com_pci_0000_61_01_7 | 43 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 315b5205-26ac-4ec6-b5d2-623cafc18f39 | gpu-a01.example.os-tests.com_pci_0000_61_01_6 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 208f395d-d9c7-4108-8f83-e48cbea0b637 | gpu-a01.example.os-tests.com_pci_0000_61_02_0 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 7d5abf99-3c42-4c62-ba33-15682c6cfc5b | gpu-a01.example.os-tests.com_pci_0000_61_00_4 | 18 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 315b2c35-401c-4165-add8-2b025961b9a0 | gpu-a01.example.os-tests.com_pci_0000_61_01_5 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| c43be42e-e564-46be-9025-4f00a1f7454e | gpu-a01.example.os-tests.com_pci_0000_61_00_5 | 1 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 3cd4dbc7-2c2a-448d-a041-27c8fd685950 | gpu-a01.example.os-tests.com_pci_0000_61_01_0 | 14 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
| 58fbbedb-9845-4397-bd20-f559ba68daee | gpu-a01.example.os-tests.com_pci_0000_61_00_6 | 27 | a0269b89-d43d-4042-a64e-3c832f0bb23f | a0269b89-d43d-4042-a64e-3c832f0bb23f |
+--------------------------------------+----------------------------------------------+------------+--------------------------------------+--------------------------------------+
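To cross-check the provider names in the table above against the device_addresses values in nova.conf, the `_pci_0000_61_01_0` suffix can be converted back to PCI address form (`0000:61:01.0`). This is a small sketch; the helper name provider_name_to_pci is hypothetical, and the suffix format is inferred from the names shown in this listing.

```python
import re

# Matches a trailing "_pci_<domain>_<bus>_<slot>_<function>" suffix.
_PCI_SUFFIX = re.compile(
    r"_pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$"
)


def provider_name_to_pci(name):
    """Return the PCI address encoded in a provider name, or None."""
    match = _PCI_SUFFIX.search(name)
    if match is None:
        return None  # e.g. the root compute node provider
    domain, bus, slot, func = match.groups()
    return f"{domain}:{bus}:{slot}.{func}"


print(provider_name_to_pci("gpu-a01.example.os-tests.com_pci_0000_61_01_0"))
print(provider_name_to_pci("gpu-a01.example.os-tests.com"))
```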
Created Flavor:
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_1
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_1 3cd4dbc7-2c2a-448d-a041-27c8fd685950
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-1 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_1=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_2
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_2 7d5abf99-3c42-4c62-ba33-15682c6cfc5b
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-2 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_2=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_3
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_3 5e26d9e8-b59a-47b3-879c-c2c50ab7f1f0
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-3 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_3=required
openstack --os-placement-api-version 1.6 trait create CUSTOM_N_4
openstack --os-placement-api-version 1.6 resource provider trait set --trait CUSTOM_N_4 58fbbedb-9845-4397-bd20-f559ba68daee
openstack flavor create --private --description "vgpu-test" --ram $((8*1024)) --disk 0 --vcpus 8 vgpu-4 --project vgpu --property resources:VGPU=1 --property trait:CUSTOM_N_4=required
gpu-a01:~/nvidia-dev-ctl# ls /sys/class/mdev_bus/*/mdev_supported_types
'/sys/class/mdev_bus/0000:61:00.4/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.5/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.6/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:00.7/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.0/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.1/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.2/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.3/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.4/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.5/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.6/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:01.7/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.0/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.1/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.2/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
'/sys/class/mdev_bus/0000:61:02.3/mdev_supported_types':
nvidia-468 nvidia-469 nvidia-470 nvidia-471 nvidia-472 nvidia-473 nvidia-474 nvidia-475 nvidia-476 nvidia-477 nvidia-478 nvidia-706
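The listing above shows which types each virtual function advertises, but not whether an instance of a given type can currently be created there. The kernel's mdev sysfs interface exposes an available_instances file under each supported type directory; the sketch below reads it for one type across all devices. The function name and the mdev_bus parameter (added so the path can be overridden) are illustrative assumptions.

```python
from pathlib import Path


def available_instances(vgpu_type, mdev_bus=Path("/sys/class/mdev_bus")):
    """Map each PCI address to the available_instances count for vgpu_type.

    Devices that do not advertise the type are left out of the result.
    """
    mdev_bus = Path(mdev_bus)
    result = {}
    if not mdev_bus.is_dir():
        return result  # no mdev-capable devices on this host
    for dev in sorted(mdev_bus.iterdir()):
        attr = dev / "mdev_supported_types" / vgpu_type / "available_instances"
        if attr.is_file():
            result[dev.name] = int(attr.read_text().strip())
    return result


# Example: see where nvidia-475 can still be instantiated on this host.
for addr, count in available_instances("nvidia-475").items():
    print(addr, count)
```

A count of 0 on every device for a type that is enabled in nova.conf would suggest that the driver, rather than Nova, is refusing that type in the current layout.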
Problem:
===========
I can create only two instances with A100-1-5C (nvidia-474); the types nvidia-475 and nvidia-476 are ignored and I can't use them.
If I edit /etc/nova/nova.conf and replace
enabled_vgpu_types = nvidia-474,nvidia-475,nvidia-476
with
enabled_vgpu_types = nvidia-475,nvidia-476
I am able to use only one type, A100-2-10C (nvidia-475).
Packages:
============
Version: Ussuri
gpu-a01:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
gpu-a01:~# dpkg -l | grep nova
ii nova-common 2:21.2.4-0ubuntu1 all OpenStack Compute - common files
ii nova-compute 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:21.2.4-0ubuntu1 all OpenStack Compute - compute node libvirt support
ii python3-nova 2:21.2.4-0ubuntu1 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:17.0.0-0ubuntu1 all client library for OpenStack Compute API - 3.x
gpu-a01:~# uname -a
Linux compgpu-a01 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2008883/+subscriptions