
yahoo-eng-team team mailing list archive

[Bug 2114947] Re: Nova/Placement ignores the flavor’s resource_class + trait constraints when scheduling SR-IOV vGPU devices.

 

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2114947

Title:
  Nova/Placement ignores the flavor’s resource_class + trait constraints
  when scheduling SR-IOV vGPU devices.

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Nova/Placement ignores the flavor’s resource_class + trait constraints
  when scheduling SR-IOV vGPU devices.

  Environment

  Deployment : OpenStack Epoxy 2025.1 (Kolla-Ansible)
  Hypervisor node : Ubuntu 24.04, NVIDIA vGPU driver 570.148
  Hardware : 10xRTX 6000 Ada cards in SR-IOV mode
  PCI config :
  [pci]
  report_in_placement = true
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4F:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:52:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:53:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:56:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:57:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:ce:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d1:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d2:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.6", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.7", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.0", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.1", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
  device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
  alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "device_type":"type-VF", "name":"rtx6000-ada-48q" }
  alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "device_type":"type-VF", "name":"rtx6000-ada-8q" }
  alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "device_type":"type-VF", "name":"rtx6000-ada-24q" }
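The per-profile VF counts implied by the device_spec entries above can be tallied with a short script. This is only a sketch: the function name count_vfs_by_class is hypothetical, the sample embeds three of the entries above rather than the whole section, and it assumes each device_spec value is a single-line JSON object as shown in this report.

```python
import json
from collections import Counter


def count_vfs_by_class(pci_section):
    """Count device_spec entries per resource_class in a [pci] section."""
    counts = Counter()
    for line in pci_section.splitlines():
        # Split "key = value" at the first "="; lines without "=" are skipped.
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "device_spec":
            spec = json.loads(value)
            counts[spec["resource_class"]] += 1
    return counts


# Three entries copied from the config above (not the full list).
SAMPLE = """
[pci]
report_in_placement = true
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
"""

for resource_class, count in sorted(count_vfs_by_class(SAMPLE).items()):
    print(f"{resource_class}: {count}")
```

Run against the complete [pci] section, the same function gives the number of VFs configured per resource class, which can then be compared with the inventory Placement reports.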

  openstack flavor create 8xRTX-ADA-48Q --private \
    --ram 4096 --vcpu 4 --disk 0 \
    --property "resources:CUSTOM_NVIDIA_RTX6000_ADA_48Q"=1 \
    --property "trait:CUSTOM_NVIDIA_RTX6000_ADA_48Q"="required" \
    --property "pci_passthrough:alias"="rtx6000-ada-48q:8"
  openstack flavor set --project admin 8xRTX-ADA-48Q

  openstack flavor create 2xRTX-ADA-24Q --private \
    --ram 4096 --vcpu 4 --disk 0 \
    --property "resources:CUSTOM_NVIDIA_RTX6000_ADA_24Q"=1 \
    --property "trait:CUSTOM_NVIDIA_RTX6000_ADA_24Q"="required" \
    --property "pci_passthrough:alias"="rtx6000-ada-24q:2"
  openstack flavor set --project admin 2xRTX-ADA-24Q

  openstack flavor create 6xRTX-ADA-8Q --private \
    --ram 4096 --vcpu 4 --disk 0 \
    --property "resources:CUSTOM_NVIDIA_RTX6000_ADA_8Q"=1 \
    --property "trait:CUSTOM_NVIDIA_RTX6000_ADA_8Q"="required" \
    --property "pci_passthrough:alias"="rtx6000-ada-8q:6"
  openstack flavor set --project admin 6xRTX-ADA-8Q

  Each PF has VFs enabled, and the current_vgpu_type of every VF is set
  at each boot to the profile listed for it above.

  Steps to reproduce

      Create one instance per flavor, starting with the 8x48Q flavor
  (8xRTX-ADA-48Q).

      Verify host inventory shows 2 free 24Q VFs and 6 free 8Q VFs.

      Boot an instance with 2 24Q VFs

      It seems that instances come up with 8 GB VRAM instead of 24 GB.

      This is not unique to these specific vGPU profiles. In general,
  Nova mismatches resources, typically substituting a lower Q profile
  for a higher one. In some cases it provides the right resource. For
  example, if I first create an instance that uses all the 8Q VFs
  (which it seems to do correctly and consistently), OpenStack then
  also correctly assigns the 2x 24Q VFs, seemingly because they are
  the last resource left to assign.

       In my case, when I repeatedly recreate the instance, it
  repeatedly spawns with the incorrect VFs attached. I remove the
  instance that uses the 2xRTX-ADA-24Q flavor (and the mismatched 2x8Q
  resources) and perform a clean reboot (the nvidia-vgpu-vfio driver
  complains that it fails to post the VM shutdown event on all mounted
  VFs). After boot, I confirm that my 8x48Q instance is correct, and
  then my 2xRTX-ADA-24Q instance once again incorrectly spawns with
  2 8Q VFs.

  Expected result

  The scheduler should allocate only from resource providers offering
  CUSTOM_NVIDIA_RTX6000_ADA_24Q and the matching trait; the guest should
  always see a 24 GB framebuffer. If not enough VFs are available, the
  request should fail with an error and never substitute another VF.

  Actual result

  Placement allocation occasionally contains a provider with
  CUSTOM_NVIDIA_RTX6000_ADA_8Q. Instance builds successfully; inside
  the guest nvidia-smi reports 2x 0 MiB / 8 192 MiB VRAM.

  Impact

  Workloads requiring >8 GB fail or OOM. Operators must manually rebuild
  affected VMs, defeating automated scheduling.

  Evidence (example of a failed VM)

  Conductor log:

  2025-06-18 18:20:06.162 1102 INFO nova.compute.rpcapi [None
  req-4554da9d-0c42-4dee-b35e-660b2a4ebd64 - - - - - -] Automatically
  selected compute RPC version 6.4 from minimum service version 68

  Compute log:

  2025-06-18 18:20:07.365 7 INFO nova.compute.claims [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Claim successful on node LBRN-HV
  2025-06-18 18:20:10.983 7 INFO nova.compute.pci_placement_translator [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] Placement PCI resource view: Placement PCI view on LBRN-HV: RP(LBRN-HV_0000:4F:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:52:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:53:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:56:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:57:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:CE:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D1:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D2:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D5:00.0, CUSTOM_NVIDIA_RTX6000_ADA_8Q=6, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q), RP(LBRN-HV_0000:D6:00.0, CUSTOM_NVIDIA_RTX6000_ADA_24Q=2, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_24Q)
  2025-06-18 18:20:10.985 7 INFO nova.scheduler.client.report [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] Performing resource provider inventory and allocation data migration.
  2025-06-18 18:20:14.084 7 INFO nova.virt.libvirt.driver [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
  2025-06-18 18:20:15.109 7 INFO nova.virt.block_device [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Booting with volume snapshot 43de2624-e662-4c04-9d34-b783c28765a9 at /dev/vda
  2025-06-18 18:20:19.056 7 INFO os_brick.initiator.connectors.lightos [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] Current host hostNQN nqn.2014-08.org.nvmexpress:uuid:0229145d-80ab-5a47-9954-89ba44d6e654 and IP(s) are ['172.31.21.2', '172.31.21.250', 'fe80::826a:e924:f880:a9b6', '172.31.1.11', '172.31.1.250', 'fe80::9597:33d0:bb62:d15', '192.168.122.1', 'fe80::8ba:dc94:6214:cfc9', 'fe80::b532:519:7aca:62b1', 'fe80::b451:bdff:feee:f5e1', 'fe80::1015:f3ff:fe84:45ea', 'fe80::64b5:c8ff:fe76:798b', 'fe80::d051:ceff:fe48:fc12', 'fe80::fc16:3eff:fe0d:2e39']
  2025-06-18 18:20:21.749 7 INFO nova.virt.libvirt.driver [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Creating image(s)
  2025-06-18 18:20:21.813 7 INFO os_brick.initiator.connectors.iscsi [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] Trying to connect to iSCSI portal 172.31.21.2:3260
  2025-06-18 18:20:23.643 7 INFO os_vif [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:a6:79:2c,bridge_name='qbr920efb74-78',has_traffic_filtering=True,id=920efb74-783a-4484-924e-2e6d7c560781,network=Network(7a3552f8-c90c-4fcd-a32f-9e3bee272b89),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tap920efb74-78')
  2025-06-18 18:20:28.858 7 INFO nova.compute.pci_placement_translator [None req-85589a64-2bfe-44b3-b93a-fd595e6b7e7d - - - - - -] Placement PCI resource view: Placement PCI view on LBRN-HV: RP(LBRN-HV_0000:4F:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:52:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:53:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:56:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:57:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:CE:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D1:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D2:00.0, CUSTOM_NVIDIA_RTX6000_ADA_48Q=1, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_48Q), RP(LBRN-HV_0000:D5:00.0, CUSTOM_NVIDIA_RTX6000_ADA_8Q=6, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q), RP(LBRN-HV_0000:D6:00.0, CUSTOM_NVIDIA_RTX6000_ADA_24Q=2, traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_24Q)
  2025-06-18 18:20:29.443 7 INFO nova.compute.manager [None req-776dc67f-a3f0-473a-a8b3-3bf4b0f859aa - - - - - -] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] VM Started (Lifecycle Event)
  2025-06-18 18:20:29.457 7 INFO nova.virt.libvirt.driver [-] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Instance spawned successfully.
  2025-06-18 18:20:29.458 7 INFO nova.compute.manager [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Took 7.71 seconds to spawn the instance on the hypervisor.
  2025-06-18 18:20:29.995 7 INFO nova.compute.manager [None req-8d7975d5-780b-459b-ac4e-8ef5810d8bbe 25c2808d552741ce849e4fd9b320065b 073808578a7e4e8aa0ceebd1a69b34a6 - - default default] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] Took 22.78 seconds to build instance.
  2025-06-18 18:20:30.466 7 INFO nova.compute.manager [None req-776dc67f-a3f0-473a-a8b3-3bf4b0f859aa - - - - - -] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] VM Paused (Lifecycle Event)
  2025-06-18 18:20:30.980 7 INFO nova.compute.manager [None req-776dc67f-a3f0-473a-a8b3-3bf4b0f859aa - - - - - -] [instance: 67f35cb5-44b5-409c-b3df-8d30397ff232] VM Resumed (Lifecycle Event)
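The "Placement PCI resource view" lines in the compute log above are long and hard to read by eye. A minimal sketch for splitting one into per-provider records follows; the name parse_pci_view and the regular expression are illustrative assumptions based only on the line format visible in this report, not on any Nova API. The sample string is an excerpt holding the last two providers from the log line.

```python
import re

# Matches entries such as:
#   RP(NAME, RESOURCE_CLASS=N, traits=TRAIT_A,TRAIT_B)
RP_PATTERN = re.compile(r"RP\(([^,]+), (\w+)=(\d+), traits=([^)]*)\)")


def parse_pci_view(log_line):
    """Return one dict per RP(...) entry found in the log line."""
    providers = []
    for name, resource_class, total, traits in RP_PATTERN.findall(log_line):
        providers.append({
            "name": name,
            "resource_class": resource_class,
            "total": int(total),
            "traits": traits.split(","),
        })
    return providers


# Excerpt of the log line above (last two providers only).
SAMPLE_LINE = (
    "Placement PCI view on LBRN-HV: "
    "RP(LBRN-HV_0000:D5:00.0, CUSTOM_NVIDIA_RTX6000_ADA_8Q=6, "
    "traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q), "
    "RP(LBRN-HV_0000:D6:00.0, CUSTOM_NVIDIA_RTX6000_ADA_24Q=2, "
    "traits=COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_24Q)"
)

for rp in parse_pci_view(SAMPLE_LINE):
    print(rp["name"], rp["resource_class"], rp["total"])
```

The figure after each resource class is whatever the log line reports for that provider; the sketch does not tell how many of those units are already allocated.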

  Resource provider troubleshooting:

  openstack server list
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+
  | ID                                   | Name             | Status            | Networks            | Image                    | Flavor        |
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+
  | 1ea4855c-06c2-44d8-99e2-a2b4bc463e1b | 2xRTX-ADA-24Q    | ACTIVE            | inter=192.168.0.117 | N/A (booted from volume) | 2xRTX-ADA-24Q |
  | fe09bf4a-cd87-4916-9b8d-18ceaed50c92 | 8xRTX-ADA-48Q    | ACTIVE            | inter=192.168.0.159 | N/A (booted from volume) | 8xRTX-ADA-48Q |
  | c96e76a9-b122-45e3-b7fa-111be3d90922 | Win11-vm         | SHUTOFF           | inter=192.168.0.142 | N/A (booted from volume) | m1.medium     |
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+

  openstack resource provider allocation show 1ea4855c-06c2-44d8-99e2-a2b4bc463e1b
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+
  | resource_provider                    | generation | resources                            | project_id                       | user_id                          |
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+
  | 8b21748f-d43e-48b7-b3ca-46d565c819ce |        213 | {'VCPU': 4, 'MEMORY_MB': 4096}       | 073808578a7e4e8aa0ceebd1a69b34a6 | 25c2808d552741ce849e4fd9b320065b |
  | 19a368c6-52fe-4bf6-a6e6-19bd45f1538c |         35 | {'CUSTOM_NVIDIA_RTX6000_ADA_24Q': 1} | 073808578a7e4e8aa0ceebd1a69b34a6 | 25c2808d552741ce849e4fd9b320065b |
  | b73d2e49-bf73-40ab-b022-5e3994993090 |        114 | {'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 2}  | 073808578a7e4e8aa0ceebd1a69b34a6 | 25c2808d552741ce849e4fd9b320065b |
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+
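The mismatch in the allocation above can be checked mechanically. The sketch below compares the custom resource classes in an allocation with what the reporter expects for the 2xRTX-ADA-24Q flavor. The function name unexpected_allocations is hypothetical, the allocation dict is transcribed by hand from the table above, and the expected value of two 24Q units is taken from the successful allocation shown later in this report; it is not derived from how Nova translates the flavor.

```python
def unexpected_allocations(allocations, expected):
    """Return {resource_class: (allocated, expected)} for every mismatch.

    Only CUSTOM_* resource classes are compared; VCPU, MEMORY_MB and
    other standard classes are ignored.
    """
    allocated = {}
    for resources in allocations.values():
        for resource_class, amount in resources.items():
            if resource_class.startswith("CUSTOM_"):
                allocated[resource_class] = allocated.get(resource_class, 0) + amount

    mismatches = {}
    for resource_class in sorted(set(allocated) | set(expected)):
        got = allocated.get(resource_class, 0)
        want = expected.get(resource_class, 0)
        if got != want:
            mismatches[resource_class] = (got, want)
    return mismatches


# Transcribed from the allocation table above (provider UUID -> resources).
ALLOCATION = {
    "8b21748f-d43e-48b7-b3ca-46d565c819ce": {"VCPU": 4, "MEMORY_MB": 4096},
    "19a368c6-52fe-4bf6-a6e6-19bd45f1538c": {"CUSTOM_NVIDIA_RTX6000_ADA_24Q": 1},
    "b73d2e49-bf73-40ab-b022-5e3994993090": {"CUSTOM_NVIDIA_RTX6000_ADA_8Q": 2},
}
EXPECTED = {"CUSTOM_NVIDIA_RTX6000_ADA_24Q": 2}

for resource_class, (got, want) in unexpected_allocations(ALLOCATION, EXPECTED).items():
    print(f"{resource_class}: allocated {got}, expected {want}")
```

For the allocation above this flags both classes: one 24Q unit where two were expected, and two 8Q units where none were expected.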

  cat /sys/bus/pci/devices/0000\:d6\:00.4/nvidia/current_vgpu_type
  949
  cat /sys/bus/pci/devices/0000\:d6\:00.5/nvidia/current_vgpu_type
  949
  nvidia-smi vgpu
  Wed Jun 18 18:14:55 2025
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 570.148.06             Driver Version: 570.148.06                |
  |---------------------------------+------------------------------+------------+
  | GPU  Name                       | Bus-Id                       | GPU-Util   |
  |      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
  |=================================+==============================+============|
  |   0  NVIDIA RTX 6000 Ada Ge...  | 00000000:4F:00.0             |   0%       |
  |      3251634352  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   1  NVIDIA RTX 6000 Ada Ge...  | 00000000:52:00.0             |   0%       |
  |      3251634357  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   2  NVIDIA RTX 6000 Ada Ge...  | 00000000:53:00.0             |   0%       |
  |      3251634387  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   3  NVIDIA RTX 6000 Ada Ge...  | 00000000:56:00.0             |   0%       |
  |      3251634362  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   4  NVIDIA RTX 6000 Ada Ge...  | 00000000:57:00.0             |   0%       |
  |      3251634367  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   5  NVIDIA RTX 6000 Ada Ge...  | 00000000:CE:00.0             |   0%       |
  |      3251634372  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   6  NVIDIA RTX 6000 Ada Ge...  | 00000000:D1:00.0             |   0%       |
  |      3251634377  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   7  NVIDIA RTX 6000 Ada Ge...  | 00000000:D2:00.0             |   0%       |
  |      3251634382  NVIDIA RTX6... | fe09...  instance-0000003c   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   8  NVIDIA RTX 6000 Ada Ge...  | 00000000:D5:00.0             |   0%       |
  |      3251634392  NVIDIA RTX6... | 1ea4...  instance-00000043   |      0%    |
  |      3251634398  NVIDIA RTX6... | 1ea4...  instance-00000043   |      0%    |
  +---------------------------------+------------------------------+------------+
  |   9  NVIDIA RTX 6000 Ada Ge...  | 00000000:D6:00.0             |   0%       |
  +---------------------------------+------------------------------+------------+

  Notes:
  - The biggest red flag here is that OpenStack seems to believe the instance has 1x24Q and 2x8Q. This is false: the instance only sees 2x8Q.
  - The nvidia-smi vgpu command reports the current state correctly: only 2x8Q VFs are in use.
  - mdev, which is the traditional, supported route, is not enabled here.

  I want to emphasize that if I change the order of my instance
  spawning, I can get the correct configuration. If I start with 8x48Q
  profiles, then 6x8Q profiles, followed by 2x24Q profiles, it spawns
  perfectly:

  openstack server list
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+
  | ID                                   | Name             | Status            | Networks            | Image                    | Flavor        |
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+
  | c439a08f-6b5e-42f7-b6ff-dfcb03a64176 | 2xRTX-ADA-24Q    | ACTIVE            | inter=192.168.0.82  | N/A (booted from volume) | 2xRTX-ADA-24Q |
  | ae11ce9e-fa05-417b-be3a-15404b8da9e3 | 6xRTX-ADA-8Q     | ACTIVE            | inter=192.168.0.107 | N/A (booted from volume) | 6xRTX-ADA-8Q  |
  | fe09bf4a-cd87-4916-9b8d-18ceaed50c92 | 8xRTX-ADA-48Q    | ACTIVE            | inter=192.168.0.159 | N/A (booted from volume) | 8xRTX-ADA-48Q |
  | c96e76a9-b122-45e3-b7fa-111be3d90922 | Win11-vm         | SHUTOFF           | inter=192.168.0.142 | N/A (booted from volume) | m1.medium     |
  | b7417e4f-a647-493b-9cb6-6f76a73e7a9a | Ubuntu 24.04 LTS | SHELVED_OFFLOADED | dmz=     | N/A (booted from volume) | m1.tiny       |
  +--------------------------------------+------------------+-------------------+---------------------+--------------------------+---------------+

  openstack resource provider allocation show c439a08f-6b5e-42f7-b6ff-dfcb03a64176
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+
  | resource_provider                    | generation | resources                            | project_id                       | user_id                          |
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+
  | 8b21748f-d43e-48b7-b3ca-46d565c819ce |        225 | {'VCPU': 4, 'MEMORY_MB': 4096}       | 073808578a7e4e8aa0ceebd1a69b34a6 | 25c2808d552741ce849e4fd9b320065b |
  | 19a368c6-52fe-4bf6-a6e6-19bd45f1538c |         45 | {'CUSTOM_NVIDIA_RTX6000_ADA_24Q': 2} | 073808578a7e4e8aa0ceebd1a69b34a6 | 25c2808d552741ce849e4fd9b320065b |
  +--------------------------------------+------------+--------------------------------------+----------------------------------+----------------------------------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2114947/+subscriptions


