[Bug 2125445] Re: Nova/Placement ignores the flavor’s trait constraints when scheduling SR-IOV vGPU devices.
Updated using pull & deploy; it seems to work after removing and re-adding a VM.
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2125445
Title:
Nova/Placement ignores the flavor’s trait constraints when scheduling
SR-IOV vGPU devices.
Status in OpenStack Compute (nova):
Invalid
Bug description:
# Nova/Placement allocates **more** SR-IOV vGPU units than requested
(PCI-in-Placement)
## Summary
When requesting **one** vGPU VF via a flavor extra spec
(`pci_passthrough:alias=rtx6000-ada-8q:1`), Nova’s scheduler selects a
valid single-device candidate, but the final Placement allocations for
the server end up with **two** `CUSTOM_NVIDIA_RTX6000_ADA_8Q` units on
the same resource provider (RP). In-guest `nvidia-smi` also shows >1
VF attached.
This looks like the “extra PCI hostdevs assigned” behavior recently
fixed upstream (LP #2098496) when **both** `[filter_scheduler]
pci_in_placement = true` and `[pci] report_in_placement = true` are
enabled. ([OpenStack Docs][1])
---
## Environment
* **OpenStack**: Epoxy **2025.1** (Kolla-Ansible, containers show 2025.1 tags)
* **Deployment**: All-in-one (controller + compute on same host)
* **Hypervisor node**: Ubuntu 24.04
* **GPU stack**: NVIDIA vGPU **570.148**; 10× RTX 6000 Ada; SR-IOV mode; each PF exposes VFs whose `current_vgpu_type` is set to the profiles below prior to boot
* **Nova PCI-in-Placement**: enabled end-to-end (API / Scheduler / Conductor / Compute)
Upstream context for PCI-in-Placement: spec and admin docs.
([OpenStack Specifications][2])
---
## Configuration (effective inside running containers)
**`nova-compute` (`/etc/nova/nova.conf`)**
```ini
[pci]
report_in_placement = true
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.6", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.7", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:01.0", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:01.1", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:52:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:53:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:56:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:57:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:ce:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d1:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d2:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.6", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.7", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.0", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.1", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "device_type":"type-VF", "name":"rtx6000-ada-48q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "device_type":"type-VF", "name":"rtx6000-ada-8q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "device_type":"type-VF", "name":"rtx6000-ada-24q" }
[filter_scheduler]
pci_in_placement = true
enabled_filters = ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
[scheduler]
allocation_candidate_request_method = post
max_placement_results = 128
[DEFAULT]
allow_resize_to_same_host = true
block_device_allocate_retries = 700
[libvirt]
volume_use_multipath = True
[compute]
volume_attach_retry_count = 70
volume_attach_retry_interval = 7
```
**`nova-api.conf`** (aliases)
```ini
[pci]
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "device_type":"type-VF", "name":"rtx6000-ada-48q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "device_type":"type-VF", "name":"rtx6000-ada-8q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "device_type":"type-VF", "name":"rtx6000-ada-24q" }
```
**`placement.conf`**
```ini
[api]
placement_log_debug = true
[placement]
max_allocation_candidates = 1024
allocation_candidates_generation_strategy = breadth-first
```
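For reviewers' convenience, the child RPs, inventories, and traits produced by the `device_spec` entries above can be spot-checked with osc-placement. The RP name below is the 0000:4F:00.0 PF provider from the provider list further down; the UUID argument is a placeholder to fill in from the first command:
```bash
# Spot-check one PF provider: its VF inventory and trait should match the device_spec above.
openstack resource provider list --name LBRN-HV_0000:4F:00.0 -f value -c uuid
openstack resource provider inventory list <rp-uuid>   # expect CUSTOM_NVIDIA_RTX6000_ADA_8Q, total = number of 8Q VFs on that PF
openstack resource provider trait list <rp-uuid>       # expect CUSTOM_NVIDIA_RTX6000_ADA_8Q (plus COMPUTE_MANAGED_PCI_DEVICE)
```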
---
## Flavor used (requests exactly **1** VF via alias)
```bash
openstack flavor create g1.8q --private \
--ram 4096 --vcpus 4 --disk 0 \
--property "pci_passthrough:alias"="rtx6000-ada-8q:1"
openstack flavor set --project admin g1.8q
```
Per docs, requesting PCI devices via **alias** in the flavor extra
spec is the supported method. ([OpenStack Docs][3])
---
## Provider tree snapshot (abridged)
```text
openstack resource provider list
+--------------------------------------+----------------------+-----------+
| uuid | name | generation|
+--------------------------------------+----------------------+-----------+
| 8b21748f-d43e-... | LBRN-HV | 499 |
| 739909f6-99c7-... | LBRN-HV_0000:4F:00.0 | 281 |
| 6a9bab9a-c78a-... | LBRN-HV_0000:D5:00.0 | 109 |
... (one RP per physical device, as expected) ...
+--------------------------------------+----------------------+-----------+
```
---
## Reproducer
1. Boot a volume-backed Ubuntu image (QEMU) with NVIDIA guest drivers preinstalled.
2. Create server with `g1.8q` flavor (`pci_passthrough:alias=rtx6000-ada-8q:1`).
3. Observe **in-guest** that `nvidia-smi` shows **>1** `rtx6000-ada-8q` VF (a quick in-guest check is sketched after this list).
4. Delete and recreate a few times; occasionally the allocations jump from 1 → 2 (and I have also seen 3 earlier).
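A rough in-guest check for step 3 (device IDs seen by the guest can differ from the host-side ones in `device_spec`, so `nvidia-smi -L` is the more reliable count):
```bash
# Inside the guest: both commands should report exactly one device for the g1.8q flavor,
# but after a failing build they show more than one.
lspci -nn | grep -ci nvidia   # NVIDIA PCI functions visible to the guest
nvidia-smi -L                 # vGPU instances seen by the guest driver
```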
---
## What the **scheduler** is doing (debug)
From `nova-scheduler` at the time of a failing build (UUID:
`12aee082-eb91-4dde-addb-845f00df88a4`):
```
... PciPassthroughFilter tries allocation candidate:
{'allocations': {'6a9bab9a-...': {'resources': {'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 1}},
'8b21748f-...': {'resources': {'VCPU': 8, 'MEMORY_MB': 32768}}}, ...}
... PciPassthroughFilter accepted allocation candidate: ... 'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 1 ...
... Attempting to claim resources in the placement API for instance 12aee082-...
... Selected host: (LBRN-HV, LBRN-HV) ... allocation_candidates: 2
```
So the filter path is correct and the scheduler claims **one** `8Q`.
(This matches the intended PCI-in-Placement request flow.
([OpenDev][4]))
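(For anyone trying to reproduce this: the lines above came from the scheduler debug log. With a default Kolla-Ansible layout something like the following finds them on the host; adjust the path if your logs are shipped elsewhere.)
```bash
# Host-side log path assumes Kolla's default /var/log/kolla layout.
grep -i 'PciPassthroughFilter' /var/log/kolla/nova/nova-scheduler.log \
  | grep 12aee082-eb91-4dde-addb-845f00df88a4
```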
---
## What **Placement** shows afterwards (over-allocation)
Immediately after build:
```bash
# Candidates for 1 unit:
openstack --os-placement-api-version 1.39 \
allocation candidate list \
--resource CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 \
--required CUSTOM_NVIDIA_RTX6000_ADA_8Q
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
| # | allocation | resource provider | inventory used/capacity | traits |
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
| 1 | CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 | 739909f6-... | CUSTOM_NVIDIA_RTX6000_ADA_8Q=0/6 | COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q |
| 2 | CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 | 6a9bab9a-... | CUSTOM_NVIDIA_RTX6000_ADA_8Q=2/6 | COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q |
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
```
([OpenStack Docs][5])
And the **server’s allocations**:
```bash
openstack --os-placement-api-version 1.12 \
resource provider allocation show 12aee082-eb91-4dde-addb-845f00df88a4
+--------------------------------------+------------+---------------------------------------------+
| resource_provider | generation | resources |
+--------------------------------------+------------+---------------------------------------------+
| 6a9bab9a-c78a-4b6c-9d3a-ddc3aec6d9b0 | 113 | {'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 2} |
| 8b21748f-d43e-48b7-b3ca-46d565c819ce | 503 | {'VCPU': 8, 'MEMORY_MB': 32768} |
+--------------------------------------+------------+---------------------------------------------+
```
Note the **2** units assigned on RP `6a9bab9a-...` despite the flavor
asking for **1**.
---
## Expected vs Observed
* **Expected**: One `CUSTOM_NVIDIA_RTX6000_ADA_8Q` allocation & one VF attached, per the alias request. ([OpenStack Docs][3])
* **Observed**: Placement allocations show **two** units; guest sees >1 VF.
---
## Analysis / relation to upstream fix
This is indistinguishable from **LP #2098496** (“VM gets more PCI
hostdevs than requested when PCI in Placement is enabled with VFs”),
which the Nova team fixed in the **2024.1/2024.2/2025.1** series. The
2025.1 release notes explicitly call out the bug and its scope: it only
affects systems where **both** `[filter_scheduler] pci_in_placement`
and `[pci] report_in_placement` are set to `true`, which is exactly
this deployment. ([OpenStack Docs][6])
The root cause described in that report was a mismatch in how
pools/devices are correlated when PCI-in-Placement is enabled, leading
to an extra hostdev being consumed/allocated. ([Launchpad][7])
---
## Extra validation that traits/aliases are used correctly
* I am **not** requesting devices directly via `resources:*` flavor
extra specs. Device requests are **only** via `pci_passthrough:alias`.
This is the recommended approach for PCI passthrough requests; Nova
translates the alias to the appropriate resource class and traits.
([OpenStack Docs][3])
* One RP per device is reported with correct inventory (see provider
list above). Scheduler debug clearly shows `PciPassthroughFilter`
testing/accepting candidates with `CUSTOM_NVIDIA_RTX6000_ADA_8Q: 1`,
and only **after** the build do we see `2` in Placement for the consumer (a libvirt-side cross-check is sketched below).
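To confirm that the second device is really attached (and not just an allocation accounting artifact), the domain XML can be inspected from the libvirt container. The container and domain names below are illustrative for a Kolla deployment:
```bash
# Count <hostdev> entries in the instance's domain XML; for this flavor there should be exactly one.
docker exec nova_libvirt virsh list --all
docker exec nova_libvirt virsh dumpxml instance-0000abcd | grep -c '<hostdev '
```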
---
## Ask
* Please confirm whether this is a recurrence of **LP #2098496** in
the Epoxy/Kolla 2025.1 images I’m running (pre-fix build?), or a new
edge case. I can provide the Nova package version strings from each
container on request.
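For reference, this is roughly how I would collect those version strings (container names are the Kolla defaults, and `pip` being on the PATH assumes source-based images; binary images would need the distro package manager instead):
```bash
# Print the installed Nova version from each Nova container (names/commands are assumptions).
for c in nova_api nova_scheduler nova_conductor nova_compute; do
  echo "== $c"
  docker exec "$c" pip show nova | grep -E '^(Name|Version)'
done
```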
---
### References
* PCI device tracking in Placement (Nova spec). ([OpenStack Specifications][2])
* Nova admin: PCI passthrough & alias-based requests. ([OpenStack Docs][3])
* Nova release notes 2024.1 / 2024.2 / **2025.1**: fix for “more PCI hostdevs than requested” (LP #2098496). ([OpenStack Docs][6])
* Original Launchpad bug (root cause discussion). ([Launchpad][7])
---
If you want, I can also attach a tiny script that diffs RP **usage**
before/after a single boot (via `openstack resource provider usage
show <rp>`), so reviewers can see the +2 jump numerically.
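A minimal sketch of that script, using the 0000:D5:00.0 provider UUID from the allocation output above (the server-create step is left as a comment):
```bash
#!/usr/bin/env bash
# Diff an RP's usage before and after a single boot so the +2 jump is visible numerically.
RP=6a9bab9a-c78a-4b6c-9d3a-ddc3aec6d9b0   # LBRN-HV_0000:D5:00.0 from the provider list above
openstack resource provider usage show "$RP" -f value > /tmp/usage.before
# ... create the server with the g1.8q flavor here ...
openstack resource provider usage show "$RP" -f value > /tmp/usage.after
diff /tmp/usage.before /tmp/usage.after
```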
[1]: https://docs.openstack.org/releasenotes/nova/2025.1.html "2025.1 Series Release Notes - Nova"
[2]: https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html "PCI Device Tracking In Placement — Nova Specs ..."
[3]: https://docs.openstack.org/nova/2025.1/admin/pci-passthrough.html "Attaching physical PCI devices to guests"
[4]: https://static.opendev.org/docs/placement/unmaintained/2023.1/user/index.html?utm_source=chatgpt.com "Placement Usage - nova-scheduler"
[5]: https://docs.openstack.org/osc-placement/latest/cli/index.html "Command Line Reference — osc-placement 4.7.1.dev1 ..."
[6]: https://docs.openstack.org/releasenotes/nova/2024.1.html "2024.1 Series Release Notes - nova"
[7]: https://bugs.launchpad.net/bugs/2098496 "Bug #2098496 “VM gets more PCI hostdevs than requested ..."
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2125445/+subscriptions