[Bug 2125445] Re: Nova/Placement ignores the flavor’s trait constraints when scheduling SR-IOV vGPU devices.
Updated using pull & deploy; it seems to work after removing and re-adding a VM.
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2125445
Title:
Nova/Placement ignores the flavor’s trait constraints when scheduling
SR-IOV vGPU devices.
Status in OpenStack Compute (nova):
Invalid
Bug description:
# Nova/Placement allocates **more** SR-IOV vGPU units than requested
(PCI-in-Placement)
## Summary
When requesting **one** vGPU VF via a flavor extra spec
(`pci_passthrough:alias=rtx6000-ada-8q:1`), Nova’s scheduler selects a
valid single-device candidate, but the final Placement allocations for
the server end up with **two** `CUSTOM_NVIDIA_RTX6000_ADA_8Q` units on
the same resource provider (RP). In-guest `nvidia-smi` also shows >1
VF attached.
This looks like the “extra PCI hostdevs assigned” behavior recently
fixed upstream (LP #2098496) when **both** `[filter_scheduler]
pci_in_placement = true` and `[pci] report_in_placement = true` are
enabled. ([OpenStack Docs][1])
---
## Environment
* **OpenStack**: Epoxy **2025.1** (Kolla-Ansible, containers show 2025.1 tags)
* **Deployment**: All-in-one (controller + compute on same host)
* **Hypervisor node**: Ubuntu 24.04
* **GPU stack**: NVIDIA vGPU **570.148**; 10× RTX 6000 Ada; SR-IOV mode; each PF exposes VFs whose `current_vgpu_type` is set to the profiles below prior to boot
* **Nova PCI-in-Placement**: enabled end-to-end (API / Scheduler / Conductor / Compute)
Upstream context for PCI-in-Placement: spec and admin docs.
([OpenStack Specifications][2])
---
## Configuration (effective inside running containers)
**`nova-compute` (`/etc/nova/nova.conf`)**
```ini
[pci]
report_in_placement = true
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.6", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:00.7", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:01.0", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:4f:01.1", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:52:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:53:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:56:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:57:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:ce:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d1:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d2:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.6", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:00.7", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.0", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d5:01.1", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.4", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
device_spec = { "vendor_id":"10de", "product_id":"26b1", "address":"0000:d6:00.5", "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "managed":"no" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "device_type":"type-VF", "name":"rtx6000-ada-48q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "device_type":"type-VF", "name":"rtx6000-ada-8q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "device_type":"type-VF", "name":"rtx6000-ada-24q" }
[filter_scheduler]
pci_in_placement = true
enabled_filters = ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
[scheduler]
allocation_candidate_request_method = post
max_placement_results = 128
[DEFAULT]
allow_resize_to_same_host = true
block_device_allocate_retries = 700
[libvirt]
volume_use_multipath = True
[compute]
volume_attach_retry_count = 70
volume_attach_retry_interval = 7
```
**`nova-api.conf`** (aliases)
```ini
[pci]
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_48Q", "device_type":"type-VF", "name":"rtx6000-ada-48q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_8Q", "device_type":"type-VF", "name":"rtx6000-ada-8q" }
alias = { "resource_class":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "traits":"CUSTOM_NVIDIA_RTX6000_ADA_24Q", "device_type":"type-VF", "name":"rtx6000-ada-24q" }
```
**`placement.conf`**
```ini
[api]
placement_log_debug = true
[placement]
max_allocation_candidates = 1024
allocation_candidates_generation_strategy = breadth-first
```
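For reviewers' convenience, the child RPs, inventories, and traits produced by the `device_spec` entries above can be spot-checked with osc-placement. The RP name below is the 0000:4F:00.0 PF provider from the provider list further down; the UUID argument is a placeholder to fill in from the first command:
```bash
# Spot-check one PF provider: its VF inventory and trait should match the device_spec above.
openstack resource provider list --name LBRN-HV_0000:4F:00.0 -f value -c uuid
openstack resource provider inventory list <rp-uuid>   # expect CUSTOM_NVIDIA_RTX6000_ADA_8Q, total = number of 8Q VFs on that PF
openstack resource provider trait list <rp-uuid>       # expect CUSTOM_NVIDIA_RTX6000_ADA_8Q (plus COMPUTE_MANAGED_PCI_DEVICE)
```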
---
## Flavor used (requests exactly **1** VF via alias)
```bash
openstack flavor create g1.8q --private \
--ram 4096 --vcpus 4 --disk 0 \
--property "pci_passthrough:alias"="rtx6000-ada-8q:1"
openstack flavor set --project admin g1.8q
```
Per docs, requesting PCI devices via **alias** in the flavor extra
spec is the supported method. ([OpenStack Docs][3])
---
## Provider tree snapshot (abridged)
```text
openstack resource provider list
+--------------------------------------+----------------------+-----------+
| uuid | name | generation|
+--------------------------------------+----------------------+-----------+
| 8b21748f-d43e-... | LBRN-HV | 499 |
| 739909f6-99c7-... | LBRN-HV_0000:4F:00.0 | 281 |
| 6a9bab9a-c78a-... | LBRN-HV_0000:D5:00.0 | 109 |
... (one RP per physical device, as expected) ...
+--------------------------------------+----------------------+-----------+
```
---
## Reproducer
1. Boot a volume-backed Ubuntu image (QEMU) with NVIDIA guest drivers preinstalled.
2. Create server with `g1.8q` flavor (`pci_passthrough:alias=rtx6000-ada-8q:1`).
3. Observe **in-guest** that `nvidia-smi` shows **>1** `rtx6000-ada-8q` VF (a quick in-guest check is sketched after this list).
4. Delete and recreate a few times; occasionally the allocations jump from 1 → 2 (and I have also seen 3 earlier).
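A rough in-guest check for step 3 (device IDs seen by the guest can differ from the host-side ones in `device_spec`, so `nvidia-smi -L` is the more reliable count):
```bash
# Inside the guest: both commands should report exactly one device for the g1.8q flavor,
# but after a failing build they show more than one.
lspci -nn | grep -ci nvidia   # NVIDIA PCI functions visible to the guest
nvidia-smi -L                 # vGPU instances seen by the guest driver
```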
---
## What the **scheduler** is doing (debug)
From `nova-scheduler` at the time of a failing build (UUID:
`12aee082-eb91-4dde-addb-845f00df88a4`):
```
... PciPassthroughFilter tries allocation candidate:
{'allocations': {'6a9bab9a-...': {'resources': {'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 1}},
'8b21748f-...': {'resources': {'VCPU': 8, 'MEMORY_MB': 32768}}}, ...}
... PciPassthroughFilter accepted allocation candidate: ... 'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 1 ...
... Attempting to claim resources in the placement API for instance 12aee082-...
... Selected host: (LBRN-HV, LBRN-HV) ... allocation_candidates: 2
```
So the filter path is correct and the scheduler claims **one** `8Q`.
(This matches the intended PCI-in-Placement request flow.
([OpenDev][4]))
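(For anyone trying to reproduce this: the lines above came from the scheduler debug log. With a default Kolla-Ansible layout something like the following finds them on the host; adjust the path if your logs are shipped elsewhere.)
```bash
# Host-side log path assumes Kolla's default /var/log/kolla layout.
grep -i 'PciPassthroughFilter' /var/log/kolla/nova/nova-scheduler.log \
  | grep 12aee082-eb91-4dde-addb-845f00df88a4
```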
---
## What **Placement** shows afterwards (over-allocation)
Immediately after build:
```bash
# Candidates for 1 unit:
openstack --os-placement-api-version 1.39 \
allocation candidate list \
--resource CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 \
--required CUSTOM_NVIDIA_RTX6000_ADA_8Q
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
| # | allocation | resource provider | inventory used/capacity | traits |
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
| 1 | CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 | 739909f6-... | CUSTOM_NVIDIA_RTX6000_ADA_8Q=0/6 | COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q |
| 2 | CUSTOM_NVIDIA_RTX6000_ADA_8Q=1 | 6a9bab9a-... | CUSTOM_NVIDIA_RTX6000_ADA_8Q=2/6 | COMPUTE_MANAGED_PCI_DEVICE,CUSTOM_NVIDIA_RTX6000_ADA_8Q |
+---+--------------------------------+--------------------------------------+----------------------------------+---------------------------------------------------------+
```
([OpenStack Docs][5])
And the **server’s allocations**:
```bash
openstack --os-placement-api-version 1.12 \
resource provider allocation show 12aee082-eb91-4dde-addb-845f00df88a4
+--------------------------------------+------------+---------------------------------------------+
| resource_provider | generation | resources |
+--------------------------------------+------------+---------------------------------------------+
| 6a9bab9a-c78a-4b6c-9d3a-ddc3aec6d9b0 | 113 | {'CUSTOM_NVIDIA_RTX6000_ADA_8Q': 2} |
| 8b21748f-d43e-48b7-b3ca-46d565c819ce | 503 | {'VCPU': 8, 'MEMORY_MB': 32768} |
+--------------------------------------+------------+---------------------------------------------+
```
Note the **2** units assigned on RP `6a9bab9a-...` despite the flavor
asking for **1**.
---
## Expected vs Observed
* **Expected**: One `CUSTOM_NVIDIA_RTX6000_ADA_8Q` allocation & one VF attached, per the alias request. ([OpenStack Docs][3])
* **Observed**: Placement allocations show **two** units; guest sees >1 VF.
---
## Analysis / relation to upstream fix
This is indistinguishable from **LP #2098496** (“VM gets more PCI
hostdevs than requested when PCI in Placement is enabled with VFs”),
which the Nova team fixed in the **2024.1/2024.2/2025.1** series. The
2025.1 release notes explicitly call out the bug and its scope: it only
affects systems where **both** `[filter_scheduler] pci_in_placement`
and `[pci] report_in_placement` are set to `true`, which is exactly
this deployment. ([OpenStack Docs][6])
The root cause described in that report was a mismatch in how
pools/devices are correlated when PCI-in-Placement is enabled, leading
to an extra hostdev being consumed/allocated. ([Launchpad][7])
---
## Extra validation that traits/aliases are used correctly
* I am **not** requesting devices directly via `resources:*` flavor
extra specs. Device requests are **only** via `pci_passthrough:alias`.
This is the recommended approach for PCI passthrough requests; Nova
translates the alias to the appropriate resource class and traits.
([OpenStack Docs][3])
* One RP per device is reported with correct inventory (see provider
list above). Scheduler debug clearly shows `PciPassthroughFilter`
testing/accepting candidates with `CUSTOM_NVIDIA_RTX6000_ADA_8Q: 1`,
and only **after** the build do we see `2` in Placement for the consumer (a libvirt-side cross-check is sketched below).
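To confirm that the second device is really attached (and not just an allocation accounting artifact), the domain XML can be inspected from the libvirt container. The container and domain names below are illustrative for a Kolla deployment:
```bash
# Count <hostdev> entries in the instance's domain XML; for this flavor there should be exactly one.
docker exec nova_libvirt virsh list --all
docker exec nova_libvirt virsh dumpxml instance-0000abcd | grep -c '<hostdev '
```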
---
## Ask
* Please confirm whether this is a recurrence of **LP #2098496** in
the Epoxy/Kolla 2025.1 images I’m running (pre-fix build?), or a new
edge case. I can provide the Nova package version strings from each
container on request.
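For reference, this is roughly how I would collect those version strings (container names are the Kolla defaults, and `pip` being on the PATH assumes source-based images; binary images would need the distro package manager instead):
```bash
# Print the installed Nova version from each Nova container (names/commands are assumptions).
for c in nova_api nova_scheduler nova_conductor nova_compute; do
  echo "== $c"
  docker exec "$c" pip show nova | grep -E '^(Name|Version)'
done
```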
---
### References
* PCI device tracking in Placement (Nova spec). ([OpenStack Specifications][2])
* Nova admin: PCI passthrough & alias-based requests. ([OpenStack Docs][3])
* Nova release notes 2024.1 / 2024.2 / **2025.1**: fix for “more PCI hostdevs than requested” (LP #2098496). ([OpenStack Docs][6])
* Original Launchpad bug (root cause discussion). ([Launchpad][7])
---
If you want, I can also attach a tiny script that diffs RP **usage**
before/after a single boot (via `openstack resource provider usage
show <rp>`), so reviewers can see the +2 jump numerically.
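A minimal sketch of that script, using the 0000:D5:00.0 provider UUID from the allocation output above (the server-create step is left as a comment):
```bash
#!/usr/bin/env bash
# Diff an RP's usage before and after a single boot so the +2 jump is visible numerically.
RP=6a9bab9a-c78a-4b6c-9d3a-ddc3aec6d9b0   # LBRN-HV_0000:D5:00.0 from the provider list above
openstack resource provider usage show "$RP" -f value > /tmp/usage.before
# ... create the server with the g1.8q flavor here ...
openstack resource provider usage show "$RP" -f value > /tmp/usage.after
diff /tmp/usage.before /tmp/usage.after
```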
[1]: https://docs.openstack.org/releasenotes/nova/2025.1.html "2025.1 Series Release Notes - Nova"
[2]: https://specs.openstack.org/openstack/nova-specs/specs/zed/approved/pci-device-tracking-in-placement.html "PCI Device Tracking In Placement — Nova Specs ..."
[3]: https://docs.openstack.org/nova/2025.1/admin/pci-passthrough.html "Attaching physical PCI devices to guests"
[4]: https://static.opendev.org/docs/placement/unmaintained/2023.1/user/index.html?utm_source=chatgpt.com "Placement Usage - nova-scheduler"
[5]: https://docs.openstack.org/osc-placement/latest/cli/index.html "Command Line Reference — osc-placement 4.7.1.dev1 ..."
[6]: https://docs.openstack.org/releasenotes/nova/2024.1.html "2024.1 Series Release Notes - nova"
[7]: https://bugs.launchpad.net/bugs/2098496 "Bug #2098496 “VM gets more PCI hostdevs than requested ..."
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2125445/+subscriptions