yahoo-eng-team team mailing list archive
Message #88687
[Bug 1969496] [NEW] booting with PCI device fails: Attempt to consume PCI device xxx from empty pool
Public bug reported:
We saw in the field that the pci_devices table can end up in an inconsistent state after a compute node HW failure and re-deployment. There can be dependent devices where the parent PF is in the available state while its child VFs are in the unavailable state. (Before the HW fault the PF was allocated, hence the VFs were marked unavailable.)
In this state the PF is still schedulable, but during the PCI claim the handling of dependent devices in the PCI tracker fails with the error: "Attempt to consume PCI device XXX from empty pool".
The reason for the failure is that when the PF is claimed, all of its child VFs are marked unavailable. If a VF is already unavailable, that step fails.
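To illustrate the mechanism described above, here is a minimal toy model, not nova's actual code. ToyPool, claim_pf, try_claim and the PCI addresses are all hypothetical names invented for this sketch. It assumes that marking a VF unavailable means consuming it from a pool of free devices, so a VF that is already unavailable cannot be consumed again.

```python
from dataclasses import dataclass, field


class EmptyPoolError(Exception):
    """Raised when a device is consumed from a pool that does not hold it."""


@dataclass
class ToyPool:
    # Hypothetical stand-in for the tracker's pools of free devices.
    free: set = field(default_factory=set)

    def consume(self, address):
        if address not in self.free:
            raise EmptyPoolError(
                f"Attempt to consume PCI device {address} from empty pool")
        self.free.remove(address)


def claim_pf(pool, pf, vfs):
    """Claim a PF, then mark every child VF unavailable by consuming it."""
    pool.consume(pf)
    for vf in vfs:
        pool.consume(vf)


def try_claim(free, pf, vfs):
    """Run a claim against a fresh pool and report the outcome as a string."""
    pool = ToyPool(free=set(free))
    try:
        claim_pf(pool, pf, vfs)
    except EmptyPoolError as exc:
        return str(exc)
    return "ok"


# Made-up addresses, for illustration only.
PF = "0000:81:00.0"
VFS = ["0000:81:00.1", "0000:81:00.2"]

# Consistent state: the PF and its VFs are all free, so the claim succeeds.
consistent = try_claim({PF, *VFS}, PF, VFS)

# Inconsistent state: the PF is free but its VFs are already unavailable.
inconsistent = try_claim({PF}, PF, VFS)
```

In this sketch the second claim fails on the first child VF, which matches the shape of the error seen in the field.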
No reproducer has been found so far that generates the inconsistent
state. (We tried whitelist reconfiguration, evacuation, and VM delete
while the compute was down.) But recovering from the inconsistency
should be possible.
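As a starting point for recovery, the inconsistent state could be detected before attempting a claim. The sketch below is only an illustration under assumptions: it works on plain dicts rather than on nova objects, and the keys (address, parent_addr, dev_type, status) and the values "type-PF", "type-VF", "available" and "unavailable" are assumed to mirror the pci_devices table. find_inconsistent_pfs is a hypothetical helper, not an existing nova function.

```python
def find_inconsistent_pfs(rows):
    """Return {pf_address: [vf_address, ...]} for available PFs that
    still have child VFs marked unavailable."""
    # Group VF rows by the address of their parent PF.
    children = {}
    for row in rows:
        if row["dev_type"] == "type-VF" and row.get("parent_addr"):
            children.setdefault(row["parent_addr"], []).append(row)

    inconsistent = {}
    for row in rows:
        if row["dev_type"] != "type-PF" or row["status"] != "available":
            continue
        stale = sorted(
            vf["address"]
            for vf in children.get(row["address"], [])
            if vf["status"] == "unavailable"
        )
        if stale:
            inconsistent[row["address"]] = stale
    return inconsistent


# Made-up rows, for illustration only.
rows = [
    {"address": "0000:81:00.0", "parent_addr": None,
     "dev_type": "type-PF", "status": "available"},
    {"address": "0000:81:00.1", "parent_addr": "0000:81:00.0",
     "dev_type": "type-VF", "status": "unavailable"},
    {"address": "0000:82:00.0", "parent_addr": None,
     "dev_type": "type-PF", "status": "available"},
    {"address": "0000:82:00.1", "parent_addr": "0000:82:00.0",
     "dev_type": "type-VF", "status": "available"},
]

report = find_inconsistent_pfs(rows)
```

What to do with the reported devices (for example, resetting the VF status or skipping already-unavailable VFs during the claim) is a design decision this sketch leaves open.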
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1969496
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1969496/+subscriptions