
yahoo-eng-team team mailing list archive

[Bug 1969496] Re: booting with PCI device fails: Attempt to consume PCI device xxx from empty pool

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/838555
Committed: https://opendev.org/openstack/nova/commit/3af2ecc13fa9334de8418accaed4fffefefb41da
Submitter: "Zuul (22348)"
Branch:    master

commit 3af2ecc13fa9334de8418accaed4fffefefb41da
Author: Balazs Gibizer <gibi@xxxxxxxxxx>
Date:   Tue Apr 19 18:36:50 2022 +0200

    Allow claiming PCI PF if child VF is unavailable
    
    As If9ab424cc7375a1f0d41b03f01c4a823216b3eb8 stated, there is a way for
    the pci_devices table to become inconsistent. A parent PF can be in the
    'available' state while its child VFs are still in the 'unavailable'
    state. In this situation the PF is schedulable, but the PCI claim fails
    when it tries to mark the dependent VFs unavailable.
    
    This patch changes the PCI claim logic to allow claiming the parent PF
    in the inconsistent situation, as we assume that it is safe to do so.
    This claim also fixes the inconsistency, so that when the parent PF is
    freed the child VFs become available again.
    
    Closes-Bug: #1969496
    Change-Id: I575ce06bcc913add7db0849f85728371da2032fc
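
The behaviour the commit message describes can be illustrated with a small toy model. This is only a sketch and not nova's code: the Device class, claim_pf, free_pf, the "claimed" status string and the PCI addresses are all hypothetical names chosen for the example. Refusing the claim when a child VF is in some other state is also an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Device:
    """Hypothetical stand-in for a PCI device row (not nova's class)."""
    address: str
    status: str = "available"
    children: List["Device"] = field(default_factory=list)


def claim_pf(pf: Device) -> None:
    """Claim a PF, tolerating child VFs that are already unavailable."""
    if pf.status != "available":
        raise ValueError(f"PF {pf.address} is not available")
    # Assumption of this sketch: any other VF state means the VF is in use.
    for vf in pf.children:
        if vf.status not in ("available", "unavailable"):
            raise ValueError(f"VF {vf.address} is in state {vf.status}")
    pf.status = "claimed"
    for vf in pf.children:
        vf.status = "unavailable"  # a no-op for VFs that were already stale


def free_pf(pf: Device) -> None:
    """Free a PF and make all of its child VFs available again."""
    pf.status = "available"
    for vf in pf.children:
        vf.status = "available"


# Inconsistent state: the PF is available, one VF is stuck unavailable.
stale_vf = Device("0000:81:00.1", status="unavailable")  # made-up address
good_vf = Device("0000:81:00.2")                         # made-up address
pf = Device("0000:81:00.0", children=[stale_vf, good_vf])

claim_pf(pf)  # succeeds despite the stale VF
after_claim = [pf.status, stale_vf.status, good_vf.status]

free_pf(pf)   # the free path restores a consistent state
after_free = [pf.status, stale_vf.status, good_vf.status]
```

In this model the claim followed by a free is what repairs the stale VF, which matches the recovery path described in the commit message.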


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1969496

Title:
  booting with PCI device fails: Attempt to consume PCI device xxx from
  empty pool

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We saw in the field that the pci_devices table can end up in an inconsistent state after a compute node HW failure and re-deployment. There can be dependent devices where the parent PF is in the available state while the child VFs are in the unavailable state. (Before the HW fault the PF was allocated, hence the VFs were marked unavailable.)
      
  In this state the PF is still schedulable, but during the PCI claim the handling of dependent devices in the PCI tracker fails with the error: "Attempt to consume PCI device XXX from empty pool".
      
  The reason for the failure is that when the PF is claimed, all of its child VFs are marked unavailable. If a VF is already unavailable, that step fails.
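
The failure mode described above can be reproduced in a toy model. This is a minimal sketch under stated assumptions, not nova's implementation: Device, EmptyPoolError, consume, strict_claim_pf and the PCI addresses are hypothetical, and the pool is modelled as a plain set of free device addresses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Device:
    """Hypothetical stand-in for a PCI device row (not nova's class)."""
    address: str
    status: str = "available"
    children: List["Device"] = field(default_factory=list)


class EmptyPoolError(Exception):
    """Hypothetical error raised when a device is not in the free pool."""


def consume(pool: Set[str], dev: Device) -> None:
    # A device can only be taken out of the pool if it is currently in it.
    if dev.address not in pool:
        raise EmptyPoolError(
            f"Attempt to consume PCI device {dev.address} from empty pool")
    pool.remove(dev.address)


def strict_claim_pf(pool: Set[str], pf: Device) -> None:
    # Claim the PF, then unconditionally take every child VF out of the pool.
    consume(pool, pf)
    pf.status = "claimed"
    for vf in pf.children:
        consume(pool, vf)  # fails when the VF was already unavailable
        vf.status = "unavailable"


# Inconsistent state: the PF is free, its VF is still marked unavailable.
vf = Device("0000:81:00.1", status="unavailable")  # made-up address
pf = Device("0000:81:00.0", children=[vf])         # made-up address
pool = {pf.address}

try:
    strict_claim_pf(pool, pf)
    error = None
except EmptyPoolError as exc:
    error = str(exc)

print(error)
```

Because the VF is not in the pool, the second consume call raises, even though the scheduler saw the PF as free.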

  No reproducer has been found so far that generates the inconsistent
  state. (We tried whitelist reconfiguration, evacuation, and VM delete
  while the compute was down.) But recovering from the inconsistency
  should be possible.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1969496/+subscriptions


