yahoo-eng-team team mailing list archive

[Bug 2093879] [NEW] resource tracker raises PlacementPciException for every InventoryInUse error coming from placement via update_from_provider_tree()

 

Public bug reported:

The following code in the resource tracker's _update_to_placement()

        try:
            self.reportclient.update_from_provider_tree(
                context,
                prov_tree,
                allocations=allocs if driver_reshaped or pci_reshaped else None
            )
        except exception.InventoryInUse as e:
            # This means an inventory reconfiguration (e.g.: removing a parent
            # PF and adding a VF under that parent) was not possible due to
            # existing allocations. Translate the exception to prevent the
            # compute service to start
            raise exception.PlacementPciException(error=str(e))

https://github.com/openstack/nova/blob/a459467899d2b406aa8cf530ae481255eaf3c957/nova/compute/resource_tracker.py#L1360-L1370

translates every InventoryInUse exception raised by a placement
inventory update into a PCI-related error and stops the nova-compute
startup process. The bug report
https://bugs.launchpad.net/nova/+bug/2093869/ shows a case where that
assumption does not hold. If the compute is configured with images_type
rbd and ceph reports 0 disk_gb for the configured pools, nova-compute
does not report DISK_GB to placement in the compute inventory, and
placement treats the missing entry as a deleted inventory. If there is
still usage recorded against that inventory, placement rejects the
update with an InventoryInUse error.
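
The rejection described above can be illustrated with a toy model. This
is only a sketch of the semantics as described in this report, not
placement's implementation: replace_inventory(), the local InventoryInUse
class, and the sample numbers are all hypothetical stand-ins.

```python
class InventoryInUse(Exception):
    """Stand-in for the conflict placement reports; not nova's real class."""


def replace_inventory(current, used, new):
    """Toy model of replacing the full inventory of one resource provider.

    current and new map resource class -> total; used maps resource
    class -> allocated amount. A resource class missing from new is
    treated as a deletion, which is refused while it still has usage.
    """
    for resource_class in current:
        if resource_class not in new and used.get(resource_class, 0) > 0:
            raise InventoryInUse(f"Inventory for '{resource_class}' in use.")
    return dict(new)


# Hypothetical sample data: a provider with instances consuming disk.
current = {"VCPU": 8, "MEMORY_MB": 16384, "DISK_GB": 100}
used = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}

# ceph reported 0 disk_gb, so the compute leaves DISK_GB out entirely.
reported = {"VCPU": 8, "MEMORY_MB": 16384}

try:
    replace_inventory(current, used, reported)
    outcome = "accepted"
except InventoryInUse as exc:
    outcome = str(exc)

print(outcome)
```

Nothing in this conflict involves PCI devices; the only trigger is that
DISK_GB disappeared from the reported inventory while it still had usage.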

The resulting error message is highly misleading because it mixes PCI in
Placement with DISK_GB:

Jan 13 07:33:20 devstack-jan nova-compute[65544]: ERROR oslo_service.service [None req-de66825f-768a-4951-a57b-c92f9f035255 None None] Error starting thread.: nova.exception.PlacementPciException: Failed to gather or report PCI resources to Placement: There was a conflict when trying to complete your request.
Jan 13 07:33:20 devstack-jan nova-compute[65544]: update conflict: Inventory for 'DISK_GB' on resource provider '5cdfba85-3122-49ea-b7b1-1fd0af461588' in use.

This exception handling therefore needs to be made independent of PCI in
Placement.
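
One possible direction, shown only as a sketch and not as the fix that
nova adopted: translate the conflict into an exception whose message does
not name any subsystem. PlacementInventoryUpdateFailed, its message, and
update_to_placement() are hypothetical names invented for this example;
InventoryInUse here is a local stand-in for nova.exception.InventoryInUse.

```python
class InventoryInUse(Exception):
    """Stand-in for nova.exception.InventoryInUse."""


class PlacementInventoryUpdateFailed(Exception):
    """Hypothetical PCI-neutral exception; not part of nova."""

    def __init__(self, error):
        super().__init__(f"Failed to update inventory in Placement: {error}")


def update_to_placement(update_from_provider_tree):
    """Run the update and translate conflicts without blaming PCI."""
    try:
        update_from_provider_tree()
    except InventoryInUse as e:
        # Still fatal for service startup, but the message only repeats
        # what placement said instead of attributing it to PCI handling.
        raise PlacementInventoryUpdateFailed(error=str(e)) from e


def failing_update():
    """Simulates placement rejecting the update from the log above."""
    raise InventoryInUse(
        "Inventory for 'DISK_GB' on resource provider "
        "'5cdfba85-3122-49ea-b7b1-1fd0af461588' in use."
    )


try:
    update_to_placement(failing_update)
    message = ""
except PlacementInventoryUpdateFailed as exc:
    message = str(exc)

print(message)
```

With this shape the operator still sees the startup failure, but the log
line points at DISK_GB rather than at PCI resources.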

** Affects: nova
     Importance: Medium
         Status: New

** Changed in: nova
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2093879