
yahoo-eng-team team mailing list archive

[Bug 1986838] [NEW] Booting with two identical PCI aliases on a host with a single matching dev succeeds but the instance will have no PCI allocations

 

Public bug reported:

Detected while reading the code.

Reproduction
1) Configure a host with a single PCI passthrough device.
2) Configure two PCI aliases (a1, a2) with different names, each matching the above device.
3) Boot an instance using a flavor with the extra_spec 'pci_passthrough:alias': 'a1:1,a2:1'.
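
To make the request in step 3 concrete, here is a minimal Python sketch of how the extra_spec string breaks down into two separate requests. This is not nova's parsing code: parse_alias_spec and the ALIASES table are hypothetical names used only for illustration, and the vendor/product IDs are the ones that appear in the scheduler log below.

```python
def parse_alias_spec(spec):
    """Split an alias string such as 'a1:1,a2:1' into (name, count) pairs."""
    requests = []
    for item in spec.split(","):
        name, count = item.strip().split(":")
        requests.append((name.strip(), int(count)))
    return requests


# Hypothetical alias table: two different names, identical device match.
ALIASES = {
    "a1": {"vendor_id": "8086", "product_id": "1533"},
    "a2": {"vendor_id": "8086", "product_id": "1533"},
}

requests = parse_alias_spec("a1:1,a2:1")
# Two requests of one device each, so two devices are needed in total,
# while the host in this reproduction offers only one.
total_requested = sum(count for _, count in requests)
```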

Expected result
The instance fails to schedule, because the flavor requests two devices and the host has only one.

Actual result
The instance is scheduled to the host but has no PCI allocations.
The nova scheduler logs:
Selected host: compute1 failed to consume from instance. Error: PCI device request [InstancePCIRequest(alias_name='a1',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}]), InstancePCIRequest(alias_name='a2',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}])] failed

The nova compute logs:
Failed to allocate PCI devices for instance. Unassigning devices back to pools. This should not happen, since the scheduler should have accurate information, and allocation during claims is controlled via a hold on the compute node semaphore.

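I think the root cause is that the PciDeviceStats.support_requests() [1]
call matches each InstancePCIRequest object against the available PCI
pools independently and does not update a local copy of the pools as
requests are matched. Each request therefore sees the single device as
still free, so the check passes even though the two requests together
need two devices.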
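
The suspected difference can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not nova's implementation: pools and requests are plain dicts, and matches, supports_independently and supports_cumulatively are hypothetical helpers. The cumulative variant consumes greedily from a local copy, which is enough to show the problem but is not a general optimal matching.

```python
import copy


def matches(pool, spec):
    """True if every key in the request spec equals the pool's value."""
    return all(pool.get(key) == value for key, value in spec.items())


def supports_independently(pools, requests):
    """Check each request against the untouched pools (the suspected bug)."""
    return all(
        sum(p["count"] for p in pools if matches(p, r["spec"])) >= r["count"]
        for r in requests
    )


def supports_cumulatively(pools, requests):
    """Check requests one by one, consuming from a local copy of the pools."""
    local_pools = copy.deepcopy(pools)  # never mutate the caller's pools
    for r in requests:
        remaining = r["count"]
        for p in local_pools:
            if remaining == 0:
                break
            if matches(p, r["spec"]):
                taken = min(p["count"], remaining)
                p["count"] -= taken
                remaining -= taken
        if remaining:
            return False
    return True


# One pool with a single device, two aliases matching that same device.
pools = [{"vendor_id": "8086", "product_id": "1533", "count": 1}]
spec = {"vendor_id": "8086", "product_id": "1533"}
requests = [
    {"alias_name": "a1", "count": 1, "spec": spec},
    {"alias_name": "a2", "count": 1, "spec": spec},
]

independent_result = supports_independently(pools, requests)
cumulative_result = supports_cumulatively(pools, requests)
```

In this model the independent check accepts the host while the cumulative check rejects it, which mirrors the scheduler passing the host and the later consume step failing.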

I will push a functional reproduction test shortly.

[1]
https://github.com/openstack/nova/blob/69bc4c38d1c5b98fcbbe8b16a7dfeb654e3b8173/nova/pci/stats.py#L645

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: pci


-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1986838
