yahoo-eng-team team mailing list archive
Message #89520
[Bug 1986838] [NEW] Booting with two identical PCI aliases on a host with a single matching dev succeeds but the instance will have no PCI allocations
Public bug reported:
Detected while reading the code.
Reproduction
1) configure a host with a single PCI passthrough device
2) configure two PCI aliases (a1, a2) with different names but each matching the above device
3) boot an instance with 'pci_passthrough:alias': 'a1:1,a2:1' flavor extra_spec.
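To make step 3 concrete, the sketch below shows how such an alias string would expand into two separate requests that carry the same device spec. It is an illustration only: ALIASES and expand_alias_spec are hypothetical names, not nova's actual parsing code, and the vendor/product IDs are the ones that appear in the scheduler log in this report.

```python
# Hypothetical alias table: two different names, one identical device spec.
ALIASES = {
    "a1": {"vendor_id": "8086", "product_id": "1533"},
    "a2": {"vendor_id": "8086", "product_id": "1533"},
}


def expand_alias_spec(alias_spec):
    """Turn 'a1:1,a2:1' into one request dict per alias entry."""
    requests = []
    for item in alias_spec.split(","):
        name, count = item.strip().split(":")
        requests.append(
            {"alias_name": name, "count": int(count), "spec": ALIASES[name]}
        )
    return requests


requests = expand_alias_spec("a1:1,a2:1")
total_devices_wanted = sum(r["count"] for r in requests)
```

In this model the flavor asks for two devices in total, while the host from step 1 exposes only one.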
Expected result
The instance fails to schedule
Actual result
The instance schedules to the host but has no PCI allocations
The nova scheduler logs:
Selected host: compute1 failed to consume from instance. Error: PCI device request [InstancePCIRequest(alias_name='a1',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}]), InstancePCIRequest(alias_name='a2',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}])] failed
The nova compute logs:
Failed to allocate PCI devices for instance. Unassigning devices back to pools. This should not happen, since the scheduler should have accurate information, and allocation during claims is controlled via a hold on the compute node semaphore.
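I think the root cause of the fault is that the
PciDeviceStats.support_requests() [1] call matches each
InstancePCIRequest object against the available PCI pools independently
and does not track, even in a local copy of the pools, the devices
already consumed by earlier requests in the same call. As a result the
single device appears to satisfy both a1 and a2 during scheduling.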
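The suspected behaviour can be modelled in a few lines. This is a simplified sketch, not nova's code: pools and requests are plain dicts, and supports_independently, supports_with_consumption and _matches are hypothetical helpers written for this illustration. The first check treats every request in isolation, as described above; the second consumes devices from a local copy of the pools.

```python
import copy


def _matches(pool, spec):
    # A pool matches when every key in the request spec has the same value.
    return all(pool.get(key) == value for key, value in spec.items())


def supports_independently(pools, requests):
    """Check each request against the untouched pools (the suspected bug)."""
    return all(
        sum(p["count"] for p in pools if _matches(p, r["spec"])) >= r["count"]
        for r in requests
    )


def supports_with_consumption(pools, requests):
    """Check requests in order, consuming devices from a local copy."""
    local_pools = copy.deepcopy(pools)  # never touch the caller's pools
    for request in requests:
        needed = request["count"]
        for pool in local_pools:
            if needed == 0:
                break
            if _matches(pool, request["spec"]):
                taken = min(pool["count"], needed)
                pool["count"] -= taken
                needed -= taken
        if needed:
            return False
    return True


spec = {"vendor_id": "8086", "product_id": "1533"}
pools = [{"vendor_id": "8086", "product_id": "1533", "count": 1}]
requests = [
    {"alias_name": "a1", "count": 1, "spec": spec},
    {"alias_name": "a2", "count": 1, "spec": spec},
]

independent_result = supports_independently(pools, requests)
consuming_result = supports_with_consumption(pools, requests)
print(independent_result)  # True: the host is wrongly accepted
print(consuming_result)    # False: the second request finds no free device
```

The greedy consumption here is enough for identical specs like a1 and a2; requests with partially overlapping specs might need a more careful assignment, which this sketch does not attempt.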
I will push a functional reproduction test shortly.
[1]
https://github.com/openstack/nova/blob/69bc4c38d1c5b98fcbbe8b16a7dfeb654e3b8173/nova/pci/stats.py#L645
** Affects: nova
Importance: Undecided
Status: New
** Tags: pci
** Tags added: pci
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1986838
Title:
Booting with two identical PCI aliases on a host with a single
matching dev succeeds but the instance will have no PCI allocations
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1986838/+subscriptions