[Bug 1565785] [NEW] SR-IOV PF passthrough device claiming/allocation does not work for physical functions devices
Public bug reported:
Enable PCI passthrough on a compute host (whitelisting devices is explained in more detail in the docs; a sample whitelist is sketched after the commands below), and create a network, subnet and a port that represents an SR-IOV physical function passthrough:
$ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
$ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subnet
$ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf
After that, try to boot an instance using the created port (provided the pci_passthrough_whitelist was set up correctly); this should work:
$ nova boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm
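For reference, a minimal pci_passthrough_whitelist that would expose the PF and its VFs to Nova could look like the following. This is a sketch only: it matches by vendor/product ID (taken from the pci_stats output further down, where 8086:1521 is the PF and 8086:1520 the VF); a real deployment may match by PCI address or devname instead.
# nova.conf on the compute host (illustrative values)
[DEFAULT]
pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "1521", "physical_network": "phynet"}
pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "1520", "physical_network": "phynet"}
The option is repeatable; each entry whitelists one class of devices and tags it with the physical network used by the Neutron port above.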
My test env has 2 PFs with 7 VFs each. After spawning an instance, the PF gets marked as allocated, but none of the VFs do, even though they are removed from the host (note that device_pools are correctly updated).
So after the instance was successfully booted we get:
MariaDB [nova]> select count(*) from pci_devices where status="available" and deleted=0;
+----------+
| count(*) |
+----------+
| 15 |
+----------+
# This should be 8 - we are leaking the 7 VFs belonging to the attached PF; their status never gets updated.
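To see exactly which rows are stale, listing the individual devices (a straightforward extension of the query above, using columns from Nova's pci_devices schema) shows the PF as allocated while its dependent VFs remain "available" even though they are gone from the host:
MariaDB [nova]> select address, dev_type, status from pci_devices where deleted=0;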
MariaDB [nova]> select pci_stats from compute_nodes;
{
  "nova_object.version": "1.1",
  "nova_object.changes": ["objects"],
  "nova_object.name": "PciDevicePoolList",
  "nova_object.data": {
    "objects": [
      {
        "nova_object.version": "1.1",
        "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"],
        "nova_object.name": "PciDevicePool",
        "nova_object.data": {
          "count": 1,
          "numa_node": 0,
          "vendor_id": "8086",
          "product_id": "1521",
          "tags": {"dev_type": "type-PF", "physical_network": "phynet"}
        },
        "nova_object.namespace": "nova"
      },
      {
        "nova_object.version": "1.1",
        "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"],
        "nova_object.name": "PciDevicePool",
        "nova_object.data": {
          "count": 7,
          "numa_node": 0,
          "vendor_id": "8086",
          "product_id": "1520",
          "tags": {"dev_type": "type-VF", "physical_network": "phynet"}
        },
        "nova_object.namespace": "nova"
      }
    ]
  },
  "nova_object.namespace": "nova"
}
This is correct - it shows 8 available devices (the remaining PF plus its 7 VFs).
Once a new resource_tracker run happens we hit https://bugs.launchpad.net/nova/+bug/1565721, so we stop updating based on what is found on the host.
The root cause of this is (I believe) that we update the PCI device objects in a local scope, but never call save() on those particular instances. We grab the devices and update their status here:
https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/objects/pci_device.py#L339-L349
but never call save() inside that method.
save() is eventually called here, on completely different instances that never see the update:
https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/compute/resource_tracker.py#L646
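To make the scoping problem concrete, here is a minimal, runnable Python sketch of the pattern (not actual Nova code; the class and names are invented for illustration): the VF status change lands on locally-fetched instances, while save() later runs over a different set of instances that never saw it.
# Illustrative sketch only -- not Nova code. Mutating one set of in-memory
# device objects while calling save() on another set loses the update.

class FakePciDevice:
    """Stand-in for nova.objects.PciDevice with naive dirty tracking."""

    def __init__(self, address, status):
        self.address = address
        self.status = status
        self._changed = False

    def set_status(self, status):
        self.status = status
        self._changed = True

    def save(self, db):
        # Persist only objects whose status changed on *this* instance.
        if self._changed:
            db[self.address] = self.status
            self._changed = False

db = {'pf0': 'available', 'vf0': 'available', 'vf1': 'available'}

# Instances held by the resource tracker -- these are what get save()d.
tracked = {addr: FakePciDevice(addr, st) for addr, st in db.items()}

# The allocation path ends up working on *different* instances of the
# same devices, e.g. grabbed again in a local scope.
local = {addr: FakePciDevice(addr, st) for addr, st in db.items()}
tracked['pf0'].set_status('allocated')   # the PF change reaches the tracker...
for vf in ('vf0', 'vf1'):
    local[vf].set_status('unavailable')  # ...the VF changes only hit local copies

# Later the tracker saves *its* instances; the VF updates are lost.
for dev in tracked.values():
    dev.save(db)

print(db)  # {'pf0': 'allocated', 'vf0': 'available', 'vf1': 'available'}
The fix would presumably be either to persist the change where it is made (call save() on the instances whose status was flipped) or to have the allocation path and the resource tracker share a single set of device objects.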
** Affects: nova
Importance: High
Status: New
** Tags: pci
** Changed in: nova
Importance: Undecided => High
** Tags added: pci
https://bugs.launchpad.net/bugs/1565785