yahoo-eng-team team mailing list archive
Message #51780
[Bug 1565785] Fix merged to nova (master)
Reviewed: https://review.openstack.org/301859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c469b8466fc5ff5514957a0fbd17d141761774c8
Submitter: Jenkins
Branch: master
commit c469b8466fc5ff5514957a0fbd17d141761774c8
Author: Nikola Dipanov <ndipanov@xxxxxxxxxx>
Date: Tue Apr 5 18:09:53 2016 +0100
pci: make sure device relationships are kept in memory
`pci_devs` attribute of PciDevTracker class is the in-memory "master
copy" of all devices on each compute host, and all data changes that
happen when claiming/allocating/freeing devices HAVE TO be made against
instances contained in `pci_devs` list, because they are periodically
flushed to the DB when the save() method is called.
Due to this we need to make sure all the relationships are available to
the code using them (claiming/allocation/freeing methods).
We do this by simply keeping a tree structure by referencing
parent/children from objects themselves. This is done on every update of
the state of PCI devices (on compute service start up, and on every
resource tracker pass), so that this information is always as up to date
as the in memory view of devices.
This change adds the code to build up the tree, and subsequent changes
will make sure the newly added relationships are used when needed. We
also add 2 non-versioned fields to the PciDevice object to hold the
references.
Co-Authored-By: Sahid Ferdjaoui <sahid.ferdjaoui@xxxxxxxxxx>
Change-Id: Id6868b7839efb2cd53f5f7aaac2c55d169356ce4
Partial-bug: #1565785
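A minimal sketch of the tree-building idea described in the commit message above. The `Dev` class, its field names and `build_tree` are illustrative assumptions, not nova's actual PciDevice object or tracker code. The point is only that the references are rebuilt on the tracked objects themselves, so later claim/allocate/free code can walk from a PF to its VFs without loading separate copies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dev:
    """Illustrative stand-in for a PCI device record (not nova's PciDevice)."""
    address: str
    dev_type: str                       # "type-PF" or "type-VF"
    parent_addr: Optional[str] = None   # set on VFs, address of the owning PF
    status: str = "available"
    # Reference fields are excluded from repr/eq to avoid recursive walks.
    parent: Optional["Dev"] = field(default=None, repr=False, compare=False)
    children: List["Dev"] = field(default_factory=list, repr=False,
                                  compare=False)


def build_tree(devs: List[Dev]) -> None:
    """Rebuild parent/child references in place on the tracked objects."""
    by_addr: Dict[str, Dev] = {d.address: d for d in devs}
    # Reset first so repeated passes do not accumulate duplicate children.
    for d in devs:
        d.parent = None
        d.children = []
    for d in devs:
        if d.dev_type == "type-VF" and d.parent_addr in by_addr:
            d.parent = by_addr[d.parent_addr]
            d.parent.children.append(d)


# Hypothetical addresses: one PF with 7 VFs, as in the reported test env.
pf = Dev("0000:81:00.0", "type-PF")
vfs = [Dev(f"0000:81:10.{i}", "type-VF", parent_addr=pf.address)
       for i in range(7)]
pci_devs = [pf] + vfs
build_tree(pci_devs)
```

Calling `build_tree(pci_devs)` again after a device list refresh leaves each PF with exactly one reference per VF, which mirrors the commit message's note that the tree is rebuilt on every update.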
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1565785
Title:
SR-IOV PF passthrough device claiming/allocation does not work for
physical functions devices
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Enable PCI passthrough on a compute host (whitelisting devices is
explained in more detail in the docs), and create a network, subnet and
a port that represents an SR-IOV physical function passthrough:
$ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
$ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subne
$ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf
After that, try to boot an instance using the created port. Provided
the pci_passthrough_whitelist was set up correctly, this should work:
$ nova boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm
My test env has 2 PFs with 7 VFs each. After spawning an instance, the
PF gets marked as allocated, but none of the VFs do, even though they
are removed from the host (note that device_pools are correctly
updated).
So after the instance was successfully booted we get:
MariaDB [nova]> select count(*) from pci_devices where status="available" and deleted=0;
+----------+
| count(*) |
+----------+
| 15 |
+----------+
# This should be 8 - we are leaking 7 VFs belonging to the attached PF
that never get updated.
MariaDB [nova]> select pci_stats from compute_nodes;
| pci_stats
| {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": [{"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 1, "numa_node": 0, "vendor_id": "8086", "product_id": "1521", "tags": {"dev_type": "type-PF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 7, "numa_node": 0, "vendor_id": "8086", "product_id": "1520", "tags": {"dev_type": "type-VF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"} |
This is correct - shows 8 available devices
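The numbers above can be checked with a short worked calculation. This is only arithmetic on the figures quoted in this report (2 PFs, 7 VFs each, and the two pool counts). The variable names are illustrative.

```python
# Host layout from the report: 2 PFs, each exposing 7 VFs.
pfs, vfs_per_pf = 2, 7
total = pfs + pfs * vfs_per_pf            # 16 devices in all

# Claiming one PF should take the PF and its 7 VFs out of circulation.
expected_available = total - (1 + vfs_per_pf)

# Only the PF row was updated in pci_devices, so the query over-counts.
observed_available = total - 1
leaked = observed_available - expected_available

# Pool counts as reported in pci_stats (one PF pool, one VF pool).
pool_counts = {"type-PF": 1, "type-VF": 7}
pool_total = sum(pool_counts.values())
```

Here `expected_available` and `pool_total` agree at 8, while `observed_available` is 15, so `leaked` is the 7 VFs of the attached PF.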
Once a new resource_tracker run happens we hit
https://bugs.launchpad.net/nova/+bug/1565721 so we stop updating based
on what is found on the host.
The root cause of this is (I believe) that we update PCI objects in
the local scope, but never call save() on those particular instances.
So we grab and update the status here:
https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/objects/pci_device.py#L339-L349
but never call save inside that method.
The save is eventually called here referencing completely different
instances that never see the update:
https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/compute/resource_tracker.py#L646
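The suspected failure mode can be reproduced in miniature. The sketch below is not nova code: `Device`, `Tracker` and the dict standing in for the database are hypothetical. It only shows why a status change made on a separately held copy is lost when save() is later called on different instances.

```python
import copy


class Device:
    """Illustrative device record; not nova's PciDevice."""
    def __init__(self, address, status="available"):
        self.address = address
        self.status = status


class Tracker:
    """Holds the in-memory master copy and flushes it to a fake DB."""
    def __init__(self, devices):
        self.pci_devs = devices
        self.db = {d.address: d.status for d in devices}

    def save(self):
        # Only objects held in pci_devs are written out.
        for d in self.pci_devs:
            self.db[d.address] = d.status


tracker = Tracker([Device("vf0"), Device("vf1")])

# Buggy pattern: update a separately held copy, then save the master list.
stale = copy.deepcopy(tracker.pci_devs[0])
stale.status = "unavailable"
tracker.save()
db_after_buggy_update = dict(tracker.db)

# Working pattern: update the tracked instance itself, then save.
tracker.pci_devs[0].status = "unavailable"
tracker.save()
db_after_fixed_update = dict(tracker.db)
```

In the first case the fake DB still reports `vf0` as available, which matches the leaked VF rows seen in the query earlier in this report. In the second case the change reaches the DB because it was made on an object that save() actually visits.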
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1565785/+subscriptions