
yahoo-eng-team team mailing list archive

[Bug 1565785] Fix merged to nova (master)

 

Reviewed:  https://review.openstack.org/301859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c469b8466fc5ff5514957a0fbd17d141761774c8
Submitter: Jenkins
Branch:    master

commit c469b8466fc5ff5514957a0fbd17d141761774c8
Author: Nikola Dipanov <ndipanov@xxxxxxxxxx>
Date:   Tue Apr 5 18:09:53 2016 +0100

    pci: make sure device relationships are kept in memory
    
    `pci_devs` attribute of PciDevTracker class is the in-memory "master
    copy" of all devices on each compute host, and all data changes that
    happen when claiming/allocating/freeing devices HAVE TO be made against
    instances contained in `pci_devs` list, because they are periodically
    flushed to the DB when the save() method is called.
    
    Due to this we need to make sure all the relationships are available to
    the code using them (claiming/allocation/freeing methods).
    
    We do this by simply keeping a tree structure by referencing
    parent/children from objects themselves. This is done on every update of
    the state of PCI devices (on compute service start up, and on every
    resource tracker pass), so that this information is always as up to date
    as the in memory view of devices.
    
    This change adds the code to build up the tree, and subsequent changes
    will make sure the newly added relationships are used when needed. We
    also add 2 non-versioned fields to the PciDevice object to hold the
    references.
    
    Co-Authored-By: Sahid Ferdjaoui <sahid.ferdjaoui@xxxxxxxxxx>
    
    Change-Id: Id6868b7839efb2cd53f5f7aaac2c55d169356ce4
    Partial-bug: #1565785
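
The tree-building step described in the commit message can be pictured
roughly as follows. This is only a minimal sketch, not the code from the
change: the `Device` class, the `build_device_tree` helper, the field
names `parent_device` and `child_devices`, and the PCI addresses are all
assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Device:
    """Hypothetical stand-in for a PCI device record (not nova's PciDevice)."""
    address: str
    dev_type: str                      # "type-PF" or "type-VF"
    parent_addr: Optional[str] = None  # address of the owning PF, for VFs
    status: str = "available"
    # In-memory-only references, rebuilt on every update (assumed names).
    parent_device: Optional["Device"] = field(default=None, repr=False)
    child_devices: List["Device"] = field(default_factory=list, repr=False)


def build_device_tree(devices: List[Device]) -> None:
    """Link every VF to its PF, and every PF to its VFs, in place."""
    pfs = {d.address: d for d in devices if d.dev_type == "type-PF"}
    for dev in devices:
        # Drop stale links first so that repeated passes give the same result.
        dev.parent_device = None
        dev.child_devices = []
    for dev in devices:
        if dev.dev_type == "type-VF" and dev.parent_addr in pfs:
            parent = pfs[dev.parent_addr]
            dev.parent_device = parent
            parent.child_devices.append(dev)


# One PF with 7 VFs, mirroring half of the test environment in the bug report.
pf = Device("0000:81:00.0", "type-PF")
vfs = [Device(f"0000:81:10.{i}", "type-VF", parent_addr=pf.address)
       for i in range(7)]
pci_devs = [pf] + vfs
build_device_tree(pci_devs)
```

Because the links point at the very objects held in the list, a status
change made through `pf.child_devices` lands on the same instances that
are later saved.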


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1565785

Title:
  SR-IOV PF passthrough device claiming/allocation does not work for
  physical functions devices

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Enable PCI passthrough on a compute host (whitelist devices explained
  in more detail in the docs), and create a network, subnet and a port
  that represents a SR-IOV physical function passthrough:

  $ neutron net-create --provider:physical_network=phynet --provider:network_type=flat sriov-net
  $ neutron subnet-create sriov-net 192.168.2.0/24 --name sriov-subne
  $ neutron port-create sriov-net --binding:vnic_type=direct-physical --name pf

  After that, try to boot an instance using the created port. Provided
  the pci_passthrough_whitelist was set up correctly, this should work:

  $ nova boot --image xxx --flavor 1 --nic port-id=$PORT_ABOVE testvm

  My test env has 2 PFs with 7 VFs each. After spawning an instance, the
  PF gets marked as allocated, but none of its VFs do, even though they
  are removed from the host (note that device_pools are correctly
  updated).

  So after the instance was successfully booted we get:

  MariaDB [nova]> select count(*) from pci_devices where status="available" and deleted=0;
  +----------+
  | count(*) |
  +----------+
  |       15 |
  +----------+

  # This should be 8 - we are leaking 7 VFs belonging to the attached PF
  that never get updated.

  MariaDB [nova]> select pci_stats from compute_nodes;
  | pci_stats |
  | {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": [{"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 1, "numa_node": 0, "vendor_id": "8086", "product_id": "1521", "tags": {"dev_type": "type-PF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}, {"nova_object.version": "1.1", "nova_object.changes": ["count", "numa_node", "vendor_id", "product_id", "tags"], "nova_object.name": "PciDevicePool", "nova_object.data": {"count": 7, "numa_node": 0, "vendor_id": "8086", "product_id": "1520", "tags": {"dev_type": "type-VF", "physical_network": "phynet"}}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"} |

  This is correct: it shows 8 available devices (1 PF and 7 VFs).
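
  The device counts quoted above can be checked with a little arithmetic.
  The sketch below only restates the numbers from this report (2 PFs,
  7 VFs each, one PF attached); the variable names are made up for
  illustration.

```python
# Test environment from the report: 2 PFs, each exposing 7 VFs.
num_pfs = 2
vfs_per_pf = 7
total_devices = num_pfs + num_pfs * vfs_per_pf         # 16 devices in total

# Attaching one PF takes that PF and all of its VFs away from the host.
consumed = 1 + vfs_per_pf                              # 8 devices
expected_available = total_devices - consumed          # what pci_stats shows

# The pci_devices table only recorded the PF changing status.
observed_available = total_devices - 1                 # what the query returns
leaked_vfs = observed_available - expected_available   # VFs never updated
```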

  Once a new resource_tracker run happens we hit
  https://bugs.launchpad.net/nova/+bug/1565721 so we stop updating based
  on what is found on the host.

  The root cause of this is (I believe) that we update PCI objects in
  the local scope, but never call save() on those particular instances.
  So we grab and update the status here:

  https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/objects/pci_device.py#L339-L349

  but never call save inside that method.

  The save is eventually called here referencing completely different
  instances that never see the update:

  https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/compute/resource_tracker.py#L646
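
  The suspected failure mode can be reproduced in miniature. This is a
  simplified sketch of the pattern described above, not nova code:
  `FakeDevice`, the dict standing in for the database, the device names
  and the "unavailable" status string are all assumptions for
  illustration.

```python
import copy


class FakeDevice:
    """Hypothetical device record; save() stands in for a DB write."""

    def __init__(self, address, status="available"):
        self.address = address
        self.status = status

    def save(self, db):
        db[self.address] = self.status


db = {}
# The tracker's long-lived list: the only objects save() is ever called on.
tracked = [FakeDevice("pf0"), FakeDevice("vf0"), FakeDevice("vf1")]

# Suspected buggy pattern: the VFs are fetched again as separate objects,
# modelled here with copies, and only those copies are updated.
local_vfs = [copy.copy(d) for d in tracked if d.address.startswith("vf")]
for vf in local_vfs:
    vf.status = "unavailable"   # the change lives only on the local copy

# The PF itself is updated on the tracked instance, so that change persists.
tracked[0].status = "allocated"
for dev in tracked:
    dev.save(db)
```

  After the loop, the dict records the PF as allocated while both VFs are
  still "available", which matches the leaked count seen in the query
  above.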

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1565785/+subscriptions

