yahoo-eng-team team mailing list archive
  
Message #54653
  
 [Bug 1605549] Re: PCI whitelist exception causes the resource tracker to stop and will not allow us to spawn further SR-IOV/PCIPT VMs when SR-IOV PF is assigned to a VM.
  
Reviewed:  https://review.openstack.org/345925
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=433fe514e8d166345b2bd8fa0b3055724285406d
Submitter: Jenkins
Branch:    master
commit 433fe514e8d166345b2bd8fa0b3055724285406d
Author: Manjunath Patil <mpatil@xxxxxx>
Date:   Fri Jul 22 14:31:11 2016 +0530
    Resolve PCI devices on the host during Guest boot-up.
    
    When devname is used in Whitelist configuration,
    resolve the address of devname when trying to
    match a device in the whitelist.
    
    Change-Id: I7a65857454cc132d97df9abb8297d350514cf2df
    Closes-Bug: #1605549
    Co-Authored-By: Raghuveer Shenoy <rshenoy@xxxxxx>
    Co-Authored-By: Sonu <sonu.sudhakaran@xxxxxxxxx>
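
The idea in the commit message, resolving the devname when matching rather than when the whitelist is built, can be sketched roughly as follows. This is a minimal illustration and not the nova code: HOST_NETDEVS, resolve_devname and LazyDevnameSpec are hypothetical names, and a plain dict stands in for the host's real interface lookup.

```python
# Hypothetical stand-in for the host's view of network devices:
# interface name -> PCI address. Not nova's real lookup.
HOST_NETDEVS = {"eth1": "0000:04:00.0"}


def resolve_devname(devname, host_netdevs):
    """Return the PCI address for devname, or None if the host no longer sees it."""
    return host_netdevs.get(devname)


class LazyDevnameSpec:
    """Whitelist entry that stores only the devname and resolves it on each match."""

    def __init__(self, devname):
        # No lookup here, so building the spec cannot fail on a missing device.
        self.devname = devname

    def match(self, pci_address, host_netdevs):
        address = resolve_devname(self.devname, host_netdevs)
        if address is None:
            # The interface is gone (for example, passed through to a guest):
            # treat it as "no match" instead of raising.
            return False
        return address == pci_address


spec = LazyDevnameSpec("eth1")
print(spec.match("0000:04:00.0", HOST_NETDEVS))  # True
print(spec.match("0000:04:00.0", {}))            # False
```

With this shape, a devname that has disappeared from the host only makes that one entry fail to match; it does not prevent the rest of the whitelist from being used.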
** Changed in: nova
       Status: In Progress => Fix Released
-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1605549
Title:
  PCI whitelist exception causes the resource tracker to stop and will
  not allow us to spawn further SR-IOV/PCIPT VMs when SR-IOV PF is
  assigned to a VM.
Status in OpenStack Compute (nova):
  Fix Released
Bug description:
  An exception raised while parsing the PCI whitelist causes the resource
  tracker to stop and prevents users and admins from spawning further VMs.

  We have the following pci_passthrough_whitelist to support both SR-IOV
  and PCIPT:
  pci_passthrough_whitelist =
  [{"devname": "eth1", "physical_network": "physnet1"},
  {"physical_network": "physnet1", "address": "*:04:00.0"},
  {"physical_network": "physnet2", "address": "*:04:00.1"}]

  Once we boot a PCI passthrough VM on physnet1 using eth1, the device
  eth1 is no longer available to the hypervisor. So when we try to boot
  another PCI passthrough VM using eth2, the current code tries to
  validate the pci_whitelist and throws an error saying that device eth1
  is not found. This is because the pci_whitelist contains devname eth1
  and the code tries to get the PCI address of a device that is no
  longer available.

  We also found that, with the above pci_whitelist, the periodic resource
  tracker stops as soon as we boot a PCI passthrough VM. We analysed
  further and found that any misconfiguration of pci_whitelist can cause
  the periodic resource tracker to stop.
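
The failure described above can be reproduced in miniature. The sketch below is only an approximation under stated assumptions: it parses the whitelist from this report, but DeviceNotFound, parse_whitelist_eagerly and periodic_update are hypothetical stand-ins for the nova code paths, and a dict simulates which interfaces the host can still see.

```python
import json

# The whitelist from this report.
WHITELIST = json.loads("""
[{"devname": "eth1", "physical_network": "physnet1"},
 {"physical_network": "physnet1", "address": "*:04:00.0"},
 {"physical_network": "physnet2", "address": "*:04:00.1"}]
""")


class DeviceNotFound(Exception):
    """Hypothetical stand-in for nova's PciDeviceNotFoundById."""


def parse_whitelist_eagerly(entries, host_netdevs):
    """Resolve every devname while parsing; one missing device aborts the parse."""
    specs = []
    for entry in entries:
        if "devname" in entry:
            name = entry["devname"]
            if name not in host_netdevs:
                raise DeviceNotFound("PCI device %s not found" % name)
            entry = dict(entry, address=host_netdevs[name])
        specs.append(entry)
    return specs


def periodic_update(entries, host_netdevs):
    """Simulated periodic task: it re-parses the whitelist on every run."""
    try:
        return len(parse_whitelist_eagerly(entries, host_netdevs))
    except DeviceNotFound as exc:
        return "Error updating resources: %s" % exc


# Before the passthrough VM boots, the host still sees eth1.
print(periodic_update(WHITELIST, {"eth1": "0000:04:00.0"}))  # 3
# After eth1 is handed to the guest, every run fails the same way.
print(periodic_update(WHITELIST, {}))
```

Because the whole whitelist is re-parsed on each run in this sketch, the two address-based entries are never reached once eth1 disappears, which mirrors why unrelated devices also stop being tracked.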
  
  We get the following error in the nova compute log if eth1 is not present. The compute service still shows as up, but the periodic hypervisor update stops working.
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager rt.update_available_resource(context)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._update_available_resource(context, resources)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager return f(*args, **kwargs)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager node_id=n_id)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.specs = self._parse_white_list_from_config(whitelist_spec)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager spec = devspec.PciDeviceSpec(ds)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._init_dev_details()
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=self.dev_name)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device eth1 not found
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1605549/+subscriptions