
yahoo-eng-team team mailing list archive

[Bug 1605549] [NEW] PCI whitelist exception causes the resource tracker to stop and will not allow us to spawn further SR-IOV/PCIPT VMs when SR-IOV PF is assigned to a VM.

 

Public bug reported:

An exception raised while parsing the PCI whitelist causes the resource
tracker to stop and blocks users/admins from spawning further VMs.

We have the following pci_whitelist to support both SR-IOV and PCIPT:
pci_passthrough_whitelist =
[{"devname": "eth1", "physical_network": "physnet1"},
{"physical_network": "physnet1", "address": "*:04:00.0"},
{"physical_network": "physnet2", "address": "*:04:00.1"}]

Once we boot a PCI passthrough VM on physnet1 using eth1,
the device eth1 is no longer available to the hypervisor.
When we then try to boot another PCI passthrough VM using eth2,
the current code validates the pci_whitelist again and
raises an error saying that device eth1 is not found.
This is because the pci_whitelist contains devname eth1, and the
code tries to look up the PCI address of a device that is no longer available.
We also found that with the above pci_whitelist,
as soon as we boot a PCI passthrough VM, the periodic resource
tracker update also stops. On further analysis we found that any
misconfiguration of the pci_whitelist can cause the periodic
resource tracker update to stop.
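
The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not nova code: the names HOST_NETDEVS, DeviceNotFound, resolve_devname and parse_whitelist are hypothetical, and the mapping of eth1/eth2 to PCI addresses is assumed purely for illustration. The sketch only shows why a whitelist that is validated as a whole fails once a devname entry can no longer be resolved on the host.

```python
import json

# Hypothetical stand-in for the host's view of its network devices
# (interface name -> PCI address). The addresses are illustrative only.
HOST_NETDEVS = {"eth1": "0000:04:00.0", "eth2": "0000:04:00.1"}


class DeviceNotFound(Exception):
    """Raised when a devname cannot be resolved to a PCI address."""


def resolve_devname(devname, host_netdevs):
    # Look the interface up on the host; fail if it is no longer visible.
    try:
        return host_netdevs[devname]
    except KeyError:
        raise DeviceNotFound("PCI device %s not found" % devname)


def parse_whitelist(raw, host_netdevs):
    # Every entry is validated up front, so a single unresolvable
    # devname entry makes the whole parse fail.
    specs = []
    for entry in json.loads(raw):
        if "devname" in entry:
            address = resolve_devname(entry["devname"], host_netdevs)
            entry = dict(entry, address=address)
        specs.append(entry)
    return specs


WHITELIST = """
[{"devname": "eth1", "physical_network": "physnet1"},
 {"physical_network": "physnet1", "address": "*:04:00.0"},
 {"physical_network": "physnet2", "address": "*:04:00.1"}]
"""

# Before any VM is booted, eth1 is visible and the parse succeeds.
before = parse_whitelist(WHITELIST, HOST_NETDEVS)

# After eth1 is passed through to a VM, the host no longer sees it.
after_boot = {k: v for k, v in HOST_NETDEVS.items() if k != "eth1"}
try:
    parse_whitelist(WHITELIST, after_boot)
    failed = False
    message = ""
except DeviceNotFound as exc:
    failed = True
    message = str(exc)
```

If a parse like this runs inside every periodic update, each update aborts for as long as eth1 stays assigned to the VM, even though the two address-based entries are still valid.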


If eth1 is not present, we get the following error in the nova-compute log. The compute service still shows as up, but the periodic hypervisor update stops working.

2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager rt.update_available_resource(context)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._update_available_resource(context, resources)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager return f(*args, **kwargs)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager node_id=n_id)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.specs = self._parse_white_list_from_config(whitelist_spec)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager spec = devspec.PciDeviceSpec(ds)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._init_dev_details()
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=self.dev_name)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device eth1 not found
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager
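
One possible way to keep the periodic update alive is to skip whitelist entries whose devname cannot be resolved, instead of raising. The sketch below is an assumption for discussion only and is not the fix adopted in nova; parse_whitelist_tolerant and the host_netdevs argument are hypothetical names, and the device map is illustrative.

```python
import json
import logging

LOG = logging.getLogger(__name__)


def parse_whitelist_tolerant(raw, host_netdevs):
    """Return (usable_specs, skipped_devnames) instead of raising."""
    specs, skipped = [], []
    for entry in json.loads(raw):
        devname = entry.get("devname")
        if devname is not None and devname not in host_netdevs:
            # Report the unresolvable entry but keep processing the rest.
            LOG.warning("Skipping whitelist entry: PCI device %s not found",
                        devname)
            skipped.append(devname)
            continue
        specs.append(entry)
    return specs, skipped


WHITELIST = """
[{"devname": "eth1", "physical_network": "physnet1"},
 {"physical_network": "physnet1", "address": "*:04:00.0"},
 {"physical_network": "physnet2", "address": "*:04:00.1"}]
"""

# Hypothetical host view after eth1 has been passed through to a VM.
host_netdevs = {"eth2": "0000:04:00.1"}

specs, skipped = parse_whitelist_tolerant(WHITELIST, host_netdevs)
```

The trade-off is that a genuine typo in a devname would then only produce a warning, so an operator has to watch the log to notice a misconfigured entry.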

** Affects: nova
     Importance: Undecided
     Assignee: MANJUNATH PATIL (mpatil)
         Status: New


** Tags: compute pci

** Changed in: nova
     Assignee: (unassigned) => MANJUNATH PATIL (mpatil)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1605549

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1605549/+subscriptions

