yahoo-eng-team team mailing list archive
Message #54167
[Bug 1605549] [NEW] PCI whitelist exception causes the resource tracker to stop and will not allow us to spawn further SR-IOV/PCIPT VMs when SR-IOV PF is assigned to a VM.
Public bug reported:
An exception raised while parsing the PCI whitelist causes the resource
tracker to stop and prevents users and admins from spawning further VMs.
We have the following pci_whitelist to support both SR-IOV and PCIPT:
pci_passthrough_whitelist =
[{"devname": "eth1", "physical_network": "physnet1"},
{"physical_network": "physnet1", "address": "*:04:00.0"},
{"physical_network": "physnet2", "address": "*:04:00.1"}]
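To make the three entries above concrete, here is a minimal Python sketch that parses the same JSON and separates the entry keyed by devname from the entries keyed by address. The helper name split_whitelist is hypothetical and is not part of nova; it only illustrates the shape of the configuration.

```python
import json

# The whitelist value from this report, as a JSON string.
WHITELIST = '''
[{"devname": "eth1", "physical_network": "physnet1"},
 {"physical_network": "physnet1", "address": "*:04:00.0"},
 {"physical_network": "physnet2", "address": "*:04:00.1"}]
'''


def split_whitelist(raw):
    """Return (by_devname, by_address) lists of whitelist entries.

    Hypothetical helper for illustration only.
    """
    entries = json.loads(raw)
    by_devname = [e for e in entries if "devname" in e]
    by_address = [e for e in entries if "address" in e]
    return by_devname, by_address


by_devname, by_address = split_whitelist(WHITELIST)
```

Only the first entry depends on an interface name; the other two are matched by PCI address.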
Once we boot a PCI passthrough VM on physnet1 using eth1,
the device eth1 is no longer available to the hypervisor.
So when we try to boot another PCI passthrough VM using eth2,
the current code tries to validate the pci_whitelist and
throws an error saying that device eth1 is not found.
This is because the pci_whitelist contains devname eth1 and the
code tries to get the PCI address of a device that is no longer available.
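The failing lookup can be pictured with a small sketch. It assumes the interface name is resolved through the sysfs link /sys/class/net/<ifname>/device, which is where Linux exposes the PCI function behind a network interface. The function and exception names are hypothetical stand-ins, not nova's own code.

```python
import os


class DeviceNotFound(Exception):
    """Hypothetical stand-in for the error raised when a devname
    cannot be resolved to a PCI address."""


def pci_address_for_ifname(ifname, sysfs_net="/sys/class/net"):
    """Resolve an interface name to its PCI address via sysfs.

    The 'device' entry is a symlink whose last path component is the
    PCI address (for example 0000:04:00.0). When the interface has been
    handed to a guest, the entry is gone and the lookup fails.
    """
    dev_link = os.path.join(sysfs_net, ifname, "device")
    try:
        target = os.readlink(dev_link)
    except OSError:
        raise DeviceNotFound("PCI device %s not found" % ifname)
    return os.path.basename(target)
```

After eth1 is passed through to a VM, the host no longer has an eth1 entry under /sys/class/net, so a lookup of this kind has nothing to resolve.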
We also found that with the above pci_whitelist,
as soon as we boot a PCI passthrough VM, the periodic resource
tracker also stops. We analysed this further and found that any
misconfiguration of pci_whitelist can cause the periodic
resource tracker to stop.
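The sketch below contrasts the behaviour reported here with one possible mitigation. It is an illustration under stated assumptions, not nova's implementation and not necessarily the fix nova adopts: resolve, parse_strict and parse_tolerant are hypothetical names, and the set of interfaces present on the host is passed in explicitly.

```python
import logging

LOG = logging.getLogger(__name__)


class DeviceNotFound(Exception):
    """Hypothetical error for a whitelist entry that cannot be resolved."""


def resolve(entry, present_ifnames):
    """Hypothetical resolver: fail if a devname entry names an
    interface that is not currently present on the host."""
    devname = entry.get("devname")
    if devname is not None and devname not in present_ifnames:
        raise DeviceNotFound("PCI device %s not found" % devname)
    return entry


def parse_strict(entries, present_ifnames):
    # One unresolved entry aborts the whole parse, so every periodic
    # run that rebuilds the whitelist fails in the same way.
    return [resolve(e, present_ifnames) for e in entries]


def parse_tolerant(entries, present_ifnames):
    # Skip and log unresolved entries so the remaining ones stay usable
    # and the periodic update can still complete.
    specs = []
    for e in entries:
        try:
            specs.append(resolve(e, present_ifnames))
        except DeviceNotFound as exc:
            LOG.warning("Skipping whitelist entry %s: %s", e, exc)
    return specs
```

With the whitelist from this report and eth1 absent, parse_strict raises on the first entry, while parse_tolerant keeps the two address-based entries.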
We get the following error in the nova compute log if eth1 is not present. The compute service still shows as up, but the periodic hypervisor update stops working.
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager rt.update_available_resource(context)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._update_available_resource(context, resources)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager return f(*args, **kwargs)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager node_id=n_id)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.specs = self._parse_white_list_from_config(whitelist_spec)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager spec = devspec.PciDeviceSpec(ds)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._init_dev_details()
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=self.dev_name)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device eth1 not found
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager
** Affects: nova
Importance: Undecided
Assignee: MANJUNATH PATIL (mpatil)
Status: New
** Tags: compute pci
** Changed in: nova
Assignee: (unassigned) => MANJUNATH PATIL (mpatil)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1605549