yahoo-eng-team team mailing list archive
Message #54611
[Bug 1603034] Re: pci whitelist exception will kill the periodic update of the hypervisor statistics
Reviewed: https://review.openstack.org/342301
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a61ae35d4b713f423219c7b714126e1584694e8
Submitter: Jenkins
Branch: master
commit 3a61ae35d4b713f423219c7b714126e1584694e8
Author: Matt Riedemann <mriedem@xxxxxxxxxx>
Date: Thu Jul 14 13:37:05 2016 -0400
Validate pci_passthrough_whitelist when starting n-cpu
Loading up CONF.pci_passthrough_whitelist in the Whitelist
object performs a bunch of validation and can fail in several
different ways (invalid json, invalid values, invalid combinations
of keys, devices not found, etc). This happens today when
creating the PciDevTracker in the ResourceTracker when updating
available resources. If the configuration is bad, it kills the
periodic task to update available resources on the compute node.
We should just load up the pci_passthrough_whitelist (if set)
when starting the nova-compute service so we can fail fast and
kill the service on any misconfiguration rather than run with
a broken service.
Change-Id: If50fb837b490042bb5ef20e9ad843b28f871a44e
Closes-Bug: #1603034
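
The fail-fast idea in the commit message above can be illustrated with a small, self-contained sketch. This is not the nova implementation: WhitelistConfigError, validate_whitelist, start_service and the known_devnames argument are hypothetical names chosen for illustration, and the checks shown (valid JSON, no devname combined with address, device must exist) are only assumed examples of the kinds of failures the commit message lists.

```python
import json


class WhitelistConfigError(Exception):
    """Hypothetical error type for a bad whitelist; not a nova class."""


def validate_whitelist(raw_entries, known_devnames):
    """Parse and check whitelist entries, raising on the first problem.

    raw_entries: list of JSON strings, one per configured option value.
    known_devnames: set of device names present on this host (assumed
    to be gathered elsewhere).
    """
    specs = []
    for raw in raw_entries:
        try:
            parsed = json.loads(raw)
        except ValueError as exc:
            raise WhitelistConfigError("invalid JSON: %s" % exc)
        # Accept either a single object or a list of objects per value.
        entries = parsed if isinstance(parsed, list) else [parsed]
        for entry in entries:
            if not isinstance(entry, dict):
                raise WhitelistConfigError(
                    "entry is not an object: %r" % (entry,))
            # Assumed example of an "invalid combination of keys".
            if "devname" in entry and "address" in entry:
                raise WhitelistConfigError(
                    "devname and address cannot be combined")
            devname = entry.get("devname")
            if devname is not None and devname not in known_devnames:
                raise WhitelistConfigError(
                    "PCI device %s not found" % devname)
            specs.append(entry)
    return specs


def start_service(raw_entries, known_devnames):
    """Validate configuration before any periodic task is scheduled."""
    specs = validate_whitelist(raw_entries, known_devnames)
    # Only reached when the configuration is valid.
    return {"status": "running", "specs": specs}
```

With this shape, a missing device such as hed1 stops the service at startup instead of surfacing later inside a periodic task.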
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1603034
Title:
pci whitelist exception will kill the periodic update of the
hypervisor statistics
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) mitaka series:
Confirmed
Bug description:
An exception raised while loading the PCI whitelist causes the periodic
hypervisor update loop to terminate, and it is not tried again. Retries
should continue at the normal interval.
Scenario 1:
Set pci_passthrough_whitelist in nova.conf as follows:
pci_passthrough_whitelist = [ {"devname": "hed1", "physical_network": "physnet1"},{"physical_network": "physnet1", "address": "*:04:00.0"},{"physical_network": "physnet2", "address": "*:04:00.1"}]
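
To make the structure of that value easier to see, here is a minimal sketch that parses it with Python's standard json module. It only inspects the JSON; it does not reproduce nova's own parsing or validation, and the variable names are illustrative.

```python
import json

# The option value from the scenario above, copied as a single string.
raw_value = (
    '[ {"devname": "hed1", "physical_network": "physnet1"},'
    '{"physical_network": "physnet1", "address": "*:04:00.0"},'
    '{"physical_network": "physnet2", "address": "*:04:00.1"}]'
)

entries = json.loads(raw_value)

# Split entries by how they select a device: by name or by PCI address.
by_devname = [e for e in entries if "devname" in e]
by_address = [e for e in entries if "address" in e]

for entry in entries:
    selector = entry.get("devname") or entry.get("address")
    print("%s -> %s" % (selector, entry["physical_network"]))
```

Only the first entry selects a device by name (hed1), which is the entry that fails when that device is absent from the host.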
If hed1 is not present, the following error appears in the nova-compute
log. The compute service still shows as up, but the periodic hypervisor
update stops working.
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager rt.update_available_resource(context)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._update_available_resource(context, resources)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager return f(*args, **kwargs)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager node_id=n_id)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self.specs = self._parse_white_list_from_config(whitelist_spec)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager spec = devspec.PciDeviceSpec(ds)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager self._init_dev_details()
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager raise exception.PciDeviceNotFoundById(id=self.dev_name)
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device hed1 not found
2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager
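
The behavior the bug description asks for (retries continuing at the normal interval after a failure) can be sketched as a generic periodic loop. This is an assumption-laden illustration, not nova's periodic task machinery: run_periodic and its parameters are hypothetical, and the sleep function is injectable only so the loop can be exercised without waiting.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def run_periodic(task, interval, iterations, sleep=time.sleep):
    """Call task() `iterations` times, `interval` seconds apart.

    An exception from one run is logged and does not cancel later runs.
    Returns the number of runs that failed.
    """
    failures = 0
    for _ in range(iterations):
        try:
            task()
        except Exception:
            # Log the traceback and keep going; the next run is a retry.
            LOG.exception("Error updating resources; retrying next interval")
            failures += 1
        sleep(interval)
    return failures
```

A task that raises on its first runs and succeeds later is still called on every iteration, so a transient problem would not permanently stop the updates.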
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1603034/+subscriptions