
yahoo-eng-team team mailing list archive

[Bug 1603034] Re: pci whitelist exception will kill the periodic update of the hypervisor statistics

 

Reviewed:  https://review.openstack.org/342301
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a61ae35d4b713f423219c7b714126e1584694e8
Submitter: Jenkins
Branch:    master

commit 3a61ae35d4b713f423219c7b714126e1584694e8
Author: Matt Riedemann <mriedem@xxxxxxxxxx>
Date:   Thu Jul 14 13:37:05 2016 -0400

    Validate pci_passthrough_whitelist when starting n-cpu
    
    Loading up CONF.pci_passthrough_whitelist in the Whitelist
    object performs a bunch of validation and can fail in several
    different ways (invalid json, invalid values, invalid combinations
    of keys, devices not found, etc). This happens today when
    creating the PciDevTracker in the ResourceTracker when updating
    available resources. If the configuration is bad, it kills the
    periodic task to update available resources on the compute node.
    
    We should just load up the pci_passthrough_whitelist (if set)
    when starting the nova-compute service so we can fail fast and
    kill the service on any misconfiguration rather than run with
    a broken service.
    
    Change-Id: If50fb837b490042bb5ef20e9ad843b28f871a44e
    Closes-Bug: #1603034
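
The fail-fast idea in the commit message can be sketched roughly as follows. This is not the nova code: `WhitelistConfigError`, `validate_whitelist`, `start_compute_service` and the `known_devices` set are hypothetical names for illustration, and treating `devname` together with `address` as an invalid key combination is an assumption of the sketch. The point is only that the same validation that used to fail inside the periodic task runs once at startup, so a bad setting stops the service instead of leaving it running in a broken state.

```python
import json


class WhitelistConfigError(Exception):
    """Hypothetical error for an unusable whitelist setting."""


def validate_whitelist(raw, known_devices):
    """Parse a JSON whitelist string and check every entry.

    known_devices is a stand-in for the devices present on the host.
    Returns the parsed entries, or raises WhitelistConfigError.
    """
    if not raw:
        return []  # option not set: nothing to validate
    try:
        entries = json.loads(raw)
    except ValueError as exc:  # invalid json
        raise WhitelistConfigError("invalid json: %s" % exc)
    if not isinstance(entries, list):
        raise WhitelistConfigError("expected a list of entries")
    for entry in entries:
        if not isinstance(entry, dict):
            raise WhitelistConfigError("entry is not a mapping: %r" % (entry,))
        # Assumed example of an invalid combination of keys.
        if "devname" in entry and "address" in entry:
            raise WhitelistConfigError("devname and address both set")
        # Device named in the config but absent on the host.
        if "devname" in entry and entry["devname"] not in known_devices:
            raise WhitelistConfigError(
                "PCI device %s not found" % entry["devname"])
    return entries


def start_compute_service(raw_whitelist, known_devices):
    """Validate once at startup; any error propagates and aborts startup."""
    validate_whitelist(raw_whitelist, known_devices)
    return "started"
```

With a host that has no `hed1` device, `start_compute_service` raises immediately rather than returning, which mirrors the "kill the service on any misconfiguration" behaviour described above.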


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1603034

Title:
  pci whitelist exception will kill the periodic update of the
  hypervisor statistics

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  Confirmed

Bug description:
  An exception raised while loading the pci whitelist causes the periodic
  hypervisor update loop to terminate, and the update is not tried again.
  Retries should continue at the normal interval.

  Scenario 1:

  Update the nova.conf with the pci_whitelist as follows:
  pci_passthrough_whitelist = [ {"devname": "hed1", "physical_network": "physnet1"},{"physical_network": "physnet1", "address": "*:04:00.0"},{"physical_network": "physnet2", "address": "*:04:00.1"}]
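
As a quick check of the value above, the sketch below parses it with the standard `json` module. The variable names are illustrative only. It shows that the setting itself is well-formed JSON with three entries, one selected by `devname` and two by `address`, so the failure in this scenario comes from the named device being absent rather than from a syntax problem.

```python
import json

# The option value from nova.conf, copied as a single string.
value = (
    '[ {"devname": "hed1", "physical_network": "physnet1"},'
    '{"physical_network": "physnet1", "address": "*:04:00.0"},'
    '{"physical_network": "physnet2", "address": "*:04:00.1"}]'
)

entries = json.loads(value)

# Split the entries by how they select a device.
by_devname = [e["devname"] for e in entries if "devname" in e]
by_address = [e["address"] for e in entries if "address" in e]

print(len(entries), by_devname, by_address)
```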

  If hed1 is not present, the following error appears in the nova
  compute log. The compute service still shows as up, but the periodic
  hypervisor update stops working.

  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     rt.update_available_resource(context)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self._update_available_resource(context, resources)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     return f(*args, **kwargs)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     node_id=n_id)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self.specs = self._parse_white_list_from_config(whitelist_spec)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     spec = devspec.PciDeviceSpec(ds)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self._init_dev_details()
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     raise exception.PciDeviceNotFoundById(id=self.dev_name)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device hed1 not found
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1603034/+subscriptions

