
yahoo-eng-team team mailing list archive

[Bug 1616837] Re: neutron-sriov-agent crash when SRIOV and PCI-PT with same NIC as target.

 

** Project changed: nova => neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1616837

Title:
  neutron-sriov-agent crash when SRIOV and PCI-PT with same NIC as
  target.

Status in neutron:
  New

Bug description:
  SR-IOV and PCI passthrough (PCI-PT) use case:

  1) hed1 and hed2 are created for both SR-IOV and PCI-PT.
  2) Boot a PCI-PT VM on hed2.
  3) The agent is now down, and no VM can be booted on hed1.

  User impact: SR-IOV VFs are no longer available to instances.

  # Create an instance that consumes a PCI NIC shared with SR-IOV.
   The sriov-agent detects that the NIC is gone and loops with the following messages; the agent is still alive:
  {code}2016-08-19 12:44:28.770 24473 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Agent out of sync with plugin!
  2016-08-19 12:44:28.771 24473 DEBUG neutron.agent.linux.utils [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show', 'hed2'] execute_rootwrap_daemon /opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
  2016-08-19 12:44:28.779 24473 ERROR neutron.agent.linux.utils [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.

  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Failed executing ip command
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib Traceback (most recent call last):
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 83, in get_assigned_macs
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     out = self._as_root([], "link", ("show", self.dev_name))
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 94, in _as_root
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     log_fail_as_error=self.log_fail_as_error)
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 103, in _execute
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     log_fail_as_error=log_fail_as_error)
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib     raise RuntimeError(msg)
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
  2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Error in agent loop. Devices info: {}
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 377, in daemon_loop
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 196, in scan_devices
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 277, in get_assigned_devices_info
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 150, in get_assigned_devices_info
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     list(vf_to_pci_slot_mapping.keys()))
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 87, in get_assigned_macs
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     reason=e)
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent IpCommandDeviceError: ip command failed on device hed2: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
  2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
  2016-08-19 12:44:30.769 24473 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Agent rpc_loop - iteration:74657 started daemon_loop /opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:364
  {code}
  # Delete this instance.

  h5. neutron-sriov-agent crashes and loops with the following error:
  {code}2016-08-23 15:59:10.606 11506 INFO neutron.common.config [-] Logging enabled!
  2016-08-23 15:59:10.607 11506 INFO neutron.common.config [-] /opt/stack/service/neutron-20160819T002002Z/venv/bin/neutron-sriov-nic-agent version 8.1.3.dev380
  2016-08-23 15:59:10.608 11506 DEBUG neutron.common.config [-] command line: /opt/stack/service/neutron-20160819T002002Z/venv/bin/neutron-sriov-nic-agent --config-file=/opt/stack/service/neutron-20160819T002002Z/etc/ml2_conf_sriov_agent.ini --config-file=/opt/stack/service/neutron-20160819T002002Z/etc/neutron.conf --log-file=/var/log/neutron/neutron-sriov-nic-agent.log setup_logging /opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/common/config.py:269
  2016-08-23 15:59:10.608 11506 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Physical Devices mappings: {'physnet3': ['hed2'], 'physnet2': ['hed1']}
  2016-08-23 15:59:10.609 11506 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Exclude Devices: {}
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Agent Initialization Failed
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 456, in main
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     polling_interval)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 114, in __init__
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     exclude_devices)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 193, in setup_eswitch_mgr
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     self.eswitch_mgr.discover_devices(device_mappings, exclude_devices)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 345, in discover_devices
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     exclude_devices.get(dev_name, set()))
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 348, in _create_emb_switch
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     embedded_switch = EmbSwitch(phys_net, dev_name, exclude_devices)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 120, in __init__
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     self._load_devices(exclude_devices)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 127, in _load_devices
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     scanned_pci_list = PciOsWrapper.scan_vf_devices(self.dev_name)
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 65, in scan_vf_devices
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     reason=_("Device has no virtual functions"))
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent InvalidDeviceError: Invalid Device hed2: Device has no virtual functions
  2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
  {code}
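
  The "Device has no virtual functions" error above suggests the agent found no VFs for hed2 at start-up. On Linux, an SR-IOV physical function normally exposes its VFs as virtfnN entries in its sysfs device directory, so a check along the following lines may help confirm the state of each NIC before restarting the agent. This is a minimal sketch, not the agent's code: the helper name count_virtual_functions and its sysfs_net parameter are made up for illustration.

```python
import os
import re

# Hypothetical helper for illustration; not the agent's actual implementation.
VIRTFN_RE = re.compile(r"^virtfn(\d+)$")


def count_virtual_functions(dev_name, sysfs_net="/sys/class/net"):
    """Count virtfnN entries under <sysfs_net>/<dev_name>/device.

    Returns None when the device directory is missing (for example when
    the network device is not present on the host), and 0 when the
    directory exists but exposes no virtual functions.
    """
    device_dir = os.path.join(sysfs_net, dev_name, "device")
    if not os.path.isdir(device_dir):
        return None
    return sum(1 for entry in os.listdir(device_dir) if VIRTFN_RE.match(entry))
```

  Calling count_virtual_functions("hed2") on the affected host would be expected to return 0 or None in the situation described here, while hed1 should report a non-zero count; the sysfs_net argument exists only so the function can be pointed at a test directory.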

  NIC list:
  {code}78: hed2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
      link/ether 8c:dc:d4:ae:39:71 brd ff:ff:ff:ff:ff:ff
  {code}
  while another NIC on this system shows VFs:
  {code}3: hed1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
      link/ether 8c:dc:d4:ae:39:70 brd ff:ff:ff:ff:ff:ff
      vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
      vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
      vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
      vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
      vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
      vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
  {code}
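
  The difference between the two listings can also be checked programmatically. The sketch below parses "ip link show" output in the format pasted above and returns the VF index to MAC mapping; the function name parse_vf_macs and the regular expression are assumptions for illustration, and other iproute2 versions may print VF lines in a different format.

```python
import re

# Matches VF lines in the format shown in this report, e.g.
# "vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto".
VF_LINE_RE = re.compile(r"^\s*vf\s+(\d+)\s+MAC\s+([0-9A-Fa-f:]{17})")

# Sample outputs copied from the bug description.
HED1_OUTPUT = """\
3: hed1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 8c:dc:d4:ae:39:70 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
"""

HED2_OUTPUT = """\
78: hed2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8c:dc:d4:ae:39:71 brd ff:ff:ff:ff:ff:ff
"""


def parse_vf_macs(ip_link_output):
    """Return {vf_index: mac} for every VF line found in the output."""
    vfs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE_RE.match(line)
        if match:
            vfs[int(match.group(1))] = match.group(2).lower()
    return vfs
```

  With the samples above, parse_vf_macs(HED1_OUTPUT) yields six entries and parse_vf_macs(HED2_OUTPUT) yields an empty mapping, which matches the observation that hed2 came back without its VFs.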

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1616837/+subscriptions

