yahoo-eng-team team mailing list archive
Message #55554
[Bug 1616837] Re: neutron-sriov-agent crash when SRIOV and PCI-PT with same NIC as target.
** Project changed: nova => neutron
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1616837
Title:
neutron-sriov-agent crash when SRIOV and PCI-PT with same NIC as
target.
Status in neutron:
New
Bug description:
SR-IOV and PCI passthrough (PCI-PT) use case:
1) hed1 and hed2 are both configured for SR-IOV and PCI-PT.
2) Boot a PCI-PT VM on hed2.
3) The agent is now down, and a VM cannot be booted on hed1.
User impact: SR-IOV VFs are no longer available for instances.
# Create an instance that consumes a PCI NIC shared with SR-IOV.
The sriov-agent detects that the NIC is gone and loops with the following messages; the agent is still alive at this point:
{code}2016-08-19 12:44:28.770 24473 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Agent out of sync with plugin!
2016-08-19 12:44:28.771 24473 DEBUG neutron.agent.linux.utils [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Running command (rootwrap daemon): ['ip', 'link', 'show', 'hed2'] execute_rootwrap_daemon /opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2016-08-19 12:44:28.779 24473 ERROR neutron.agent.linux.utils [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Failed executing ip command
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib Traceback (most recent call last):
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 83, in get_assigned_macs
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib out = self._as_root([], "link", ("show", self.dev_name))
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 94, in _as_root
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib log_fail_as_error=self.log_fail_as_error)
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 103, in _execute
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib log_fail_as_error=log_fail_as_error)
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib raise RuntimeError(msg)
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
2016-08-19 12:44:28.780 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.pci_lib
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Error in agent loop. Devices info: {}
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 377, in daemon_loop
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent device_info = self.scan_devices(devices, updated_devices_copy)
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 196, in scan_devices
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent curr_devices = self.eswitch_mgr.get_assigned_devices_info()
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 277, in get_assigned_devices_info
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent for device in embedded_switch.get_assigned_devices_info():
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 150, in get_assigned_devices_info
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent list(vf_to_pci_slot_mapping.keys()))
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 87, in get_assigned_macs
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent reason=e)
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent IpCommandDeviceError: ip command failed on device hed2: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "hed2" does not exist.
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2016-08-19 12:44:28.781 24473 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
2016-08-19 12:44:30.769 24473 DEBUG neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-ec987933-8dbe-4d0b-9df7-f9b9250968de - - - - -] Agent rpc_loop - iteration:74657 started daemon_loop /opt/stack/venv/neutron-20160811T121326Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py:364
{code}
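In the traceback above, one missing device (hed2) raises out of get_assigned_devices_info and aborts the scan for every device, including hed1. The following is only a rough sketch of how a per-device failure could be contained so that the remaining NICs are still reported. It is not the neutron code or a proposed patch: DeviceGoneError and collect_assigned_devices are hypothetical names, and DeviceGoneError merely stands in for the agent's IpCommandDeviceError.

```python
class DeviceGoneError(Exception):
    """Hypothetical stand-in for the agent's IpCommandDeviceError."""


def collect_assigned_devices(switches, log=print):
    """Gather assigned-device info, skipping NICs that have disappeared.

    `switches` maps a device name to a zero-argument callable that returns
    a list of assigned devices, or raises DeviceGoneError if the NIC is gone.
    Returns (assigned_devices, missing_device_names).
    """
    assigned = []
    missing = []
    for dev_name, get_info in switches.items():
        try:
            assigned.extend(get_info())
        except DeviceGoneError as exc:
            # One vanished NIC should not hide the VFs of the others.
            log("skipping %s: %s" % (dev_name, exc))
            missing.append(dev_name)
    return assigned, missing
```

With this shape, a PF handed to a PCI-PT instance would end up in the missing list while the VFs on hed1 would still be returned.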
# Delete this instance.
h5. The neutron sriov-agent crashes and loops with the following error:
{code}2016-08-23 15:59:10.606 11506 INFO neutron.common.config [-] Logging enabled!
2016-08-23 15:59:10.607 11506 INFO neutron.common.config [-] /opt/stack/service/neutron-20160819T002002Z/venv/bin/neutron-sriov-nic-agent version 8.1.3.dev380
2016-08-23 15:59:10.608 11506 DEBUG neutron.common.config [-] command line: /opt/stack/service/neutron-20160819T002002Z/venv/bin/neutron-sriov-nic-agent --config-file=/opt/stack/service/neutron-20160819T002002Z/etc/ml2_conf_sriov_agent.ini --config-file=/opt/stack/service/neutron-20160819T002002Z/etc/neutron.conf --log-file=/var/log/neutron/neutron-sriov-nic-agent.log setup_logging /opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/common/config.py:269
2016-08-23 15:59:10.608 11506 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Physical Devices mappings: {'physnet3': ['hed2'], 'physnet2': ['hed1']}
2016-08-23 15:59:10.609 11506 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Exclude Devices: {}
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [-] Agent Initialization Failed
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 456, in main
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent polling_interval)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 114, in __init__
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent exclude_devices)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 193, in setup_eswitch_mgr
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent self.eswitch_mgr.discover_devices(device_mappings, exclude_devices)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 345, in discover_devices
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent exclude_devices.get(dev_name, set()))
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 348, in _create_emb_switch
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent embedded_switch = EmbSwitch(phys_net, dev_name, exclude_devices)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 120, in __init__
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent self._load_devices(exclude_devices)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 127, in _load_devices
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent scanned_pci_list = PciOsWrapper.scan_vf_devices(self.dev_name)
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent File "/opt/stack/venv/neutron-20160819T002002Z/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 65, in scan_vf_devices
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent reason=_("Device has no virtual functions"))
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent InvalidDeviceError: Invalid Device hed2: Device has no virtual functions
2016-08-23 15:59:10.610 11506 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
{code}
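The startup failure above comes from a configured device (hed2) that exposes no virtual functions at the time the agent starts. On Linux, the VFs of a PF appear as virtfnN symlinks under /sys/class/net/<dev>/device. As a hedged illustration only, the sketch below checks each configured device this way and separates usable devices from ones that would be skipped, instead of failing initialization. scan_vfs and usable_mappings are hypothetical helpers, not neutron APIs, and the sysfs_net parameter exists only so the sketch can be pointed at a test directory.

```python
import os
import re

VIRTFN_RE = re.compile(r"^virtfn(\d+)$")


def scan_vfs(dev_name, sysfs_net="/sys/class/net"):
    """Return {vf_index: pci_slot} for dev_name, or {} if no VFs are visible."""
    dev_path = os.path.join(sysfs_net, dev_name, "device")
    if not os.path.isdir(dev_path):
        # The netdev is gone, e.g. the PF was passed through to an instance.
        return {}
    vfs = {}
    for entry in os.listdir(dev_path):
        match = VIRTFN_RE.match(entry)
        if not match:
            continue
        link = os.path.join(dev_path, entry)
        if os.path.islink(link):
            # The symlink target ends with the VF's PCI address.
            vfs[int(match.group(1))] = os.path.basename(os.readlink(link))
    return vfs


def usable_mappings(device_mappings, sysfs_net="/sys/class/net"):
    """Split a physnet -> [devices] mapping into usable and skipped devices."""
    usable, skipped = {}, []
    for physnet, devs in device_mappings.items():
        for dev in devs:
            if scan_vfs(dev, sysfs_net):
                usable.setdefault(physnet, []).append(dev)
            else:
                skipped.append(dev)
    return usable, skipped
```

Applied to the mappings logged above ({'physnet3': ['hed2'], 'physnet2': ['hed1']}), hed2 would land in the skipped list while hed1 stays usable, provided hed1 still has its VFs.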
NIC list (hed2 shows no VFs):
{code}78: hed2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8c:dc:d4:ae:39:71 brd ff:ff:ff:ff:ff:ff
{code}
whereas another NIC on this system shows its VFs:
{code}3: hed1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 8c:dc:d4:ae:39:70 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
{code}
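The difference between the two listings can be checked mechanically. The sketch below is an illustration rather than the agent's parser: it counts the "vf N MAC ..." lines in `ip link show` output. parse_vfs is a hypothetical helper, and the regular expression assumes the output format shown in this report; other iproute2 versions may print VF lines differently.

```python
import re

# Matches lines such as: "vf 0 MAC 00:00:00:00:00:00, spoof checking on, ..."
VF_LINE_RE = re.compile(r"^\s*vf\s+(\d+)\s+MAC\s+([0-9a-fA-F:]{17})")


def parse_vfs(ip_link_output):
    """Return {vf_index: mac} for every VF line found in `ip link show` output."""
    vfs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE_RE.match(line)
        if match:
            vfs[int(match.group(1))] = match.group(2).lower()
    return vfs


HED2_OUTPUT = """\
78: hed2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 8c:dc:d4:ae:39:71 brd ff:ff:ff:ff:ff:ff
"""

HED1_OUTPUT = """\
3: hed1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 8c:dc:d4:ae:39:70 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
"""

print(len(parse_vfs(HED2_OUTPUT)), len(parse_vfs(HED1_OUTPUT)))
```

The hed1 sample here is shortened to two VF lines for brevity; the listing in the report has six.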
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1616837/+subscriptions