← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1938685] [NEW] ofctl timeouts lead to dvr-ha-multinode-full failures

 

Public bug reported:

Recently neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job
start failing often. Usual test fail is:

"Details: (ServersTestJSON:setUpClass) Server
74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and
task state "None" within the required time (196 s). Current status:
BUILD. Current task state: spawning."

Looking at logs I see that the reason is ofctl timeout (300 sec) that
causes OVS agent to not process new port(s) in time:

Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg
...

Not sure why ofctl timeouts, but anyway default 300 seems too much

** Affects: neutron
     Importance: High
         Status: Triaged


** Tags: gate-failure

** Description changed:

  Recently neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job
  start failing often. Usual test fail is:
  
  "Details: (ServersTestJSON:setUpClass) Server
  74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and
  task state "None" within the required time (196 s). Current status:
  BUILD. Current task state: spawning."
  
  Looking at logs I see that the reason is ofctl timeout (300 sec) that
  causes OVS agent to not process new port(s) in time:
  
  Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
  Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg
  ...
  
- Not sure why ofctl timeouts, but maybe default 300 is too much
+ Not sure why ofctl timeouts, but anyway default 300 seems too much

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1938685

Title:
  ofctl timeouts lead to dvr-ha-multinode-full failures

Status in neutron:
  Triaged

Bug description:
  Recently neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job
  start failing often. Usual test fail is:

  "Details: (ServersTestJSON:setUpClass) Server
  74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and
  task state "None" within the required time (196 s). Current status:
  BUILD. Current task state: spawning."

  Looking at logs I see that the reason is ofctl timeout (300 sec) that
  causes OVS agent to not process new port(s) in time:

  Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
  Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
  Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg
  ...

  Not sure why ofctl timeouts, but anyway default 300 seems too much

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1938685/+subscriptions



Follow ups