← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1885758] [NEW] RPCMessage timeouts when ovs agent is reporting status about many ports

 

Public bug reported:

In case when neutron-ovs-agent is handling many ports, like e.g. 1000 and configures all of them in one rpc loop iteration, it will send update_devices_status rpc message to the neutron-server and it may timeout as neutron server will process ports one by one.
Such timeout will force to do full sync on agent's side so the problem will repeat.

Error on agent side is like:

2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Error while processing VIF ports: MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2339, in rpc_loop
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     port_info, provisioning_needed)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1851, in process_network_ports
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.treat_devices_skipped(skipped_devices)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1783, in treat_devices_skipped
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.conf.host)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 171, in update_device_list
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     agent_restarted=agent_restarted)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     time.sleep(wait)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.force_reraise()
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     six.reraise(self.type_, self.value, self.tb)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return self._original_context.call(ctxt, method, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 625, in send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 614, in _send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 506, in wait
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 394, in get
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2020-06-30 16:13:49.176 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:0 completed. Processed ports statistics: {'regular': {'updated': 0, 'added': 1250, 'removed': 0}}. Elapsed:2169.184
2020-06-30 16:13:49.177 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Loop iteration exceeded interval (2 vs. 2169.18406701)! loop_count_and_wait /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutr
on_agent.py:2063
2020-06-30 16:13:49.178 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 started
2020-06-30 16:13:49.182 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x27b6f57f,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_grou
p=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=12410424124845347866L,duration_nsec=444000000,duration_sec=2186,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],fl
ags=0,type=1)] _send_msg /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:98
2020-06-30 16:13:49.191 148737 DEBUG neutron.agent.linux.utils [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Found cmdline ['ovsdb-client', 'monitor', 'tcp:127.0.0.1:6640', 'Interface', 'name,ofport,external_ids', '--format=json'] for rocess with PID 149081. get_cmdline_from_pid /usr/lib/python2.7/site-package
s/neutron/agent/linux/utils.py:329
2020-06-30 16:13:49.224 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 - starting polling. Elapsed:0.046
2020-06-30 16:13:49.226 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent out of sync with plugin!


We should probably do here something similar what is done in e.g. DHCP agent and we should report statuses about ports in batches.

** Affects: neutron
     Importance: Medium
     Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
         Status: Confirmed


** Tags: ovs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1885758

Title:
  RPCMessage timeouts when ovs agent is reporting status about many
  ports

Status in neutron:
  Confirmed

Bug description:
  In case when neutron-ovs-agent is handling many ports, like e.g. 1000 and configures all of them in one rpc loop iteration, it will send update_devices_status rpc message to the neutron-server and it may timeout as neutron server will process ports one by one.
  Such timeout will force to do full sync on agent's side so the problem will repeat.

  Error on agent side is like:

  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Error while processing VIF ports: MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2339, in rpc_loop
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     port_info, provisioning_needed)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = f(*args, **kwargs)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1851, in process_network_ports
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.treat_devices_skipped(skipped_devices)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = f(*args, **kwargs)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1783, in treat_devices_skipped
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.conf.host)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 171, in update_device_list
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     agent_restarted=agent_restarted)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     time.sleep(wait)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     self.force_reraise()
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     six.reraise(self.type_, self.value, self.tb)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return self._original_context.call(ctxt, method, **kwargs)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     timeout=timeout, retry=retry)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 625, in send
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 614, in _send
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     result = self._waiter.wait(msg_id, timeout)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 506, in wait
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 394, in get
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
  2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  2020-06-30 16:13:49.176 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:0 completed. Processed ports statistics: {'regular': {'updated': 0, 'added': 1250, 'removed': 0}}. Elapsed:2169.184
  2020-06-30 16:13:49.177 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Loop iteration exceeded interval (2 vs. 2169.18406701)! loop_count_and_wait /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutr
  on_agent.py:2063
  2020-06-30 16:13:49.178 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 started
  2020-06-30 16:13:49.182 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x27b6f57f,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_grou
  p=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=12410424124845347866L,duration_nsec=444000000,duration_sec=2186,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],fl
  ags=0,type=1)] _send_msg /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:98
  2020-06-30 16:13:49.191 148737 DEBUG neutron.agent.linux.utils [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Found cmdline ['ovsdb-client', 'monitor', 'tcp:127.0.0.1:6640', 'Interface', 'name,ofport,external_ids', '--format=json'] for rocess with PID 149081. get_cmdline_from_pid /usr/lib/python2.7/site-package
  s/neutron/agent/linux/utils.py:329
  2020-06-30 16:13:49.224 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 - starting polling. Elapsed:0.046
  2020-06-30 16:13:49.226 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent out of sync with plugin!

  
  We should probably do here something similar what is done in e.g. DHCP agent and we should report statuses about ports in batches.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1885758/+subscriptions