yahoo-eng-team team mailing list archive

[Bug 2039812] [NEW] [N-D-R] The dynamic routing service is not resilient to infrastructure outage

 

Public bug reported:

Currently, the n-d-r service architecture depends on messaging between
the DRAgent service and the Neutron server. However, this communication
depends heavily on the availability of the messaging service (RabbitMQ
by default), and any transient or permanent failure in the OpenStack
infrastructure nodes may affect prefix advertising via BGP.

The issue here is not the dependency on the messaging service itself,
as this is the common design of OpenStack modules. I'm talking about
how the control plane service (n-d-r) can actively affect the data
plane.

I understand that, by design, the application drops the BGP peer
connections after a certain timeout without RMQ communication (in my
tests it took 1 hour), but as a result all the prefixes/FIPs stop being
advertised (dropping external connectivity). To be clear, a lack of
messages between the Neutron server and the DRAgent via RMQ will cause
general unavailability of the whole North/South data plane.

IMO, it would be helpful for the DRAgent service to implement a
resilience mechanism for the data plane, keeping the sessions with BGP
peers up while waiting for RMQ communication to come back (a large
timeout could help here). Additionally, n-d-r on the Neutron side needs
to keep the BGP speaker alive in infrastructure failure cases, because
if the speaker is removed the DRAgent will no longer work.
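
To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not the current n-d-r code: the class, its method names, the `grace_period` parameter and the local `MessagingTimeout` stand-in are all hypothetical, and the RPC call and the BGP driver are passed in as plain callables so the example runs on its own.

```python
import time


class MessagingTimeout(Exception):
    """Stand-in for oslo_messaging.exceptions.MessagingTimeout."""


class ResilientSpeakerSync:
    """Hypothetical sync loop that leaves BGP state alone during RPC outages."""

    def __init__(self, fetch_speakers, apply_speakers, grace_period,
                 clock=time.monotonic):
        self._fetch = fetch_speakers   # callable doing the RPC, may raise
        self._apply = apply_speakers   # callable that reconciles the BGP driver
        self._grace = grace_period     # seconds an outage is tolerated
        self._clock = clock
        self._last_ok = clock()        # time of the last successful sync
        self.cached = None             # last speaker list seen from the server

    def sync_once(self):
        """Run one sync attempt and return 'synced', 'holding' or 'expired'."""
        try:
            speakers = self._fetch()
        except MessagingTimeout:
            if self._clock() - self._last_ok <= self._grace:
                # Outage still within the grace period: do not touch the
                # peers or the advertised routes.
                return "holding"
            # Grace period exceeded: the caller decides what happens next.
            return "expired"
        self.cached = speakers
        self._last_ok = self._clock()
        self._apply(speakers)
        return "synced"
```

With a large `grace_period`, a failed sync only reports "holding" and the speaker keeps advertising whatever it learned last; what to do on "expired" is left to the operator's policy.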

I know that RMQ being out of service for long periods is critical for
many parts of OpenStack, but even with HA in the DRAgent deployment we
still have a single point of failure, as the n-d-r agent needs to
communicate with Neutron via the messaging service.

Has anyone else had this problem? Does it make sense to you?


---------------------------------------------------------------------------------------------

Logs of the n-d-r service being stopped and closing the BGP peer
connections:


Aug 22 04:08:24 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:24.099 913828 ERROR oslo.messaging._drivers.impl_rabbit [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] [354770b6-dc97-439a-b059-91eb3be6b2f4] AMQP server on 10.36.16.246:5671 is unreachable: . Trying again in 1 seconds.: TimeoutError
Aug 22 04:08:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:25.231 913828 INFO oslo.messaging._drivers.impl_rabbit [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] [354770b6-dc97-439a-b059-91eb3be6b2f4] Reconnected to AMQP server on 10.36.16.246:5671 via [amqp] client with port 53326.
Aug 22 04:08:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:25.271 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 27.11 sec

Aug 22 04:09:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:09:25.275 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62
Aug 22 04:09:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:09:25.279 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62


Aug 22 04:10:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:10:23.314 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62
Aug 22 04:10:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:10:23.317 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 78.05 sec

Aug 22 04:12:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:23.324 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 33 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:23.325 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:55.942 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:55.944 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 112.63 sec


Aug 22 04:16:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:16:55.952 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 43 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c
Aug 22 04:16:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:16:55.955 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 480 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c


Aug 22 04:17:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:17:39.438 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c
Aug 22 04:17:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:17:39.441 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 243.50 sec


Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Failed reporting state!: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent Traceback (most recent call last):
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 441, in get
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self._queues[msg_id].get(block=True, timeout=timeout)
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return waiter.wait()
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return get_hub().switch()
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self.greenlet.switch()
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent _queue.Empty
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent During handling of the above exception, another exception occurred:
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent Traceback (most recent call last):
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 697, in _report_state
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     agent_status = self.state_rpc.report_state(ctx, self.agent_state,
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/neutron/agent/rpc.py", line 104, in report_state
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return method(context, 'report_state', **kwargs)
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, in call
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     result = self.transport._send(
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self._driver.send(target, ctxt, message,
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self._send(target, ctxt, message, wait_for_reply, timeout,
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 678, in _send
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     result = self._waiter.wait(msg_id, timeout,
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     message = self.waiters.get(msg_id, timeout=timeout)
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     raise oslo_messaging.MessagingTimeout(
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
                                                                     2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.158 913828 WARNING oslo.service.loopingcall [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec


Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.451 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.456 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 600 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514


Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.322 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.325 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 466.88 sec


Aug 22 04:32:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:32:47.175 913828 WARNING oslo.service.loopingcall [req-bc2c34db-c2bd-44f6-8298-2ac7efe0456a - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec


Aug 22 04:36:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:36:06.334 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.018 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.022 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 617.70 sec


Aug 22 04:42:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:42:47.188 913828 WARNING oslo.service.loopingcall [req-a14ad202-4c67-46d2-8f1c-d85e80766201 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec


Aug 22 04:47:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:04.033 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 32 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.491 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.495 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 592.47 sec

Aug 22 04:52:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:52:47.202 913828 WARNING oslo.service.loopingcall [req-7fea5ac1-e7f5-4276-804f-3991b22b286b - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec


Aug 22 04:57:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:57:36.505 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.207 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.211 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 589.71 sec


Aug 22 05:02:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:02:47.218 913828 WARNING oslo.service.loopingcall [req-40496068-0ca9-49a9-86c1-9b976b7a848c - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec


Aug 22 05:08:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:08:06.227 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 54 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.501 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.505 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 614.29 sec
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.547 913828 INFO bgpspeaker.api.base [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] API method core.stop called with args: {}
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.552 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1a lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.553 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1a for remote_as=64664 went DOWN.
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.speaker [-] Connection lost as protocol is no longer active
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1b lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1b for remote_as=64664 went DOWN.
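
The "Increasing timeout for get_bgp_speakers calls" warnings in the logs above step through 120, 240, 480 and 600 seconds. A rough sketch of that escalation, assuming a 60-second base (the default rpc_response_timeout), doubling after each timeout, and a cap of ten times the base. The function name and its parameters are hypothetical, and the cap is inferred from these log values rather than quoted from the neutron_lib source.

```python
def escalating_timeouts(base=60, factor=2, cap_multiplier=10, attempts=5):
    """Return the RPC timeout in effect after each consecutive timeout."""
    cap = base * cap_multiplier      # assumed ceiling: 600 s for a 60 s base
    timeout = base
    timeouts = []
    for _ in range(attempts):
        timeout = min(timeout * factor, cap)
        timeouts.append(timeout)
    return timeouts


print(escalating_timeouts())  # [120, 240, 480, 600, 600]
```

Once the cap is reached, each failed get_bgp_speakers call blocks for roughly ten minutes, which matches the spacing of the later "Unable to sync BGP speaker state" errors.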

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2039812

                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self.greenlet.switch()
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent _queue.Empty
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent During handling of the above exception, another exception occurred:
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent Traceback (most recent call last):
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 697, in _report_state
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     agent_status = self.state_rpc.report_state(ctx, self.agent_state,
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/neutron/agent/rpc.py", line 104, in report_state
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return method(context, 'report_state', **kwargs)
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, in call
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     result = self.transport._send(
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self._driver.send(target, ctxt, message,
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     return self._send(target, ctxt, message, wait_for_reply, timeout,
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 678, in _send
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     result = self._waiter.wait(msg_id, timeout,
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     message = self.waiters.get(msg_id, timeout=timeout)
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent     raise oslo_messaging.MessagingTimeout(
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
                                                                       2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent 
  Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.158 913828 WARNING oslo.service.loopingcall [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec

  
  Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.451 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
  Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.456 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 600 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514


  Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.322 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
  Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.325 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 466.88 sec

  
  Aug 22 04:32:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:32:47.175 913828 WARNING oslo.service.loopingcall [req-bc2c34db-c2bd-44f6-8298-2ac7efe0456a - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec

  
  Aug 22 04:36:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:36:06.334 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
  Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.018 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
  Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.022 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 617.70 sec

  
  Aug 22 04:42:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:42:47.188 913828 WARNING oslo.service.loopingcall [req-a14ad202-4c67-46d2-8f1c-d85e80766201 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec

  
  Aug 22 04:47:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:04.033 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 32 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
  Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.491 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
  Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.495 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 592.47 sec

  Aug 22 04:52:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:52:47.202 913828 WARNING oslo.service.loopingcall [req-7fea5ac1-e7f5-4276-804f-3991b22b286b - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec


  Aug 22 04:57:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:57:36.505 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
  Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.207 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
  Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.211 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 589.71 sec


  Aug 22 05:02:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:02:47.218 913828 WARNING oslo.service.loopingcall [req-40496068-0ca9-49a9-86c1-9b976b7a848c - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec


  
  Aug 22 05:08:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:08:06.227 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 54 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.501 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.505 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 614.29 sec
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.547 913828 INFO bgpspeaker.api.base [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] API method core.stop called with args: {}
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.552 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1a lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.553 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1a for remote_as=64664 went DOWN.
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.speaker [-] Connection lost as protocol is no longer active
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1b lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
  Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1b for remote_as=64664 went DOWN.
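
For readers following the log: the get_bgp_speakers timeout values reported above (240, then 480, then 600 seconds) look like a doubling backoff with a ceiling. The sketch below only reproduces that progression. The 60-second base and the 10x cap are assumptions inferred from the logged values, and the function name is hypothetical; none of this is copied from the neutron_lib source.

```python
def backoff_timeouts(base=60, factor=2, cap_multiplier=10, steps=5):
    """Return the per-call timeout after each consecutive RPC failure.

    Assumptions (inferred from the log, not from neutron_lib code):
    the timeout starts at `base` seconds, is multiplied by `factor`
    after every failure, and never exceeds `base * cap_multiplier`.
    """
    cap = base * cap_multiplier
    timeout = base
    progression = []
    for _ in range(steps):
        # Double the timeout, but clamp it to the ceiling.
        timeout = min(cap, timeout * factor)
        progression.append(timeout)
    return progression


if __name__ == "__main__":
    # The log shows 240, 480 and 600; 120 would be the step before them.
    print(backoff_timeouts())
```

Under these assumptions each failed sync attempt blocks for up to ten minutes once the ceiling is reached, which fits the roughly ten-minute spacing of the "Unable to sync BGP speaker state" errors from 04:26 onwards.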
