yahoo-eng-team team mailing list archive
Message #92970
[Bug 2039812] [NEW] [N-D-R] The dynamic routing service is not resilient to infrastructure outage
Public bug reported:
Nowadays, the n-d-r service architecture depends on messaging between
the DRAgent service and the Neutron server side. However, this
communication is strongly dependent on the availability of the messaging
service (RabbitMQ by default), and any transient or permanent failure in
the OpenStack infrastructure nodes may affect prefix advertising via BGP.
The issue here is not the dependency on the messaging service itself,
as this is the common design of OpenStack modules. I'm talking about how
the control plane service (n-d-r) can actively affect the data plane.
I understand that the application is designed to drop the BGP peer
connections after a certain timeout without RMQ communication (in my
tests it took about 1 hour), but as a result all the prefixes/FIPs stop
being advertised, dropping the external connectivity. To be clear, a
lack of messages between the Neutron server and the DRAgent via RMQ
causes a general unavailability of the whole North/South data plane.
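As a quick check of the "1 hour" figure, the elapsed time between the
first AMQP error and the BGP peers going DOWN can be computed from the
timestamps in the logs posted with this report. A minimal Python
sketch; the constant and function names are illustrative only:

```python
from datetime import datetime

# Timestamps copied from the log excerpt in this report: the first AMQP
# "unreachable" error and the moment the first BGP peer went DOWN.
FIRST_RPC_ERROR = "2023-08-22 04:08:24.099"
BGP_PEERS_DOWN = "2023-08-22 05:09:00.553"


def elapsed_minutes(start: str, end: str) -> float:
    """Return the minutes between two oslo.log-style timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


if __name__ == "__main__":
    print(f"{elapsed_minutes(FIRST_RPC_ERROR, BGP_PEERS_DOWN):.1f} minutes")
    # → 60.6 minutes
```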
IMO, it would be helpful for the DRAgent service to implement a
resilience mechanism for the data plane: keep the sessions with the BGP
peers up and wait for the RMQ communication to come back (a large
timeout could help here). Additionally, n-d-r on the Neutron side needs
to keep the BGP speaker alive in infrastructure failure cases, because
if the speaker is removed the DRAgent will no longer work.
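One way to picture the proposal is a hold timer on the agent side:
remember when the last successful sync with the Neutron server happened
and keep the BGP sessions (and the last known routes) up until a
configurable hold time expires, instead of stopping the speaker after
the sync failures. This is a hypothetical illustration in plain Python;
`StaleStateHoldPolicy`, `hold_time` and the method names are not part of
neutron-dynamic-routing:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StaleStateHoldPolicy:
    """Hypothetical hold timer for a BGP agent that lost its RPC channel.

    The agent keeps its BGP sessions and last known routes while the time
    since the last successful sync is below hold_time seconds.
    """

    hold_time: float
    clock: Callable[[], float] = time.monotonic
    _last_success: Optional[float] = field(default=None, init=False)

    def record_sync_success(self) -> None:
        """Call after every successful state sync with the server."""
        self._last_success = self.clock()

    def should_keep_sessions(self) -> bool:
        """True while the last known state is still considered usable."""
        if self._last_success is None:
            # Never synced: there is no known state worth advertising.
            return False
        return self.clock() - self._last_success < self.hold_time


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    policy = StaleStateHoldPolicy(hold_time=3600, clock=lambda: now[0])
    policy.record_sync_success()
    now[0] = 1800  # 30 minutes without RMQ
    print(policy.should_keep_sessions())  # → True
    now[0] = 3600  # hold time reached
    print(policy.should_keep_sessions())  # → False
```

Injecting the clock keeps the policy testable; a large `hold_time`
corresponds to the "large timeout" suggested above.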
I know that RMQ being out of service for long periods is critical for
many parts of OpenStack, but even with HA in the DRAgent deployment we
still have a single point of failure, as the n-d-r agent needs to
communicate with Neutron via the messaging service.
Has anyone else had this problem? Does it make sense to you?
---------------------------------------------------------------------------------------------
Logs of the n-d-r service being stopped and closing the BGP peer
connections:
Aug 22 04:08:24 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:24.099 913828 ERROR oslo.messaging._drivers.impl_rabbit [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] [354770b6-dc97-439a-b059-91eb3be6b2f4] AMQP server on 10.36.16.246:5671 is unreachable: . Trying again in 1 seconds.: TimeoutError
Aug 22 04:08:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:25.231 913828 INFO oslo.messaging._drivers.impl_rabbit [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] [354770b6-dc97-439a-b059-91eb3be6b2f4] Reconnected to AMQP server on 10.36.16.246:5671 via [amqp] client with port 53326.
Aug 22 04:08:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:08:25.271 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 27.11 sec
Aug 22 04:09:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:09:25.275 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62
Aug 22 04:09:25 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:09:25.279 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 120 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62
Aug 22 04:10:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:10:23.314 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID a2732db52e964c87aebf283a0d8a9f62
Aug 22 04:10:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:10:23.317 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 78.05 sec
Aug 22 04:12:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:23.324 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 33 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:23 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:23.325 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 240 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:55.942 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e4938c710cd9469eb10518be008658ad
Aug 22 04:12:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:12:55.944 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 112.63 sec
Aug 22 04:16:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:16:55.952 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 43 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c
Aug 22 04:16:55 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:16:55.955 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 480 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c
Aug 22 04:17:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:17:39.438 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 5c9b8b987c7a4c39a010f951fdb4d76c
Aug 22 04:17:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:17:39.441 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 243.50 sec
Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Failed reporting state!: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent Traceback (most recent call last):
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 441, in get
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return self._queues[msg_id].get(block=True, timeout=timeout)
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return waiter.wait()
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return get_hub().switch()
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return self.greenlet.switch()
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent _queue.Empty
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent During handling of the above exception, another exception occurred:
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent Traceback (most recent call last):
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 697, in _report_state
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent agent_status = self.state_rpc.report_state(ctx, self.agent_state,
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/neutron/agent/rpc.py", line 104, in report_state
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return method(context, 'report_state', **kwargs)
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 189, in call
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent result = self.transport._send(
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return self._driver.send(target, ctxt, message,
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 689, in send
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent return self._send(target, ctxt, message, wait_for_reply, timeout,
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 678, in _send
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent result = self._waiter.wait(msg_id, timeout,
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent message = self.waiters.get(msg_id, timeout=timeout)
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent raise oslo_messaging.MessagingTimeout(
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent
Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.158 913828 WARNING oslo.service.loopingcall [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec
Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.451 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.456 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 600 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.322 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.325 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 466.88 sec
Aug 22 04:32:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:32:47.175 913828 WARNING oslo.service.loopingcall [req-bc2c34db-c2bd-44f6-8298-2ac7efe0456a - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec
Aug 22 04:36:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:36:06.334 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.018 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.022 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 617.70 sec
Aug 22 04:42:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:42:47.188 913828 WARNING oslo.service.loopingcall [req-a14ad202-4c67-46d2-8f1c-d85e80766201 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 04:47:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:04.033 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 32 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.491 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.495 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 592.47 sec
Aug 22 04:52:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:52:47.202 913828 WARNING oslo.service.loopingcall [req-7fea5ac1-e7f5-4276-804f-3991b22b286b - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 04:57:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:57:36.505 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.207 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.211 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 589.71 sec
Aug 22 05:02:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:02:47.218 913828 WARNING oslo.service.loopingcall [req-40496068-0ca9-49a9-86c1-9b976b7a848c - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 05:08:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:08:06.227 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 54 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.501 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.505 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 614.29 sec
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.547 913828 INFO bgpspeaker.api.base [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] API method core.stop called with args: {}
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.552 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1a lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.553 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1a for remote_as=64664 went DOWN.
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.speaker [-] Connection lost as protocol is no longer active
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1b lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1b for remote_as=64664 went DOWN.
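The logs show the back-off pattern behind the one-hour delay: after each
MessagingTimeout the timeout for get_bgp_speakers calls doubles (120,
240, 480 and finally 600 seconds), the pause before the next attempt
stays under 60 seconds, and the sync attempts end up roughly ten minutes
apart (04:26:06, 04:37:04, 04:47:36, 04:58:06, 05:09:00). A minimal
sketch that reproduces these numbers, assuming a 60-second base (the
oslo.messaging default for rpc_response_timeout) and the 600-second
ceiling seen in the logs; it illustrates the observed behaviour and is
not the neutron_lib code:

```python
import random
from typing import List

BASE_TIMEOUT = 60   # assumed: oslo.messaging default rpc_response_timeout
MAX_TIMEOUT = 600   # assumed: ceiling observed in the logs above


def next_timeout(current: int, ceiling: int = MAX_TIMEOUT) -> int:
    """Double the per-method timeout after a MessagingTimeout, capped."""
    return min(current * 2, ceiling)


def retry_wait(current: int, base: int = BASE_TIMEOUT) -> float:
    """Random pause before the next attempt, never longer than base."""
    return random.uniform(0, min(current, base))


def timeout_schedule(failures: int, start: int = BASE_TIMEOUT) -> List[int]:
    """Timeouts in effect after each consecutive failed call."""
    schedule, timeout = [], start
    for _ in range(failures):
        timeout = next_timeout(timeout)
        schedule.append(timeout)
    return schedule


if __name__ == "__main__":
    print(timeout_schedule(5))  # → [120, 240, 480, 600, 600]
```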
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2039812
Title:
[N-D-R] The dynamic routing service is not resilient to infrastructure
outage
Status in neutron:
New
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 567, in wait
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent message = self.waiters.get(msg_id, timeout=timeout)
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 443, in get
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent raise oslo_messaging.MessagingTimeout(
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 03ed494d17d84502871b667005e9c2d5
2023-08-22 04:22:47.146 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent
Aug 22 04:22:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:22:47.158 913828 WARNING oslo.service.loopingcall [req-dbc95626-cc0a-4b2e-b236-899237e093f1 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec
Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.451 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:25:39 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:25:39.456 913828 WARNING neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Increasing timeout for get_bgp_speakers calls to 600 seconds. Restart the agent to restore it to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.322 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID e3c6a36356dd47afb08fca31760b3514
Aug 22 04:26:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:26:06.325 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 466.88 sec
Aug 22 04:32:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:32:47.175 913828 WARNING oslo.service.loopingcall [req-bc2c34db-c2bd-44f6-8298-2ac7efe0456a - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.02 sec
Aug 22 04:36:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:36:06.334 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 58 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.018 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d225072f0be42caa1425f7728e5d612
Aug 22 04:37:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:37:04.022 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 617.70 sec
Aug 22 04:42:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:42:47.188 913828 WARNING oslo.service.loopingcall [req-a14ad202-4c67-46d2-8f1c-d85e80766201 - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 04:47:04 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:04.033 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 32 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.491 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID d33e3beb754742a39756e440694c52d0
Aug 22 04:47:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:47:36.495 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 592.47 sec
Aug 22 04:52:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:52:47.202 913828 WARNING oslo.service.loopingcall [req-7fea5ac1-e7f5-4276-804f-3991b22b286b - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 04:57:36 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:57:36.505 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.207 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 6a1de38f26074c5e95c959c2c52ce71e
Aug 22 04:58:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 04:58:06.211 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 589.71 sec
Aug 22 05:02:47 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:02:47.218 913828 WARNING oslo.service.loopingcall [req-40496068-0ca9-49a9-86c1-9b976b7a848c - - - - -] Function 'neutron_dynamic_routing.services.bgp.agent.bgp_dragent.BgpDrAgentWithStateReport._report_state' run outlasted interval by 0.01 sec
Aug 22 05:08:06 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:08:06.227 913828 ERROR neutron_lib.rpc [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Timeout in RPC method get_bgp_speakers. Waiting for 54 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.501 913828 ERROR neutron_dynamic_routing.services.bgp.agent.bgp_dragent [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Unable to sync BGP speaker state.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0be3189224cc4d55bbf97a9afb9efd3d
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.505 913828 WARNING oslo.service.loopingcall [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 614.29 sec
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.547 913828 INFO bgpspeaker.api.base [req-e307c164-023d-4a74-94cb-40ccea100eb5 - - - - -] API method core.stop called with args: {}
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.552 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1a lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.553 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1a for remote_as=64664 went DOWN.
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.speaker [-] Connection lost as protocol is no longer active
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO bgpspeaker.peer [-] Connection to peer fc00:ca5a:ca5a:1004::1b lost, reason: Connection lost as protocol is no longer active Resetting retry connect loop: False
Aug 22 05:09:00 dragent-prod-1001 neutron-bgp-dragent[913828]: 2023-08-22 05:09:00.557 913828 INFO neutron_dynamic_routing.services.bgp.agent.driver.os_ken.driver [-] BGP Peer fc00:ca5a:ca5a:1004::1b for remote_as=64664 went DOWN.