yahoo-eng-team team mailing list archive
Message #87168
[Bug 1943725] [NEW] Automatic cleanup of BGP speakers is too aggressive
Public bug reported:
We have seen regular issues with the neutron-bgp-dragent service when
one or more network nodes fail or are undergoing maintenance.
In the most problematic case, we have a deployment with four network
nodes. Each of these runs a neutron-bgp-dragent process, and each is
associated with the same BGP speaker. When one of these network nodes
goes down, a cleanup process runs a short time later. Rather than
removing the speaker association only from the absent network node, it
removes the association from all but one of the nodes.
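The expected behaviour can be sketched as follows. This is only an
illustration, not the scheduler's actual code: the function name
stale_bindings, the (speaker_id, agent_id) tuples, and the agent_alive
mapping are all hypothetical.

```python
def stale_bindings(bindings, agent_alive):
    """Return only the (speaker_id, agent_id) bindings whose agent is down.

    Hypothetical helper for illustration; bindings is a list of tuples and
    agent_alive maps an agent ID to True (alive) or False (down).
    """
    return [
        (speaker_id, agent_id)
        for speaker_id, agent_id in bindings
        # An agent missing from the mapping is treated as down.
        if not agent_alive.get(agent_id, False)
    ]


# Four network nodes, all bound to the same speaker; one node is down.
bindings = [
    ("speaker-1", "net1"),
    ("speaker-1", "net2"),
    ("speaker-1", "net3"),
    ("speaker-1", "net4"),
]
agent_alive = {"net1": True, "net2": True, "net3": True, "net4": False}

to_remove = stale_bindings(bindings, agent_alive)
```

With this input only the binding for net4 would be removed, leaving the
three healthy agents associated with the speaker.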
During this process, the running neutron-bgp-dragent processes report
errors such as the following (observed using the latest
neutron-dynamic-routing code from stable/victoria):
"Unable to sync BGP speaker state.: RuntimeError: dictionary changed
size during iteration"
or
Sep 15 13:36:26 neutron-bgp-dragent[1308396]: 2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server [req-094dad10-b4da-4c50-8e32-f7814d446705 - - - - -] Exception during message handling: TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 185, in bgp_speaker_create_end
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.add_bgp_speaker_helper(bgp_speaker_id)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 249, in add_bgp_speaker_helper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.add_bgp_speaker_on_dragent(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 344, in add_bgp_speaker_on_dragent
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.cache.put_bgp_speaker(bgp_speaker)
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 582, in put_bgp_speaker
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server self.remove_bgp_speaker_by_id(self.cache[bgp_speaker['id']])
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server File "/openstack/venvs/neutron-22.1.3/lib/python3.8/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 600, in remove_bgp_speaker_by_id
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server if bgp_speaker_id in self.cache:
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server TypeError: unhashable type: 'dict'
2021-09-15 13:36:26.605 1308396 ERROR oslo_messaging.rpc.server
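The last two frames of the traceback suggest the cause of the TypeError:
put_bgp_speaker passes the cached entry (a dict) to
remove_bgp_speaker_by_id, which then uses it as a dictionary key. The
following is a minimal sketch of that failure mode. The class
SpeakerCache and its method names are hypothetical stand-ins, and the
shape of the cached entry is an assumption; only the two quoted source
lines come from the traceback.

```python
class SpeakerCache:
    """Hypothetical stand-in for the agent's speaker cache."""

    def __init__(self):
        self.cache = {}

    def remove_by_id(self, bgp_speaker_id):
        # The membership test hashes its argument, so passing a dict
        # raises "TypeError: unhashable type: 'dict'".
        if bgp_speaker_id in self.cache:
            del self.cache[bgp_speaker_id]

    def put_buggy(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:
            # Mirrors the failing frame: the cached entry is passed
            # where an ID is expected.
            self.remove_by_id(self.cache[bgp_speaker['id']])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}

    def put_fixed(self, bgp_speaker):
        if bgp_speaker['id'] in self.cache:
            # Pass the ID itself.
            self.remove_by_id(bgp_speaker['id'])
        self.cache[bgp_speaker['id']] = {'bgp_speaker': bgp_speaker}


buggy = SpeakerCache()
buggy.put_buggy({'id': 's1'})      # first insert succeeds
try:
    buggy.put_buggy({'id': 's1'})  # second insert hits the bad call
    error = None
except TypeError as exc:
    error = str(exc)

fixed = SpeakerCache()
fixed.put_fixed({'id': 's1'})
fixed.put_fixed({'id': 's1', 'local_as': 64512})
```

In this sketch the error only appears when a speaker that is already
cached is added again, which is consistent with the traceback: the
lookup self.cache[bgp_speaker['id']] succeeded before the failure.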
This issue appears to match a comment here: https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675/3#message-c409a4fb83a44216e03a041921c7067f44eb70d0
We will test out
https://review.opendev.org/c/openstack/neutron-dynamic-routing/+/780675
as in our case the automatic behaviour appears mostly unnecessary, but
a fix for the underlying issue would still be appreciated.
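For reference, the first error quoted above ("dictionary changed size
during iteration") is the generic Python failure raised when a dict is
modified while it is being iterated. We have not confirmed which loop
in the agent is involved, so the sketch below only shows the general
pattern and the usual remedy of iterating over a snapshot of the keys.
The function names and the wanted_ids argument are hypothetical.

```python
def sync_unsafe(cache, wanted_ids):
    """Drop unwanted speakers while iterating the live dict (fails)."""
    for speaker_id in cache:
        if speaker_id not in wanted_ids:
            # Deleting here makes the next iteration step raise
            # RuntimeError.
            del cache[speaker_id]


def sync_safe(cache, wanted_ids):
    """Drop unwanted speakers while iterating a snapshot of the keys."""
    for speaker_id in list(cache):
        if speaker_id not in wanted_ids:
            del cache[speaker_id]


unsafe_cache = {'s1': {}, 's2': {}}
try:
    sync_unsafe(unsafe_cache, {'s2'})
    sync_error = None
except RuntimeError as exc:
    sync_error = str(exc)

safe_cache = {'s1': {}, 's2': {}}
sync_safe(safe_cache, {'s2'})
```

If the cache is modified from another thread or greenthread rather than
from the loop body, a snapshot alone may not be enough and a lock around
the cache would be needed as well.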
** Affects: neutron
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1943725