yahoo-eng-team team mailing list archive
Message #45700
[Bug 1393391] Re: neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-update_fanout..
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Also affects: neutron (Ubuntu Trusty)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1393391
Title:
neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-update_fanout..
Status in neutron:
Confirmed
Status in neutron package in Ubuntu:
New
Status in neutron source package in Trusty:
New
Bug description:
Under an HA deployment, neutron-openvswitch-agent can get stuck
when it receives a close command on a fanout queue that it is not
subscribed to. It then stops responding to any other messages, so it
effectively stops working at all.
2014-11-11 10:27:33.092 3027 INFO neutron.common.config [-] Logging enabled!
2014-11-11 10:27:34.285 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:34.370 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:27:35.348 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent initialized successfully, now running...
2014-11-11 10:27:35.351 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent out of sync with plugin!
2014-11-11 10:27:35.401 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent tunnel out of sync with plugin!
2014-11-11 10:27:35.414 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
2014-11-11 10:32:33.143 3027 INFO neutron.agent.securitygroups_rpc [req-22c7fa11-882d-4278-9f83-6dd56ab95ba4 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:58:11.916 3027 INFO neutron.agent.securitygroups_rpc [req-484fd71f-8f61-496c-aa8a-2d3abf8de365 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 10:59:43.954 3027 INFO neutron.agent.securitygroups_rpc [req-2c0bc777-04ed-470a-aec5-927a59100b89 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-11 11:00:22.500 3027 INFO neutron.agent.securitygroups_rpc [req-df447d01-d132-40f2-8528-1c1c4d57c0f5 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
2014-11-12 01:27:35.662 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return method(*args, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return connection.drain_events(**kwargs)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common chanmap, None, timeout=timeout,
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common return self.method_reader.read_method()
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common raise m
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 01:27:35.695 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 01:27:35.722 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:22.682 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return method(*args, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.connection.drain_events(timeout=timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.transport.drain_events(self.connection, **kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return connection.drain_events(**kwargs)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common chanmap, None, timeout=timeout,
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common channel, method_sig, args, content = read_timeout(timeout)
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common return self.method_reader.read_method()
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common raise m
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common
2014-11-12 02:00:22.683 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.017 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-11-12 02:00:23.021 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:00:23.021 3027 TRACE root return infunc(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
2014-11-12 02:00:23.021 3027 TRACE root self.consume()
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 737, in consume
2014-11-12 02:00:23.021 3027 TRACE root six.next(it)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 664, in iterconsume
2014-11-12 02:00:23.021 3027 TRACE root yield self.ensure(_error_callback, _consume)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
2014-11-12 02:00:23.021 3027 TRACE root return method(*args, **kwargs)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 657, in _consume
2014-11-12 02:00:23.021 3027 TRACE root queues_tail.consume(nowait=False)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 190, in consume
2014-11-12 02:00:23.021 3027 TRACE root self.queue.consume(*args, callback=_callback, **options)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 598, in consume
2014-11-12 02:00:23.021 3027 TRACE root nowait=nowait)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1769, in basic_consume
2014-11-12 02:00:23.021 3027 TRACE root (60, 21), # Channel.basic_consume_ok
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 71, in wait
2014-11-12 02:00:23.021 3027 TRACE root return self.dispatch_method(method_sig, args, content)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 88, in dispatch_method
2014-11-12 02:00:23.021 3027 TRACE root return amqp_method(self, args)
2014-11-12 02:00:23.021 3027 TRACE root File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 224, in _close
2014-11-12 02:00:23.021 3027 TRACE root raise ChannelError(reply_code, reply_text, (class_id, method_id))
2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - no queue 'q-agent-notifier-port-update_fanout_cc21f47607704321860757b7e6a1194a' in vhost '/', (60, 20), None)
2014-11-12 02:00:23.021 3027 TRACE root
2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception occurred 61 time(s)... retrying.
2014-11-12 02:01:24.268 3027 TRACE root Traceback (most recent call last):
2014-11-12 02:01:24.268 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-11-12 02:01:24.268 3027 TRACE root return infunc(*args, **kwargs)
2014-11-12 02:01:24.268 3027 TRACE root File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
---------------------------
[Impact]
This patch addresses an issue in a RabbitMQ HA deployment where
neutron-openvswitch-agent gets stuck on a no queue
'q-agent-notifier-port-update_fanout_xx' error when one of the RabbitMQ
cluster nodes goes down. In a deployment with more than 100 nova compute
nodes, every neutron agent goes down, which is a severe outage.
Restarting neutron-openvswitch-agent resolves the problem, but
restarting all of the agents on all compute nodes is not practical, and
the failure defeats the purpose of HA.
[Test Case]
Note: these steps are for trusty-icehouse, including neutron package
1:2014.1.5-0ubuntu1.
Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly
kill one of the rabbit nodes (e.g. sudo service rabbitmq-server stop,
etc). Observe that the neutron agents stop consuming messages and keep
throwing the no queue 'q-agent-notifier-port-update_fanout..' exception.
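When reproducing this, it can help to check the agent log mechanically for the two symptoms shown in the bug description: the growing "Unexpected exception occurred N time(s)" counter and the 404 NOT_FOUND error naming the missing queue. The following is a minimal sketch, not part of the patch. The function name scan_agent_log is hypothetical, and the sample lines are copied from the log excerpt above.

```python
import re

# Patterns derived from the log excerpt in the bug description.
RETRY_RE = re.compile(r"Unexpected exception occurred (\d+) time\(s\)")
NO_QUEUE_RE = re.compile(r"NOT_FOUND - no queue '([^']+)'")


def scan_agent_log(lines):
    """Return (highest retry count seen, set of missing queue names)."""
    max_retries = 0
    missing = set()
    for line in lines:
        m = RETRY_RE.search(line)
        if m:
            max_retries = max(max_retries, int(m.group(1)))
        m = NO_QUEUE_RE.search(line)
        if m:
            missing.add(m.group(1))
    return max_retries, missing


# Sample lines taken verbatim from the log in this report.
sample = [
    "2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception "
    "occurred 1 time(s)... retrying.",
    "2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - "
    "no queue 'q-agent-notifier-port-update_fanout_"
    "cc21f47607704321860757b7e6a1194a' in vhost '/', (60, 20), None)",
    "2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception "
    "occurred 61 time(s)... retrying.",
]

retries, queues = scan_agent_log(sample)
print(retries, sorted(queues))
```

A retry count that keeps climbing together with a non-empty set of missing queues matches the stuck state described in this bug.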
[Regression Potential]
None.
[Other Info]
The oslo library already has this fix, but because Neutron in Icehouse
uses kombu rather than the oslo library, it still suffers from this
issue.
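For illustration only, one plausible way to recover from this kind of error is to declare the missing queue again and retry the consume call, instead of retrying the consume alone. The sketch below is an assumption about the general approach and is not the actual oslo or Neutron patch. FakeBroker, QueueNotFound and consume_with_redeclare are hypothetical names, and the broker is an in-memory stand-in rather than kombu or amqp.

```python
class QueueNotFound(Exception):
    """Hypothetical stand-in for the broker's 404 NOT_FOUND channel error."""


class FakeBroker(object):
    """Hypothetical in-memory broker used only for this sketch."""

    def __init__(self):
        self.queues = set()

    def declare(self, name):
        self.queues.add(name)

    def drop_all(self):
        # Simulates the queue disappearing after a broker node goes down.
        self.queues.clear()

    def consume(self, name):
        if name not in self.queues:
            raise QueueNotFound("NOT_FOUND - no queue '%s'" % name)
        return "consuming %s" % name


def consume_with_redeclare(broker, name):
    """Try to consume; if the queue is missing, declare it once and retry."""
    try:
        return broker.consume(name)
    except QueueNotFound:
        broker.declare(name)
        return broker.consume(name)


# The queue name suffix "demo" is made up for this example.
queue_name = "q-agent-notifier-port-update_fanout_demo"
broker = FakeBroker()
broker.declare(queue_name)
broker.drop_all()
result = consume_with_redeclare(broker, queue_name)
print(result)
```

Without the declare step, every retry would hit the same missing-queue error, which is consistent with the retry counter in the log climbing without the agent ever recovering.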
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1393391/+subscriptions