yahoo-eng-team team mailing list archive

[Bug 1393391] Re: neutron-openvswitch-agent stuck on no queue 'q-agent-notifier-port-update_fanout..

 

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Trusty)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1393391

Title:
  neutron-openvswitch-agent stuck on no queue
  'q-agent-notifier-port-update_fanout..

Status in neutron:
  Confirmed
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Trusty:
  New

Bug description:
  In an HA deployment, neutron-openvswitch-agent can get stuck when it
  receives a close command on a fanout queue it is not subscribed to.

  It then stops responding to any other messages, so it effectively
  stops working altogether.

  2014-11-11 10:27:33.092 3027 INFO neutron.common.config [-] Logging enabled!
  2014-11-11 10:27:34.285 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
  2014-11-11 10:27:34.370 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
  2014-11-11 10:27:35.348 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent initialized successfully, now running...
  2014-11-11 10:27:35.351 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent out of sync with plugin!
  2014-11-11 10:27:35.401 3027 INFO neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Agent tunnel out of sync with plugin!
  2014-11-11 10:27:35.414 3027 INFO neutron.openstack.common.rpc.common [req-66ba318b-0fcc-42c2-959e-9a5233c292ef None] Connected to AMQP server on vip-rabbitmq:5672
  2014-11-11 10:32:33.143 3027 INFO neutron.agent.securitygroups_rpc [req-22c7fa11-882d-4278-9f83-6dd56ab95ba4 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
  2014-11-11 10:58:11.916 3027 INFO neutron.agent.securitygroups_rpc [req-484fd71f-8f61-496c-aa8a-2d3abf8de365 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
  2014-11-11 10:59:43.954 3027 INFO neutron.agent.securitygroups_rpc [req-2c0bc777-04ed-470a-aec5-927a59100b89 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
  2014-11-11 11:00:22.500 3027 INFO neutron.agent.securitygroups_rpc [req-df447d01-d132-40f2-8528-1c1c4d57c0f5 None] Security group member updated [u'4c7b3ad2-4526-48a7-959e-a8b8e4da6413']
  2014-11-12 01:27:35.662 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common     raise m
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
  2014-11-12 01:27:35.662 3027 TRACE neutron.openstack.common.rpc.common
  2014-11-12 01:27:35.695 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
  2014-11-12 01:27:35.722 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
  2014-11-12 02:00:22.682 3027 ERROR neutron.openstack.common.rpc.common [-] Failed to consume message from queue: Socket closed
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common Traceback (most recent call last):
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return method(*args, **kwargs)
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 659, in _consume
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 281, in drain_events
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 94, in drain_events
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return connection.drain_events(**kwargs)
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 266, in drain_events
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     chanmap, None, timeout=timeout,
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 328, in _wait_multiple
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/connection.py", line 292, in read_timeout
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     return self.method_reader.read_method()
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common   File "/usr/lib/python2.7/site-packages/amqp/method_framing.py", line 192, in read_method
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common     raise m
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common IOError: Socket closed
  2014-11-12 02:00:22.682 3027 TRACE neutron.openstack.common.rpc.common
  2014-11-12 02:00:22.683 3027 INFO neutron.openstack.common.rpc.common [-] Reconnecting to AMQP server on vip-rabbitmq:5672
  2014-11-12 02:00:23.017 3027 INFO neutron.openstack.common.rpc.common [-] Connected to AMQP server on vip-rabbitmq:5672
  2014-11-12 02:00:23.021 3027 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
  2014-11-12 02:00:23.021 3027 TRACE root Traceback (most recent call last):
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
  2014-11-12 02:00:23.021 3027 TRACE root     return infunc(*args, **kwargs)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread
  2014-11-12 02:00:23.021 3027 TRACE root     self.consume()
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 737, in consume
  2014-11-12 02:00:23.021 3027 TRACE root     six.next(it)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 664, in iterconsume
  2014-11-12 02:00:23.021 3027 TRACE root     yield self.ensure(_error_callback, _consume)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 579, in ensure
  2014-11-12 02:00:23.021 3027 TRACE root     return method(*args, **kwargs)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 657, in _consume
  2014-11-12 02:00:23.021 3027 TRACE root     queues_tail.consume(nowait=False)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 190, in consume
  2014-11-12 02:00:23.021 3027 TRACE root     self.queue.consume(*args, callback=_callback, **options)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/kombu/entity.py", line 598, in consume
  2014-11-12 02:00:23.021 3027 TRACE root     nowait=nowait)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 1769, in basic_consume
  2014-11-12 02:00:23.021 3027 TRACE root     (60, 21),  # Channel.basic_consume_ok
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 71, in wait
  2014-11-12 02:00:23.021 3027 TRACE root     return self.dispatch_method(method_sig, args, content)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/abstract_channel.py", line 88, in dispatch_method
  2014-11-12 02:00:23.021 3027 TRACE root     return amqp_method(self, args)
  2014-11-12 02:00:23.021 3027 TRACE root   File "/usr/lib/python2.7/site-packages/amqp/channel.py", line 224, in _close
  2014-11-12 02:00:23.021 3027 TRACE root     raise ChannelError(reply_code, reply_text, (class_id, method_id))
  2014-11-12 02:00:23.021 3027 TRACE root ChannelError: 404: (NOT_FOUND - no queue 'q-agent-notifier-port-update_fanout_cc21f47607704321860757b7e6a1194a' in vhost '/', (60, 20), None)
  2014-11-12 02:00:23.021 3027 TRACE root
  2014-11-12 02:01:24.268 3027 ERROR root [-] Unexpected exception occurred 61 time(s)... retrying.
  2014-11-12 02:01:24.268 3027 TRACE root Traceback (most recent call last):
  2014-11-12 02:01:24.268 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
  2014-11-12 02:01:24.268 3027 TRACE root     return infunc(*args, **kwargs)
  2014-11-12 02:01:24.268 3027 TRACE root   File "/usr/lib/python2.7/site-packages/neutron/openstack/common/rpc/impl_kombu.py", line 746, in _consumer_thread

  
  ---------------------------

  [Impact]

  This patch addresses an issue in a RabbitMQ HA deployment where
  neutron-openvswitch-agent gets stuck on a no queue
  'q-agent-notifier-port-update_fanout_xx' error when one of the
  RabbitMQ cluster nodes goes down. In a deployment with more than 100
  nova compute nodes, all neutron agents go down. Restarting
  neutron-openvswitch-agent resolves the problem, but restarting the
  agents on every compute node is not practical, and the failure
  breaks HA.

  [Test Case]

  Note: these steps are for trusty-icehouse, including neutron package
  1:2014.1.5-0ubuntu1.

  Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly
  kill one of the rabbit nodes (e.g. sudo service rabbitmq-server stop).
  Observe that the neutron agents stop consuming messages and keep
  throwing the no queue 'q-agent-notifier-port-update_fanout..' exception.
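
  To confirm the symptom on a compute node, the agent log can be
  scanned for the two markers seen in the trace above. The following is
  a minimal sketch, not part of the patch: the function name
  summarize_agent_log and the regular expressions are assumptions
  derived only from the log lines quoted in this report.

```python
import re

# Patterns taken from the log excerpts in this bug report (assumed stable).
RETRY_RE = re.compile(
    r"Unexpected exception occurred (\d+) time\(s\)\.\.\. retrying\."
)
MISSING_QUEUE_RE = re.compile(
    r"ChannelError: 404: \(NOT_FOUND - no queue '([^']+)' in vhost"
)


def summarize_agent_log(lines):
    """Return (highest retry count seen, set of missing queue names).

    Hypothetical helper: a growing retry count together with a missing
    fanout queue suggests the agent is stuck in the loop described here.
    """
    max_retries = 0
    missing_queues = set()
    for line in lines:
        retry = RETRY_RE.search(line)
        if retry:
            max_retries = max(max_retries, int(retry.group(1)))
        missing = MISSING_QUEUE_RE.search(line)
        if missing:
            missing_queues.add(missing.group(1))
    return max_retries, missing_queues
```

  Passing an open log file (for example the openvswitch agent log, whose
  path depends on the installation) to summarize_agent_log is enough,
  since the function only iterates over lines.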

  [Regression Potential]

  None.

  [Other Info]

  The Oslo library has this fix, but because Neutron in Icehouse uses
  kombu rather than the Oslo library, it still suffers from this issue.
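
  The log shows the agent reconnecting successfully and then failing
  repeatedly because the fanout queue no longer exists on the broker.
  One plausible way out of that loop is to redeclare the queue when the
  consume call reports NOT_FOUND. The sketch below only simulates the
  idea with an in-memory stand-in for the broker; FakeBroker,
  QueueNotFound and consume_with_redeclare are hypothetical names, and
  this is not the code of the Oslo fix or of this patch.

```python
class QueueNotFound(Exception):
    """Stand-in for an AMQP 404 NOT_FOUND channel error."""


class FakeBroker(object):
    """In-memory model of a broker that can lose its queues (hypothetical)."""

    def __init__(self):
        self.queues = set()

    def declare(self, name):
        self.queues.add(name)

    def drop_all(self):
        # Models the queue disappearing, as the 404 in the log indicates.
        self.queues.clear()

    def consume(self, name):
        if name not in self.queues:
            raise QueueNotFound(name)
        return "consuming %s" % name


def consume_with_redeclare(broker, queue_name, max_attempts=3):
    """Try to consume; on NOT_FOUND, redeclare the queue and retry."""
    for _ in range(max_attempts):
        try:
            return broker.consume(queue_name)
        except QueueNotFound:
            # Instead of retrying the same failing call forever,
            # recreate the queue before the next attempt.
            broker.declare(queue_name)
    raise QueueNotFound(queue_name)
```

  The bounded retry count is a design choice for the sketch: it keeps a
  persistent broker problem visible rather than hiding it behind an
  endless retry loop like the one in the trace.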

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1393391/+subscriptions

