
yahoo-eng-team team mailing list archive

[Bug 1315613] [NEW] Queue l3_agent has been deleted with multiple l3 agent

 

Public bug reported:

1) how i found this

hello. i'm running multiple l3 agents on two network nodes.
since there is no out-of-the-box high availability for the l3 agent,
i manually schedule the external network with the neutron RESTful APIs, in case one of the l3 agents dies.

but there is something strange.
with one l3 agent already started, when i start a second l3 agent,
an error occurs on the l3 agent like the one below

SessionError: Queue l3_agent has been deleted.
(qpid/broker/Queue.cpp:1855)(408)

2) real cause

after more investigation, the real problem turned out to be this:
if i stop an l3 agent with "service neutron-l3-agent stop",
the queue "l3_agent" on the controller node is deleted.

because of this, if i manually schedule a logical router across two or more l3 agents,
then as soon as one of the l3 agents is stopped, all of the l3 agents stop working.
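
The failure mode described above can be illustrated with a small in-memory model. This is only a sketch of the reported symptom, not of qpid or neutron internals: `ToyBroker`, `ToyAgent` and `simulate` are hypothetical names, and the assumption that a stopping agent removes the shared "l3_agent" queue is taken from the observation in this report rather than from the qpid driver code.

```python
class QueueDeleted(Exception):
    """Raised when a consumer polls a queue that no longer exists."""


class ToyBroker(object):
    """Hypothetical stand-in for the message broker: it only tracks queue names."""

    def __init__(self):
        self.queues = {}  # queue name -> set of consumer names

    def subscribe(self, queue, consumer):
        self.queues.setdefault(queue, set()).add(consumer)

    def delete_queue(self, queue):
        self.queues.pop(queue, None)

    def poll(self, queue):
        if queue not in self.queues:
            raise QueueDeleted("Queue %s has been deleted." % queue)
        return None  # the toy model never carries messages


class ToyAgent(object):
    """Hypothetical agent that consumes a shared queue and a per-host queue."""

    SHARED_QUEUE = "l3_agent"

    def __init__(self, broker, host):
        self.broker = broker
        self.host = host
        self.host_queue = "%s.%s" % (self.SHARED_QUEUE, host)

    def start(self):
        self.broker.subscribe(self.SHARED_QUEUE, self.host)
        self.broker.subscribe(self.host_queue, self.host)

    def stop(self):
        # Assumption mirroring the report: stopping one agent removes the
        # shared queue as well as that agent's own per-host queue.
        self.broker.delete_queue(self.SHARED_QUEUE)
        self.broker.delete_queue(self.host_queue)

    def consume_once(self):
        return self.broker.poll(self.SHARED_QUEUE)


def simulate():
    """Start two agents, stop one, and poll with the survivor."""
    broker = ToyBroker()
    first = ToyAgent(broker, "network")
    second = ToyAgent(broker, "network2")
    first.start()
    second.start()
    second.stop()
    try:
        first.consume_once()
    except QueueDeleted as exc:
        return sorted(broker.queues), str(exc)
    return sorted(broker.queues), None
```

In this model the surviving agent fails on its next poll because the queue it shares with the stopped agent is gone, while its own per-host queue is still present.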

3) how to reproduce
to reproduce what i've described, follow the steps below.

1. run two or more l3 agents
- i'm not using the host option in neutron.conf
- with two nodes, each has one l3 agent running

2. check that the l3_agent queue exists on the qpid server
# yum install -y qpid-tools
# qpid-config  queues
...
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
l3_agent     <------------------- this one
l3_agent.network
l3_agent.network2
q-l3-plugin
...

3. stop one of the l3 agents
# service neutron-l3-agent stop

4. repeat step 2: check whether the l3_agent queue is still on the qpid server
# qpid-config  queues
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
 <------------------- the queue "l3_agent" is deleted
l3_agent.network
q-l3-plugin
q-plugin

5. check the log of the l3 agent that is still running
# tail -f /var/log/neutron/l3-agent.log
2014-05-03 15:52:58.357 2389 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-05-03 15:52:58.357 2389 TRACE root Traceback (most recent call last):
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-05-03 15:52:58.357 2389 TRACE root     return infunc(*args, **kwargs)
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 699, in _consumer_thread
2014-05-03 15:52:58.357 2389 TRACE root     self.consume()
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 690, in consume
2014-05-03 15:52:58.357 2389 TRACE root     six.next(it)
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 607, in iterconsume
2014-05-03 15:52:58.357 2389 TRACE root     yield self.ensure(_error_callback, _consume)
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 541, in ensure
2014-05-03 15:52:58.357 2389 TRACE root     return method(*args, **kwargs)
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 598, in _consume
2014-05-03 15:52:58.357 2389 TRACE root     nxt_receiver = self.session.next_receiver(timeout=timeout)
2014-05-03 15:52:58.357 2389 TRACE root   File "<string>", line 6, in next_receiver
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 665, in next_receiver
2014-05-03 15:52:58.357 2389 TRACE root     if self._ecwait(lambda: self.incoming, timeout):
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
2014-05-03 15:52:58.357 2389 TRACE root     result = self._ewait(lambda: self.closed or predicate(), timeout)
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 572, in _ewait
2014-05-03 15:52:58.357 2389 TRACE root     self.check_error()
2014-05-03 15:52:58.357 2389 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 561, in check_error
2014-05-03 15:52:58.357 2389 TRACE root     raise self.error
2014-05-03 15:52:58.357 2389 TRACE root SessionError: Queue l3_agent has been deleted. (qpid/broker/Queue.cpp:1855)(408)
2014-05-03 15:52:58.357 2389 TRACE root
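
The checks in steps 2, 4 and 5 can be scripted once the command output and log text have been captured. The sketch below is a minimal example under stated assumptions: `queue_names` and `deleted_queues` are hypothetical helper names, and the parsing only covers queue listings and log lines shaped like the ones pasted in this report (real `qpid-config queues` output may carry extra header lines or columns that would need additional filtering).

```python
import re

# Matches the error text seen in the l3 agent log in this report.
DELETED_QUEUE_RE = re.compile(
    r"SessionError: Queue (?P<queue>\S+) has been deleted\."
)


def queue_names(listing):
    """Return queue names from a listing shaped like the ones in this report."""
    names = []
    for line in listing.splitlines():
        tokens = line.split()
        # Skip blank lines, "..." elisions, shell prompts and annotation-only
        # lines such as "<------- the queue ... is deleted".
        if not tokens or tokens[0] == "..." or tokens[0].startswith(("<", "#")):
            continue
        names.append(tokens[0])
    return names


def deleted_queues(log_text):
    """Return the sorted, de-duplicated queue names reported as deleted."""
    found = set(m.group("queue") for m in DELETED_QUEUE_RE.finditer(log_text))
    return sorted(found)
```

Comparing `queue_names` on the listing taken before and after stopping an agent shows which queues disappeared, and `deleted_queues` on the surviving agent's log confirms whether it is failing on the same queue.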

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: l3agent queue

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1315613

Title:
  Queue l3_agent has been deleted with multiple l3 agent

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1315613/+subscriptions

