Message #14108
[Bug 1315613] [NEW] Queue l3_agent has been deleted with multiple l3 agent
Public bug reported:
1) How I found this
Hello. I'm running multiple l3 agents on two network nodes.
Since there is no out-of-the-box high availability for the l3 agent,
I manually schedule the external network with neutron's RESTful APIs in case one of the l3 agents dies.
But something strange happens.
With one l3 agent already started, when I start a second l3 agent,
an error occurs on the l3 agent, like below:
SessionError: Queue l3_agent has been deleted.
(qpid/broker/Queue.cpp:1855)(408)
2) Real cause
After searching further, I found that the real problem is this:
if I stop an l3 agent with "service neutron-l3-agent stop",
the queue "l3_agent" on the controller node is deleted.
Because of this, if I manually schedule a logical router across two or more l3 agents,
then when one of the l3 agents is stopped, all of the l3 agents stop working.
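
The failure mode described above can be illustrated with a toy in-memory model. This is only a sketch of the reported symptom, not qpid and not neutron's RPC code: the class ToyBroker, its methods, and the delete_on_close flag are hypothetical names made up for the illustration, and the consumer names "network" and "network2" are assumed from the per-host queue names in the listing. It contrasts a shared queue that is deleted as soon as any consumer stops with one that is deleted only when its last consumer leaves.

```python
class QueueDeleted(Exception):
    """Stands in for the SessionError seen in the l3 agent log."""


class ToyBroker(object):
    """Hypothetical in-memory broker; NOT the qpid API."""

    def __init__(self):
        self.queues = {}  # queue name -> set of consumer names

    def subscribe(self, queue, consumer):
        self.queues.setdefault(queue, set()).add(consumer)

    def stop_consumer(self, queue, consumer, delete_on_close):
        consumers = self.queues.get(queue)
        if consumers is None:
            return
        consumers.discard(consumer)
        # delete_on_close=True models the reported behaviour: the shared
        # queue goes away even though another consumer is still attached.
        if delete_on_close or not consumers:
            del self.queues[queue]

    def receive(self, queue, consumer):
        if queue not in self.queues:
            raise QueueDeleted("Queue %s has been deleted." % queue)
        return None  # no messages in this toy model


def survivor_still_works(delete_on_close):
    """Stop one of two consumers and check whether the other can still receive."""
    broker = ToyBroker()
    broker.subscribe("l3_agent", "network")
    broker.subscribe("l3_agent", "network2")
    broker.stop_consumer("l3_agent", "network2", delete_on_close)
    try:
        broker.receive("l3_agent", "network")
    except QueueDeleted:
        return False
    return True
```

With delete_on_close=True the surviving consumer fails in the same way as the running l3 agent in this report; with delete_on_close=False it keeps working. Which of the two policies the real broker applies to the "l3_agent" queue is not something this sketch can tell.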
3) How to reproduce
To reproduce what I've described, follow the steps below.
1. get two or more l3 agents running
- I'm not using the host option in neutron.conf
- with two nodes, each has one l3 agent running
2. check that the l3_agent queue exists on the qpid server
# yum install -y qpid-tools
# qpid-config queues
...
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
l3_agent <------------------- this one
l3_agent.network
l3_agent.network2
q-l3-plugin
...
3. stop one of the l3 agents
# service neutron-l3-agent stop
4. repeat step 2: check whether the l3_agent queue is still on the qpid server
# qpid-config queues
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
<------------------- the queue "l3_agent" is deleted
l3_agent.network
q-l3-plugin
q-plugin
5. check the log of the l3 agent that is still running
# tail -f /var/log/neutron/l3-agent.log
2014-05-03 15:52:58.357 2389 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-05-03 15:52:58.357 2389 TRACE root Traceback (most recent call last):
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/excutils.py", line 92, in inner_func
2014-05-03 15:52:58.357 2389 TRACE root return infunc(*args, **kwargs)
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 699, in _consumer_thread
2014-05-03 15:52:58.357 2389 TRACE root self.consume()
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 690, in consume
2014-05-03 15:52:58.357 2389 TRACE root six.next(it)
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 607, in iterconsume
2014-05-03 15:52:58.357 2389 TRACE root yield self.ensure(_error_callback, _consume)
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 541, in ensure
2014-05-03 15:52:58.357 2389 TRACE root return method(*args, **kwargs)
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/impl_qpid.py", line 598, in _consume
2014-05-03 15:52:58.357 2389 TRACE root nxt_receiver = self.session.next_receiver(timeout=timeout)
2014-05-03 15:52:58.357 2389 TRACE root File "<string>", line 6, in next_receiver
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 665, in next_receiver
2014-05-03 15:52:58.357 2389 TRACE root if self._ecwait(lambda: self.incoming, timeout):
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
2014-05-03 15:52:58.357 2389 TRACE root result = self._ewait(lambda: self.closed or predicate(), timeout)
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 572, in _ewait
2014-05-03 15:52:58.357 2389 TRACE root self.check_error()
2014-05-03 15:52:58.357 2389 TRACE root File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 561, in check_error
2014-05-03 15:52:58.357 2389 TRACE root raise self.error
2014-05-03 15:52:58.357 2389 TRACE root SessionError: Queue l3_agent has been deleted. (qpid/broker/Queue.cpp:1855)(408)
2014-05-03 15:52:58.357 2389 TRACE root
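
The before/after comparison in steps 2 and 4 can be done mechanically. The sketch below assumes the listings look like the ones pasted in this report (one queue name at the start of each line, with "..." elisions and hand-written "<---" annotations); real qpid-config output may carry extra header or attribute columns that would need additional filtering. The function names parse_queue_names and vanished_queues are made up for this example.

```python
def parse_queue_names(listing):
    """Return the set of queue names found in a pasted queue listing."""
    names = set()
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0]
        # Skip elisions ("...") and hand-written annotations ("<-----").
        if first.startswith("...") or first.startswith("<"):
            continue
        names.add(first)
    return names


def vanished_queues(before, after):
    """Queues present in the first listing but absent from the second."""
    return sorted(parse_queue_names(before) - parse_queue_names(after))


# Listings copied from steps 2 and 4 of this report.
BEFORE = """\
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
l3_agent <------------------- this one
l3_agent.network
l3_agent.network2
q-l3-plugin
"""

AFTER = """\
dhcp_agent
dhcp_agent.network
dhcp_agent.network2
<------------------- the queue "l3_agent" is deleted
l3_agent.network
q-l3-plugin
q-plugin
"""

print(vanished_queues(BEFORE, AFTER))
```

For the two listings in this report it reports both "l3_agent" and "l3_agent.network2" as gone. Only the first is shared between the agents, going by the queue names.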
** Affects: neutron
Importance: Undecided
Status: New
** Tags: l3agent queue
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1315613
Title:
Queue l3_agent has been deleted with multiple l3 agent
Status in OpenStack Neutron (virtual network service):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1315613/+subscriptions