yahoo-eng-team team mailing list archive
Message #85205
[Bug 1871850] Re: [L3] existing router resources are partial deleted unexpectedly when MQ is gone
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Also affects: neutron (Ubuntu Bionic)
Importance: Undecided
Status: New
** Changed in: neutron (Ubuntu)
Status: New => Fix Released
** Changed in: neutron (Ubuntu Bionic)
Assignee: (unassigned) => Trent Lloyd (lathiat)
** Changed in: neutron (Ubuntu Bionic)
Importance: Undecided => Critical
** Changed in: neutron (Ubuntu Bionic)
Status: New => In Progress
** Description changed:
+ (For SRU template, please see bug 1869808, as the SRU info there applies
+ to this bug also)
+
ENV: we hit this issue on our stable/queens deployment, but the master
branch has the same code logic.
When the L3 agent gets a router update notification, it tries to
retrieve the router info from the DB server [1]. If the message queue is
down or unreachable at that time, the agent gets message queue related
exceptions, and a resync action is then run [2]. In my personal
experience, a RabbitMQ cluster is sometimes not easy to recover. A long
MQ recovery time means the router info sync RPC never succeeds before
the maximum retry count is reached [3]. Then the bad thing happens: the
L3 agent tries to remove the router [4], which basically shuts down all
of the existing L3 traffic of this router.
[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L705
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L710
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L666
[4] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L671
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1871850
Title:
[L3] existing router resources are partial deleted unexpectedly when
MQ is gone
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive queens series:
New
Status in Ubuntu Cloud Archive rocky series:
New
Status in Ubuntu Cloud Archive stein series:
Fix Released
Status in Ubuntu Cloud Archive train series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
Fix Released
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
Fix Released
Status in neutron source package in Bionic:
In Progress
Bug description:
(For SRU template, please see bug 1869808, as the SRU info there
applies to this bug also)
ENV: we hit this issue on our stable/queens deployment, but the master
branch has the same code logic.
When the L3 agent gets a router update notification, it tries to
retrieve the router info from the DB server [1]. If the message queue is
down or unreachable at that time, the agent gets message queue related
exceptions, and a resync action is then run [2]. In my personal
experience, a RabbitMQ cluster is sometimes not easy to recover. A long
MQ recovery time means the router info sync RPC never succeeds before
the maximum retry count is reached [3]. Then the bad thing happens: the
L3 agent tries to remove the router [4], which basically shuts down all
of the existing L3 traffic of this router.
[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L705
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L710
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L666
[4] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L671
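The failure mode described above can be sketched with a toy model. This is a simplified illustration only, not the neutron source: every name in it (MessageQueueDown, fetch_router, ToyL3Agent, MAX_RETRIES) and the retry limit of 3 are assumptions made for the example. It shows how treating "retries exhausted" the same as "router deleted" removes a router that still exists on the server side.

```python
# Toy model of the behaviour described in this bug; NOT the neutron code.
# All names and the retry limit are hypothetical.

MAX_RETRIES = 3  # assumed retry budget, for illustration only


class MessageQueueDown(Exception):
    """Stands in for any messaging error raised while the MQ is unreachable."""


def fetch_router(router_id, mq_up):
    """Pretend RPC to the server: fails whenever the message queue is down."""
    if not mq_up:
        raise MessageQueueDown(router_id)
    return {"id": router_id}


class ToyL3Agent:
    def __init__(self, router_ids):
        # Routers currently configured on this node.
        self.routers = set(router_ids)

    def process_update(self, router_id, mq_up):
        for _ in range(MAX_RETRIES):
            try:
                fetch_router(router_id, mq_up)
                return "synced"
            except MessageQueueDown:
                continue  # resync: try the same update again
        # Retry budget exhausted: the router is torn down even though it
        # was never deleted on the server side.
        self.routers.discard(router_id)
        return "removed"


agent = ToyL3Agent(["router-1"])

result_mq_up = agent.process_update("router-1", mq_up=True)
routers_after_success = set(agent.routers)

result_mq_down = agent.process_update("router-1", mq_up=False)
routers_after_outage = set(agent.routers)
```

In this sketch the second call ends with the router removed from the agent purely because the message queue stayed down for the whole retry budget, which is the unexpected deletion this bug reports.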
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1871850/+subscriptions