yahoo-eng-team team mailing list archive
Message #85939
[Bug 1926531] [NEW] SNAT namespace prematurely created then deleted on hosts, resulting in removal of RFP/FPR link to FIP namespace
Public bug reported:
This looks like collateral damage from the fix for
https://bugs.launchpad.net/neutron/+bug/1850779
I think that fix causes problems. We have multiple nodes running in
DVR_SNAT mode. The SNAT namespace is scheduled to one of them.
When the l3-agent is restarted on the other nodes, initialize() is now
always invoked for DvrEdgeRouter, which creates the SNAT namespace
prematurely. This in turn causes external_gateway_added() to later
detect that this host is NOT hosting the SNAT router but the namespace
exists, so it removes the namespace by triggering
external_gateway_removed() (dvr_edge_router --> dvr_local_router).
The problem is that the dvr_local_router code for
external_gateway_removed() ends up DELETING the rfp/fpr pair, which
severs the qrouter connection to the fip namespace (and deletes all the
FIP routes in the fip namespace as a result).
Prior to the fix for 1850779, _create_snat_namespace() was only invoked
in _create_dvr_gateway(), which was only invoked when the node was
actually hosting SNAT for the router.
Even without the breaking issue of deleting the rtr_2_fip link, that fix
unnecessarily creates the SNAT namespace on every host, only for it to
be deleted.
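
The sequence described above can be sketched as a toy model. This is
only an illustration under stated assumptions: ToyEdgeRouter,
restart_agent, the create_snat_ns_on_init flag and the host names are
hypothetical stand-ins, not the real neutron classes or their
signatures, and the model reduces each step to a flag so the order of
events is easy to follow.

```python
# Toy model only: every name here is a hypothetical stand-in for the
# behaviour described in the report, not the actual neutron code.


class ToyEdgeRouter:
    def __init__(self, host, snat_host, create_snat_ns_on_init):
        self.host = host
        self.snat_host = snat_host  # host the SNAT role is scheduled to
        self.create_snat_ns_on_init = create_snat_ns_on_init
        self.snat_ns_exists = False
        # Assume the rfp/fpr pair already exists before the agent restart.
        self.rtr_2_fip_link = True
        self.log = []

    def _create_snat_namespace(self):
        if not self.snat_ns_exists:
            self.snat_ns_exists = True
            self.log.append("snat namespace created")

    def initialize(self):
        # Behaviour reported after the 1850779 fix: created unconditionally.
        if self.create_snat_ns_on_init:
            self._create_snat_namespace()

    def _create_dvr_gateway(self):
        # Behaviour reported before the fix: the only creation point.
        self._create_snat_namespace()

    def external_gateway_added(self):
        if self.host == self.snat_host:
            self._create_dvr_gateway()
        elif self.snat_ns_exists:
            # Namespace exists but SNAT is scheduled elsewhere: clean up.
            self.external_gateway_removed()

    def external_gateway_removed(self):
        # Local-router cleanup also tears down the rfp/fpr pair.
        self.rtr_2_fip_link = False
        self.log.append("rfp/fpr pair deleted")
        self.snat_ns_exists = False
        self.log.append("snat namespace deleted")


def restart_agent(host, snat_host, create_snat_ns_on_init):
    """Replay the restart sequence for one router on one host."""
    router = ToyEdgeRouter(host, snat_host, create_snat_ns_on_init)
    router.initialize()
    router.external_gateway_added()
    return router


if __name__ == "__main__":
    # node-2 is a DVR_SNAT node, but SNAT is scheduled to node-1.
    for on_init in (True, False):
        r = restart_agent("node-2", "node-1", on_init)
        print(on_init, r.rtr_2_fip_link, r.log)
```

In this model, restarting a non-SNAT host with creation at
initialization ends with the rtr_2_fip link gone, while the same restart
with creation only in _create_dvr_gateway() leaves it untouched.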
FYI this is for non-HA routers
1. Where the qrouter to FIP link is deleted:
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_local_router.py#L599
This results in connectivity breakage
2. #1 above is triggered by this code in the edge router, which sees the
SNAT namespace even though SNAT is scheduled to a different host:
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_edge_router.py#L56
3. The SNAT namespace is created on the wrong host because of the fix
for 1850779, which moved its creation to DvrEdgeRouter initialization
** Affects: neutron
Importance: Undecided
Status: New
** Tags: l3-dvr-backlog l3-ha
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926531
Title:
SNAT namespace prematurely created then deleted on hosts, resulting in
removal of RFP/FPR link to FIP namespace
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1926531/+subscriptions