yahoo-eng-team team mailing list archive
Message #43024
[Bug 1523999] [NEW] Any error in L3 agent after external gateway is configured but before the local cache is updated results in errors in subsequent router updates
Public bug reported:
Reproduction:
* Create a new router
* Attach an external interface: external_gateway_added executes successfully, but processing fails some time before self.ex_gw_port = self.get_ex_gw_port() is reached. (One example of such a failure is an RPC error when trying to update FIP statuses. In that case the extra routes would not be configured either, and the post-router-creation events would not be sent, which means that, for example, the metadata proxy would not be started.)
Any follow-up update to the router (add/remove interface, add/remove
FIP) will then fail on non-idempotent operations on the external device,
because every update tries to add the gateway again (self.ex_gw_port is
still None). Even when nothing fails outright, reconfiguring the
external device is wasteful.
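The failure mode described above can be sketched with a toy model. This is not Neutron code: ToyRouter, process, fail_after_gateway and gateway_add_calls are hypothetical names used only for illustration; the only things taken from the report are the ex_gw_port cache, external_gateway_added, and the fact that the cache is assigned after the gateway is configured.

```python
class ToyRouter:
    """Toy stand-in for an L3 agent router object; not Neutron code."""

    def __init__(self):
        self.ex_gw_port = None      # local cache of the configured gateway port
        self.gateway_add_calls = 0  # how often the external device was configured

    def external_gateway_added(self, port):
        # In this toy model, "configuring the device" is just a counter.
        self.gateway_add_calls += 1

    def process(self, desired_gw_port, fail_after_gateway=False):
        # The gateway is added whenever the cache says there is none.
        if desired_gw_port is not None and self.ex_gw_port is None:
            self.external_gateway_added(desired_gw_port)
        if fail_after_gateway:
            # Stands in for e.g. an RPC error while updating FIP statuses.
            raise RuntimeError("simulated failure before cache update")
        # Only reached on success: the cache is updated last.
        self.ex_gw_port = desired_gw_port


router = ToyRouter()
try:
    router.process("gw-port-1", fail_after_gateway=True)
except RuntimeError:
    pass

# The device is already configured, but the cache still says "no gateway",
# so the next update configures the gateway a second time.
router.process("gw-port-1")
```

In this sketch the second call to process adds the gateway again, which is harmless only if the underlying operation is idempotent.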
HA routers in particular will fail by throwing
VIPDuplicateAddressException for the external device's VIP. This
behavior was changed in a recent Mitaka patch
(https://review.openstack.org/#/c/196893/50/neutron/agent/l3/ha_router.py),
so the bug affects Juno to Liberty but not master or future releases.
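The difference between an add that rejects duplicates and one that tolerates them can be illustrated with a small sketch. Everything here is hypothetical (DuplicateVipError, ToyVipRegistry, try_add_twice, the example address); it is not the HA router or keepalived code, and it makes no claim about how the Mitaka patch actually changed the behavior.

```python
class DuplicateVipError(Exception):
    """Hypothetical stand-in for VIPDuplicateAddressException."""


class ToyVipRegistry:
    """Toy set of VIPs configured on a device; not Neutron code."""

    def __init__(self):
        self.vips = set()

    def add_vip_strict(self, address):
        # Non-idempotent: a repeated add is treated as an error.
        if address in self.vips:
            raise DuplicateVipError(address)
        self.vips.add(address)

    def add_vip_idempotent(self, address):
        # Idempotent: a repeated add leaves the state unchanged.
        self.vips.add(address)


def try_add_twice(add):
    """Return True if adding the same VIP twice succeeds."""
    try:
        add("203.0.113.10/24")
        add("203.0.113.10/24")  # simulates the gateway being re-added
    except DuplicateVipError:
        return False
    return True


strict_ok = try_add_twice(ToyVipRegistry().add_vip_strict)
idempotent_ok = try_add_twice(ToyVipRegistry().add_vip_idempotent)
```

With the strict variant the repeated gateway add surfaces as an exception on every later router update; with the idempotent variant it is merely redundant work.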
The impact on legacy or distributed routers is less severe, as their
process_external and routes_updated seem to be idempotent. I verified
this against master via a makeshift functional test; I cannot vouch for
previous releases.
Severity: Severe for HA routers from Juno to Liberty, but less so for
other router types or for HA routers on master.
** Affects: neutron
Importance: Medium
Status: New
** Tags: l3-dvr-backlog l3-ha l3-ipam-dhcp
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1523999