
yahoo-eng-team team mailing list archive

[Bug 1523999] [NEW] Any error in L3 agent after external gateway is configured but before the local cache is updated results in errors in subsequent router updates

 

Public bug reported:

Reproduction:
* Create a new router
* Attach an external interface: external_gateway_added executes successfully, but a failure occurs at some point before self.ex_gw_port = self.get_ex_gw_port() runs. (An example of such a failure is an RPC error when trying to update FIP statuses. In that case extra routes would not be configured either, and post-router-creation events would not be sent, which means that, for example, the metadata proxy would not be started.)

Any follow-up update to the router (adding or removing an interface,
adding or removing a FIP) will fail on non-idempotent operations on the
external device. This is because every update tries to add the gateway
again, since self.ex_gw_port is still None. Even without a specific
failure, reconfiguring the external device is wasteful.
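The failure sequence can be reproduced in miniature with the toy model below. It is a sketch under stated assumptions, not neutron code: ToyRouter, ToyExternalDevice, DuplicateAddressError and the fail_after_gateway flag are hypothetical names, and the device's add_address is deliberately non-idempotent to stand in for the operations the report describes. Only ex_gw_port and external_gateway_added are taken from the report.

```python
class DuplicateAddressError(Exception):
    """Hypothetical error raised by the toy device on a repeated add."""


class ToyExternalDevice:
    """Hypothetical stand-in for the external interface.

    add_address is deliberately non-idempotent: adding the same
    address twice raises instead of being a no-op.
    """

    def __init__(self):
        self.addresses = []

    def add_address(self, cidr):
        if cidr in self.addresses:
            raise DuplicateAddressError(cidr)
        self.addresses.append(cidr)


class ToyRouter:
    """Hypothetical, heavily simplified model of router processing in
    the L3 agent; not neutron's RouterInfo."""

    def __init__(self, gw_cidr):
        self.gw_cidr = gw_cidr
        self.device = ToyExternalDevice()
        self.ex_gw_port = None  # local cache of the configured gateway

    def external_gateway_added(self):
        self.device.add_address(self.gw_cidr)

    def process(self, fail_after_gateway=False):
        # The cache decides whether the gateway is treated as new.
        if self.ex_gw_port is None:
            self.external_gateway_added()
        if fail_after_gateway:
            # Stands in for e.g. an RPC error while updating FIP statuses.
            raise RuntimeError("simulated RPC failure")
        # The cache is only updated at the very end of processing.
        self.ex_gw_port = self.gw_cidr


router = ToyRouter("203.0.113.10/24")
try:
    router.process(fail_after_gateway=True)
except RuntimeError:
    pass

# The device is configured, but the cache still says "no gateway".
print(router.device.addresses, router.ex_gw_port)

# A later, unrelated update therefore tries to add the gateway again.
try:
    router.process()
except DuplicateAddressError as exc:
    print("follow-up update failed:", exc)
```

Without the simulated failure, the cache is populated on the first pass and later calls to process() skip the gateway setup entirely.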

HA routers in particular will fail by throwing
VIPDuplicateAddressException for the external device's VIP. This
behavior was changed in a recent Mitaka patch
(https://review.openstack.org/#/c/196893/50/neutron/agent/l3/ha_router.py),
so the bug affects Juno through Liberty, but not master or future
releases.
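The HA difference can be modeled the same way. In this hypothetical sketch, ToyKeepalivedInstance and its two methods are illustrative names; only the exception name comes from the report. add_vip_strict mimics the Juno-to-Liberty behavior of raising on a duplicate VIP, and add_vip_idempotent mimics the no-op behavior the report attributes to master. It does not reproduce the actual Mitaka patch.

```python
class VIPDuplicateAddressException(Exception):
    """Name borrowed from the report; this toy class is not neutron's."""


class ToyKeepalivedInstance:
    """Hypothetical holder of an HA router's VIPs."""

    def __init__(self):
        self.vips = []

    def add_vip_strict(self, cidr):
        # Models the Juno-Liberty behavior: re-adding a VIP is an error.
        if cidr in self.vips:
            raise VIPDuplicateAddressException(cidr)
        self.vips.append(cidr)

    def add_vip_idempotent(self, cidr):
        # Models the behavior on master: re-adding a VIP is a no-op.
        if cidr not in self.vips:
            self.vips.append(cidr)


strict = ToyKeepalivedInstance()
strict.add_vip_strict("203.0.113.10/24")
try:
    # Second add, as triggered when the stale cache re-adds the gateway.
    strict.add_vip_strict("203.0.113.10/24")
except VIPDuplicateAddressException as exc:
    print("HA router update fails on duplicate VIP:", exc)

relaxed = ToyKeepalivedInstance()
relaxed.add_vip_idempotent("203.0.113.10/24")
relaxed.add_vip_idempotent("203.0.113.10/24")
print(relaxed.vips)
```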

The impact on legacy and distributed routers is less severe, as their
process_external and routes_updated methods seem to be idempotent. I
verified this against master via a makeshift functional test; I cannot
vouch for previous releases.
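Idempotency is what lets legacy and distributed routers tolerate the replayed gateway setup. The sketch below illustrates the idea with a hypothetical routes_updated over a plain dict standing in for the routing table; it is not neutron's implementation. Deletes tolerate missing entries and adds overwrite existing ones, similar in spirit to `ip route replace`, so replaying an update against a stale, empty cache converges to the same state instead of raising.

```python
def routes_updated(table, old_routes, new_routes):
    """Hypothetical, idempotent reconciliation of extra routes.

    table maps destination -> nexthop and stands in for the kernel
    routing table; old_routes is the (possibly stale) cached view.
    """
    new_dests = {route["destination"] for route in new_routes}
    for route in old_routes:
        if route["destination"] not in new_dests:
            # Deleting a route that is already gone is tolerated.
            table.pop(route["destination"], None)
    for route in new_routes:
        # "Replace" semantics: overwrite instead of failing on an
        # existing entry.
        table[route["destination"]] = route["nexthop"]
    return table


table = {}
routes = [{"destination": "10.0.0.0/24", "nexthop": "192.0.2.1"}]
routes_updated(table, [], routes)
snapshot = dict(table)

# Replaying the same update with a stale (empty) cache is harmless.
routes_updated(table, [], routes)
print(table == snapshot)
```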

Severity: Severe for HA routers from Juno through Liberty, but less so
for other router types or for HA routers on master.

** Affects: neutron
     Importance: Medium
         Status: New


** Tags: l3-dvr-backlog l3-ha l3-ipam-dhcp


-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1523999

Title:
  Any error in L3 agent after external gateway is configured but before
  the local cache is updated results in errors in subsequent router
  updates

Status in neutron:
  New



