yahoo-eng-team team mailing list archive

[Bug 1744062] [NEW] L3 HA: multiple agents are active at the same time

 

Public bug reported:

This is the same issue reported in
https://bugs.launchpad.net/neutron/+bug/1731595. That bug is marked
'Fix Released', but the issue is still occurring and I can't change the
status back to 'New', so it seems best to open a new bug.

It seems as if this bug surfaces due to load issues. While the fix
provided by Venkata (https://review.openstack.org/#/c/522641/) should
help clean things up at the time of l3 agent restart, issues seem to
come back later down the line in some circumstances. xavpaice mentioned
he saw multiple routers active at the same time when they had 464
routers configured on 3 neutron gateway hosts using L3HA, and each
router was scheduled to all 3 hosts. However, jhebden mentions that
things seem stable at the 400 L3HA router mark, and it's worth noting
this is the same deployment that xavpaice was referring to.
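
To make the symptom concrete, here is a minimal sketch of how one might
flag routers that are reported active on more than one host. The input
shape (router id, host, HA state tuples), the function name and the
sample host names are assumptions for illustration, not output from the
deployment described above.

```python
from collections import defaultdict


def routers_with_multiple_active(bindings):
    """Return {router_id: [hosts]} for routers active on more than one host.

    `bindings` is an iterable of (router_id, host, ha_state) tuples.
    This shape is hypothetical; adapt it to however the per-agent HA
    state is collected in a given deployment.
    """
    active_hosts = defaultdict(list)
    for router_id, host, ha_state in bindings:
        if ha_state == "active":
            active_hosts[router_id].append(host)
    # Keep only routers that have more than one active instance.
    return {
        router_id: sorted(hosts)
        for router_id, hosts in active_hosts.items()
        if len(hosts) > 1
    }


# Hypothetical sample: router-b is active on two gateway hosts at once.
sample = [
    ("router-a", "gateway-1", "active"),
    ("router-a", "gateway-2", "standby"),
    ("router-a", "gateway-3", "standby"),
    ("router-b", "gateway-1", "active"),
    ("router-b", "gateway-2", "active"),
    ("router-b", "gateway-3", "standby"),
]
print(routers_with_multiple_active(sample))
```

A healthy L3HA router should show exactly one active instance, so any
entry in the returned mapping is a candidate for the problem above.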

It seems to me that something is being pushed to its limit, and
possibly once that limit is hit, master router advertisements aren't
being received, causing a new master to be elected. If this is the case,
it would be great to get to the bottom of what resource is getting
constrained.
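
For a rough sense of how long advertisements must go missing before a
backup takes over, the sketch below applies the VRRPv2 timing rule from
RFC 3768 (Master_Down_Interval = 3 * Advertisement_Interval +
Skew_Time). It assumes the keepalived instances use VRRPv2 timing, and
the advertisement interval of 2 seconds and priority of 50 are assumed
example values, not figures confirmed for this deployment.

```python
def master_down_interval(advert_int_s, priority):
    """Seconds a VRRPv2 backup waits without advertisements before
    declaring the master down (RFC 3768).

    Skew_Time = (256 - Priority) / 256 seconds.
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advert_int_s + skew_time


# Assumed example values; check the generated keepalived configuration
# for the real ones.
print(master_down_interval(advert_int_s=2, priority=50))
```

Under those assumptions, a backup that misses advertisements for a
little under 7 seconds would promote itself, which is the kind of window
a heavily loaded host could plausibly exceed.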

** Affects: cloud-archive
     Importance: High
         Status: Triaged

** Affects: cloud-archive/mitaka
     Importance: High
         Status: Triaged

** Affects: cloud-archive/newton
     Importance: High
         Status: Triaged

** Affects: cloud-archive/ocata
     Importance: High
         Status: Triaged

** Affects: cloud-archive/pike
     Importance: High
         Status: Triaged

** Affects: cloud-archive/queens
     Importance: High
         Status: Triaged

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: neutron (Ubuntu)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Xenial)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Artful)
     Importance: High
         Status: Triaged

** Affects: neutron (Ubuntu Bionic)
     Importance: High
         Status: Triaged

** Description changed:

  This is the same issue as
  https://bugs.launchpad.net/neutron/+bug/1731595 however that bug is 'Fix
  Released' and the issue is still occurring. There are a lot of details
- in the linked bug so I won't add them here unless it's useful.
+ in the linked bug so I won't add too many here.
+ 
+ It seems as if this bug surfaces due to load issues. While the fix
+ provided by Venkata (https://review.openstack.org/#/c/522641/) should
+ help clean things up at the time of l3 agent restart, issues seem to
+ come back later down the line in some circumstances. xavpaice mentioned
+ he saw multiple routers active at the same time when they had 464
+ routers configured on 3 neutron gateway hosts using L3HA, and each
+ router was scheduled to all 3 hosts. However, jhebden mentions that
+ things seem stable at the 400 L3HA router mark, and it's worth noting
+ this is the same deployment that xavpaice was referring to.
+ 
+ It seems to me that something is being pushed to its limit, and
+ possibly once that limit is hit, master router advertisements aren't
+ being received, causing a new master to be elected. If this is the case
+ it would be great to get to the bottom of what resource is getting
+ constrained.

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Description changed:

- This is the same issue as
- https://bugs.launchpad.net/neutron/+bug/1731595 however that bug is 'Fix
- Released' and the issue is still occurring. There are a lot of details
- in the linked bug so I won't add too many here.
- 
- It seems as if this bug surfaces due to load issues. While the fix
- provided by Venkata (https://review.openstack.org/#/c/522641/) should
- help clean things up at the time of l3 agent restart, issues seem to
- come back later down the line in some circumstances. xavpaice mentioned
- he saw multiple routers active at the same time when they had 464
- routers configured on 3 neutron gateway hosts using L3HA, and each
- router was scheduled to all 3 hosts. However, jhebden mentions that
- things seem stable at the 400 L3HA router mark, and it's worth noting
- this is the same deployment that xavpaice was referring to.
- 
- It seems to me that something is being pushed to its limit, and
- possibly once that limit is hit, master router advertisements aren't
- being received, causing a new master to be elected. If this is the case
- it would be great to get to the bottom of what resource is getting
- constrained.
+ -

** No longer affects: neutron

** Summary changed:

-  L3 HA: multiple agents are active at the same time
+ -

** Changed in: neutron (Ubuntu)
       Status: New => Incomplete

** Summary changed:

- -
+ L3 HA: multiple agents are active at the same time

** Description changed:

- -
+ This is the same issue reported in
+ https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked
+ as 'Fix Released' and the issue is still occurring.
+ 
+ It seems as if this bug surfaces due to load issues. While the fix
+ provided by Venkata (https://review.openstack.org/#/c/522641/) should
+ help clean things up at the time of l3 agent restart, issues seem to
+ come back later down the line in some circumstances. xavpaice mentioned
+ he saw multiple routers active at the same time when they had 464
+ routers configured on 3 neutron gateway hosts using L3HA, and each
+ router was scheduled to all 3 hosts. However, jhebden mentions that
+ things seem stable at the 400 L3HA router mark, and it's worth noting
+ this is the same deployment that xavpaice was referring to.
+ 
+ It seems to me that something is being pushed to its limit, and
+ possibly once that limit is hit, master router advertisements aren't
+ being received, causing a new master to be elected. If this is the case
+ it would be great to get to the bottom of what resource is getting
+ constrained.

** Changed in: neutron (Ubuntu)
       Status: Incomplete => Triaged

** Changed in: neutron (Ubuntu)
   Importance: Undecided => High

** Also affects: neutron
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ocata
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/pike
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/newton
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Bionic)
   Importance: High
       Status: Triaged

** Also affects: neutron (Ubuntu Artful)
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/mitaka
   Importance: Undecided => High

** Changed in: cloud-archive/mitaka
       Status: New => Triaged

** Changed in: cloud-archive/newton
   Importance: Undecided => High

** Changed in: cloud-archive/newton
       Status: New => Triaged

** Changed in: cloud-archive/ocata
   Importance: Undecided => High

** Changed in: cloud-archive/ocata
       Status: New => Triaged

** Changed in: cloud-archive/pike
   Importance: Undecided => High

** Changed in: cloud-archive/pike
       Status: New => Triaged

** Changed in: cloud-archive/queens
   Importance: Undecided => High

** Changed in: cloud-archive/queens
       Status: New => Triaged

** Changed in: neutron (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: neutron (Ubuntu Artful)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu Artful)
       Status: New => Triaged

** Description changed:

  This is the same issue reported in
  https://bugs.launchpad.net/neutron/+bug/1731595, however that is marked
- as 'Fix Released' and the issue is still occurring.
+ as 'Fix Released' and the issue is still occurring and I can't change
+ back to 'New' so it seems best to just open a new bug.
  
  It seems as if this bug surfaces due to load issues. While the fix
  provided by Venkata (https://review.openstack.org/#/c/522641/) should
  help clean things up at the time of l3 agent restart, issues seem to
  come back later down the line in some circumstances. xavpaice mentioned
  he saw multiple routers active at the same time when they had 464
  routers configured on 3 neutron gateway hosts using L3HA, and each
  router was scheduled to all 3 hosts. However, jhebden mentions that
  things seem stable at the 400 L3HA router mark, and it's worth noting
  this is the same deployment that xavpaice was referring to.
  
  It seems to me that something is being pushed to its limit, and
  possibly once that limit is hit, master router advertisements aren't
  being received, causing a new master to be elected. If this is the case
  it would be great to get to the bottom of what resource is getting
  constrained.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1744062

Title:
  L3 HA: multiple agents are active at the same time

Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in Ubuntu Cloud Archive newton series:
  Triaged
Status in Ubuntu Cloud Archive ocata series:
  Triaged
Status in Ubuntu Cloud Archive pike series:
  Triaged
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in neutron:
  New
Status in neutron package in Ubuntu:
  Triaged
Status in neutron source package in Xenial:
  Triaged
Status in neutron source package in Artful:
  Triaged
Status in neutron source package in Bionic:
  Triaged

Bug description:
  This is the same issue reported in
  https://bugs.launchpad.net/neutron/+bug/1731595. That bug is marked
  'Fix Released', but the issue is still occurring and I can't change
  the status back to 'New', so it seems best to open a new bug.

  It seems as if this bug surfaces due to load issues. While the fix
  provided by Venkata (https://review.openstack.org/#/c/522641/) should
  help clean things up at the time of l3 agent restart, issues seem to
  come back later down the line in some circumstances. xavpaice
  mentioned he saw multiple routers active at the same time when they
  had 464 routers configured on 3 neutron gateway hosts using L3HA, and
  each router was scheduled to all 3 hosts. However, jhebden mentions
  that things seem stable at the 400 L3HA router mark, and it's worth
  noting this is the same deployment that xavpaice was referring to.

  It seems to me that something is being pushed to its limit, and
  possibly once that limit is hit, master router advertisements aren't
  being received, causing a new master to be elected. If this is the
  case, it would be great to get to the bottom of what resource is
  getting constrained.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744062/+subscriptions

