
yahoo-eng-team team mailing list archive

[Bug 1946187] [NEW] HA routers not going to be "primary" at all

 

Public bug reported:

From time to time in the CI, many tests fail because an HA router
stays in the backup state and never transitions to primary on the
node.

Examples of the failure:
https://3142cc95d58eb8a4ee07-043369ac575bbfe29758366f4ba498a1.ssl.cf1.rackcdn.com/765072/8/check/neutron-tempest-plugin-scenario-openvswitch/499b47d/controller/logs/screen-q-l3.txt

https://6599da62140c9583e14a-cd7f53ffbb0b86c69deae453da021fe8.ssl.cf5.rackcdn.com/811746/4/check/neutron-tempest-plugin-scenario-openvswitch/3cafcd7/testr_results.html

https://zuul.opendev.org/t/openstack/build/75c056464b6f445ebde18c1b07f5bcce


Example stack trace:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py", line 80, in wait_until_true
    eventlet.sleep(sleep)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 600 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_basic.py", line 35, in test_basic_instance
    self.setup_network_and_server()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/base.py", line 281, in setup_network_and_server
    router = self.create_router_by_client(**kwargs)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/base.py", line 209, in create_router_by_client
    cls._wait_for_router_ha_active(router['id'])
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/base.py", line 228, in _wait_for_router_ha_active
    utils.wait_until_true(_router_active_on_l3_agent,
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py", line 84, in wait_until_true
    raise exception
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Router 1c4ce297-5a04-4794-9720-20fdec9ca4e5 is not active on any of the L3 agents
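
For readers unfamiliar with the polling pattern visible in the traceback, the following is a minimal sketch of how such a wait loop behaves. It is not the neutron_tempest_plugin code: it uses time.sleep instead of eventlet, WaitTimeout is a hypothetical exception class, and the "ha_state" key on each agent dictionary is an assumption about what the L3 agent listing returns for an HA router.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for the timeout exception raised by the real helper."""


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Simplified sketch: the real helper relies on eventlet timeouts.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            if exception is not None:
                raise exception
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(sleep)


def router_active_on_l3_agent(agents):
    """Return True if any hosting agent reports the router as active.

    `agents` is assumed to be a list of dicts, each with an "ha_state" key.
    """
    return any(agent.get("ha_state") == "active" for agent in agents)


if __name__ == "__main__":
    # Every agent reports standby, which mirrors the failure in this bug:
    # the predicate never becomes True and the wait ends in a timeout.
    agents = [{"ha_state": "standby"}, {"ha_state": "standby"}]
    try:
        wait_until_true(lambda: router_active_on_l3_agent(agents),
                        timeout=0.5, sleep=0.1,
                        exception=WaitTimeout("router is not active on any agent"))
    except WaitTimeout as exc:
        print("timed out:", exc)
```

In the failing runs the wait lasted 600 seconds before giving up, so the router stayed in the backup state for the whole period rather than merely transitioning slowly.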

** Affects: neutron
     Importance: High
         Status: Confirmed


** Tags: gate-failure l3-ha

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1946187

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1946187/+subscriptions


