
yahoo-eng-team team mailing list archive

[Bug 1641879] Re: No routers created after l3-agent start - error during L3NATAgentWithStateReport.periodic_sync_routers_task

 

Reviewed:  https://review.openstack.org/400233
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=62176a9b40619327aacee9ed4162252d1245d019
Submitter: Jenkins
Branch:    master

commit 62176a9b40619327aacee9ed4162252d1245d019
Author: Pepijn Oomen <pepijn@xxxxxxxxxxxxxxxxxx>
Date:   Mon Nov 21 12:07:45 2016 +0100

    Solve unexpected NoneType returned by _get_routers_can_schedule.
    
    Solve a problem with an unexpected NoneType returned by
    _get_routers_can_schedule called from within
    _schedule_ha_routers_to_additional_agent when using:
    
    router_scheduler_driver =
        neutron.scheduler.l3_agent_scheduler.AZLeastRoutersScheduler
    
    This was leading to problems with starting neutron-l3-agent on network
    nodes, causing HA routers to fail to start.
    
    Closes-Bug: #1641879
    Change-Id: I33c5a6214670f0ada9c2293b0eb2ff243f6f7b1b
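
The failure mode described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the neutron code: get_routers_can_schedule, schedule_unguarded and schedule_guarded are hypothetical stand-ins, and the sketch only assumes that a helper returns None rather than an empty list when no router is schedulable. The "or []" guard is one common way to tolerate such a return value; the change actually merged is the one in the review linked above.

```python
def get_routers_can_schedule(routers, can_schedule):
    """Hypothetical helper: returns None (not []) when nothing qualifies."""
    target = [r for r in routers if can_schedule(r)]
    if not target:
        return None  # the unexpected NoneType
    return target


def schedule_unguarded(routers, can_schedule):
    """Iterates the helper's result directly; breaks when it is None."""
    scheduled = []
    for router in get_routers_can_schedule(routers, can_schedule):
        scheduled.append(router)
    return scheduled


def schedule_guarded(routers, can_schedule):
    """Treats a None result as 'no routers to schedule'."""
    scheduled = []
    for router in get_routers_can_schedule(routers, can_schedule) or []:
        scheduled.append(router)
    return scheduled
```

With a predicate that rejects every router, schedule_unguarded raises a TypeError about a NoneType object, while schedule_guarded returns an empty list.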


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1641879

Title:
  No routers created after l3-agent start - error during
  L3NATAgentWithStateReport.periodic_sync_routers_task

Status in neutron:
  Fix Released

Bug description:
  After (re)starting the L3 agent, a failure during
  L3NATAgentWithStateReport.periodic_sync_routers_task was encountered.
  This resulted in a Python traceback being dumped to the log ending
  with "TypeError: 'NoneType' object is not iterable". The error repeats
  every minute or so. (See below.)

  While the error persists, no routers are provisioned by the L3 agent.
  It just keeps failing and failing and failing in a seemingly infinite
  loop.

  The fix/workaround was to remove a random router from the L3 agent
  having problems, like so:

  $ neutron l3-agent-router-remove <uuid-of-l3-agent-having-problems> <uuid-of-some-router>

  It did not seem to matter which of the many routers scheduled on the
  problematic L3 agent was removed in this way. The removal itself
  seemed to clear the blockage, allowing the L3 agent to start normal
  operations shortly after.

  Excerpts from l3-agent.log:

  2016-11-15 03:14:26.201 2988 INFO neutron.common.config [-] Logging enabled!
  2016-11-15 03:14:26.201 2988 INFO neutron.common.config [-] /usr/bin/neutron-l3-agent version 8.3.0
  2016-11-15 03:14:26.504 2988 INFO eventlet.wsgi.server [-] (2988) wsgi starting up on http:/var/lib/neutron/keepalived-state-change
  2016-11-15 03:14:26.548 2988 INFO neutron.agent.l3.agent [-] Agent has just been revived. Doing a full sync.
  2016-11-15 03:14:26.781 2988 INFO neutron.agent.l3.agent [-] L3 agent started
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task [req-97b76ec8-9eb6-4e23-8074-63c889c3e199 - - - - -] Error during L3NATAgentWithStateReport.periodic_sync_routers_task
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task Traceback (most recent call last):
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     task(self, context)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 545, in periodic_sync_routers_task
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     self.fetch_and_sync_all_routers(context, ns_manager)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 556, in fetch_and_sync_all_routers
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     self.plugin_rpc.get_router_ids(context))
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 108, in get_router_ids
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     return cctxt.call(context, 'get_router_ids', host=self.host)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 136, in call
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     return self._original_context.call(ctxt, method, **kwargs)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 158, in call
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     retry=self.retry)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     timeout=timeout, retry=retry)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     retry=retry)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     raise result
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task TypeError: 'NoneType' object is not iterable
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task Traceback (most recent call last):
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     incoming.message))
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 185, in _dispatch
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     return self._do_dispatch(endpoint, method, ctxt, args)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     result = func(ctxt, **new_args)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 75, in get_router_ids
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     router_ids=None)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 525, in auto_schedule_routers
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     self, context, host, router_ids)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/scheduler/l3_agent_scheduler.py", line 148, in auto_schedule_routers
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     plugin, context, l3_agent)
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/neutron/scheduler/l3_agent_scheduler.py", line 314, in _schedule_ha_routers_to_additional_agent
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task     for router in schedulable_routers:
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task TypeError: 'NoneType' object is not iterable
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 
  2016-11-15 03:14:28.297 2988 ERROR oslo_service.periodic_task 

  [ some time passes ]

  2016-11-15 03:15:08.152 2988 ERROR oslo_service.periodic_task [req-97b76ec8-9eb6-4e23-8074-63c889c3e199 - - - - -] Error during L3NATAgentWithStateReport.periodic_sync_routers_task
  2016-11-15 03:15:08.152 2988 ERROR oslo_service.periodic_task Traceback (most recent call last):
  2016-11-15 03:15:08.152 2988 ERROR oslo_service.periodic_task   File "/usr/lib/python2.7/site-packages/oslo_service/periodic_task.py", line 220, in run_periodic_tasks
  2016-11-15 03:15:08.152 2988 ERROR oslo_service.periodic_task     task(self, context)
  [ ... ]

  [ "neutron l3-agent-router-remove <l3-agent-uuid> d95d9198-9960-4f2c-b33f-d6211ef1a8a3" is run ]

  2016-11-15 03:15:50.441 2988 WARNING neutron.agent.l3.agent [-] Info for router d95d9198-9960-4f2c-b33f-d6211ef1a8a3 was not found. Performing router cleanup
  2016-11-15 03:17:23.102 2988 WARNING oslo.service.loopingcall [req-97b76ec8-9eb6-4e23-8074-63c889c3e199 - - - - -] Function 'neutron.service.Service.periodic_tasks' run outlasted interval by 15.31 sec
  2016-11-15 03:17:23.174 2988 INFO oslo_rootwrap.client [-] Spawned new rootwrap daemon process with pid=9015
  2016-11-15 03:17:23.634 2988 INFO neutron.agent.l3.ha [-] Router 19258eff-2755-41b8-aefb-1c9b2af991f4 transitioned to backup
  [ ... ]

  At this point, the L3 agent starts operating normally, creating all
  the routers it's scheduled to run and so on.
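
To check whether an agent is stuck in the same repeating failure, the log can be scanned for "Error during <task>" lines. The following is a rough sketch only: failed_periodic_tasks and LINE_RE are hypothetical names, and the regular expression assumes the line layout seen in the excerpt above (timestamp, pid, level, logger, bracketed request context, message). Continuation lines of a traceback carry no bracketed context and are therefore skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt:
# "<date> <time> <pid> <LEVEL> <logger> [<request context>] <message>"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[(?P<ctx>[^\]]*)\] (?P<msg>.*)$"
)

PREFIX = "Error during "


def failed_periodic_tasks(lines):
    """Count ERROR lines of the form 'Error during <task>' per task name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("level") != "ERROR":
            continue
        msg = match.group("msg")
        if msg.startswith(PREFIX):
            counts[msg[len(PREFIX):]] += 1
    return counts
```

A call such as failed_periodic_tasks(open("l3-agent.log")) returns a Counter keyed by task name; a count that keeps growing for periodic_sync_routers_task would match the behaviour reported here. The log file name is illustrative.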

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1641879/+subscriptions

