yahoo-eng-team team mailing list archive

[Bug 1755243] Re: AttributeError when updating DvrEdgeRouter objects running on network nodes

 

Reviewed:  https://review.openstack.org/552097
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8c2dae659a806fdc20331de4b8a917ec3ae0e6f6
Submitter: Zuul
Branch:    master

commit 8c2dae659a806fdc20331de4b8a917ec3ae0e6f6
Author: Daniel Gonzalez <daniel@xxxxxxxxxxxxxxxxxxxxx>
Date:   Mon Mar 12 17:48:54 2018 +0100

    Fix l3-agent crash on routers without ha_state
    
    l3-agent checks the HA state of routers when a router is updated.
    To ensure that the HA state is only checked on HA routers the following
    check is performed: `if router.get('ha') and not is_dvr_only_agent`.
    This check should ensure that the check is only performed on
    DvrEdgeHaRouter and HaRouter objects.
    
    Unfortunately, there are cases where we have DvrEdgeRouter objects
    running on 'dvr_snat' agents. E.g. when deploying a loadbalancer with
    neutron-lbaas in a landscape with 6 network nodes and
    max_l3_agents_per_router set to 3, it may happen that the loadbalancer
    is placed on a network node that does not have a DvrEdgeHaRouter running
    on it. In such a case, neutron will deploy a DvrEdgeRouter object on the
    network node to serve the loadbalancer, just like it would deploy a
    DvrEdgeRouter on a compute node when deploying a VM.
    
    Under such circumstances each update to the router will lead to an
    AttributeError, because the DvrEdgeRouter object does not have the
    ha_state attribute.
    
    This patch circumvents the issue by doing an additional check on the
    router object to ensure that it actually has the ha_state attribute.
    
    Change-Id: I755990324db445efd0ee0b8a9db1f4d7bfb58e26
    Closes-Bug: #1755243


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1755243

Title:
  AttributeError when updating DvrEdgeRouter objects running on network
  nodes

Status in neutron:
  Fix Released

Bug description:
  In a configuration with L3 HA, DVR and neutron-lbaasv2, updating a
  router with a connected load balancer can crash with the following
  stack trace (line numbers may be slightly outdated):

  Failed to process compatible router: 192c77b2-1487-4bc4-af40-26563e959989
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 543, in _process_router_update
      self._process_router_if_compatible(router)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 464, in _process_router_if_compatible
      self._process_updated_router(router)
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 480, in _process_updated_router
      router['id'], router.get(l3_constants.HA_ROUTER_STATE_KEY))
    File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 132, in check_ha_state_for_router
      if ri and current_state != TRANSLATION_MAP[ri.ha_state]:
  AttributeError: 'DvrEdgeRouter' object has no attribute 'ha_state'

  The issue is that in a landscape with more network nodes than
  'max_l3_agents_per_router', e.g. 6 network nodes and
  max_l3_agents_per_router = 3, a load balancer may be scheduled on a
  network node that does not have a DvrEdgeHaRouter running on it. In
  such a case, neutron deploys a DvrEdgeRouter on the network node to
  serve the LB. Every time neutron updates that router, e.g. to assign
  a floating IP to the LB, the update crashes with the above stack
  trace because neutron expects to find a DvrEdgeHaRouter on the
  network node and tries to check its HA state.

  To decide whether it has to check the HA state of a router object,
  neutron evaluates the following condition:

  if router.get('ha') and not is_dvr_only_agent

  In our case that condition evaluates to true, because the agent runs
  in mode 'dvr_snat' and the router is HA. But the actual router object
  running on the network node is of type DvrEdgeRouter and therefore
  has no ha_state attribute, so the update fails with the
  AttributeError shown above.
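
The following is a minimal sketch of the failing condition and of the kind of guard the fix describes. It does not reproduce neutron's code: the two classes are simplified stand-ins that only share their names with the real ones, should_check_ha_state is a hypothetical helper, and the use of hasattr() is an assumption about how "has the ha_state attribute" might be tested.

```python
# Simplified stand-ins, NOT neutron's real classes.
class DvrEdgeRouter:
    """Router object without HA support; it has no ha_state attribute."""


class DvrEdgeHaRouter(DvrEdgeRouter):
    """Router object with HA support; it carries an ha_state attribute."""

    def __init__(self, ha_state):
        self.ha_state = ha_state


def should_check_ha_state(router, ri, is_dvr_only_agent):
    """Hypothetical helper: original condition plus an attribute check.

    router is the router dict received in the update, ri is the router
    object that actually runs on this node.
    """
    original_condition = bool(router.get('ha')) and not is_dvr_only_agent
    # Additional guard (assumed form): only inspect ha_state if it exists.
    return original_condition and hasattr(ri, 'ha_state')


# An HA router dict processed by an agent that is not DVR-only ('dvr_snat').
router = {'ha': True}

# With a DvrEdgeHaRouter on the node the HA state is checked as before.
print(should_check_ha_state(router, DvrEdgeHaRouter('master'), False))

# With a plain DvrEdgeRouter the original condition alone would be true,
# but the extra guard skips the ha_state access that raised AttributeError.
print(should_check_ha_state(router, DvrEdgeRouter(), False))
```

With the guard in place, the plain DvrEdgeRouter case is skipped instead of reaching the ri.ha_state access seen in the stack trace.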

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1755243/+subscriptions

