yahoo-eng-team team mailing list archive

[Bug 1648823] [NEW] l3 agent HA communication failure

 

Public bug reported:

An OpenStack environment was built using OpenStack-Ansible (OSA) on
Mitaka with the neutron_l3_agent in HA mode. This was functioning
correctly using network namespaces for routers. Within each namespace,
keepalived created an 'ha' virtual interface to track the status of the
other instance of the virtual router. This worked correctly: the 'ha'
virtual interface within the 'master' router namespace could ping the
'ha' virtual interface within the 'backup' router namespace, and when
the master went offline, keepalived on the backup would successfully
transition to master and bring up the virtual IP addresses within the
network namespace virtual router.

We upgraded the environment to Newton via the guide at
http://docs.openstack.org/developer/openstack-ansible/newton/upgrade-guide/manual-upgrade.html.
After this was done, the network namespace virtual routers (specifically
the 'ha' track interfaces) were no longer able to communicate with each
other, resulting in both transitioning to 'master' and bringing up
duplicate IP addresses. This caused intermittent connectivity to public
floating IPs and also from the routers to instances over the VXLAN
network.
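
The "both transition to master" symptom described above can be checked
mechanically once the keepalived state of each router instance has been
collected from every network node. The following is only a minimal
sketch: how the states are gathered is not covered in this report, and
the function name, router IDs and host names are hypothetical.

```python
from collections import defaultdict


def find_split_brain(reports):
    """Return routers that have more than one instance claiming 'master'.

    `reports` is an iterable of (router_id, host, state) tuples, where
    state is the keepalived state observed on that host (assumed to be a
    string such as 'master' or 'backup').
    """
    masters = defaultdict(list)
    for router_id, host, state in reports:
        if state.strip().lower() == "master":
            masters[router_id].append(host)
    # A healthy HA router has exactly one master; more than one means
    # the instances cannot see each other's VRRP advertisements.
    return {rid: sorted(hosts) for rid, hosts in masters.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical observations: router-1 is split, router-2 is healthy.
    observed = [
        ("router-1", "node-a", "master"),
        ("router-1", "node-b", "master"),
        ("router-2", "node-a", "master"),
        ("router-2", "node-b", "backup"),
    ]
    print(find_split_brain(observed))
```

An empty result means every router has at most one master among the
reported instances.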


******** l3_agent.ini configuration ********

# General
[DEFAULT]
verbose = True
debug = False

# While this option is deprecated in Liberty, if we remove it then it takes
# a default value of 'br-ex', which we do not want. We therefore leave it
# in place for now and can remove it in Mitaka.
external_network_bridge = 
gateway_external_network_id = 

use_namespaces = True
router_delete_namespaces = True

# Drivers
interface_driver = neutron.agent.linux.interface.BridgeInterfaceDriver

# Agent mode (legacy only)
agent_mode = legacy

# Conventional failover
allow_automatic_l3agent_failover = True

# HA failover
ha_confs_path = /var/lib/neutron/ha_confs
ha_vrrp_advert_int = 2
ha_vrrp_auth_password = bee916a2589b14dd7f
ha_vrrp_auth_type = PASS
handle_internal_only_routers = False
send_arp_for_ha = 3

# Metadata
enable_metadata_proxy = True
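
Since the HA-related options have to agree on every network node, one
way to rule out configuration drift after the upgrade is to load the
file from each node and compare the values. This is a rough sketch
using only the Python standard library; the option list is taken from
the file above, while the helper names and the idea of comparing two
nodes are assumptions, not part of the original report.

```python
import configparser

# Options copied from the l3_agent.ini shown above.
HA_OPTIONS = (
    "interface_driver",
    "agent_mode",
    "ha_confs_path",
    "ha_vrrp_advert_int",
    "ha_vrrp_auth_type",
)


def read_ha_options(ini_text):
    """Parse l3_agent.ini text and return the HA-related [DEFAULT] options."""
    # interpolation=None so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    defaults = parser.defaults()
    # Options missing from the file are reported as None.
    return {name: defaults.get(name) for name in HA_OPTIONS}


def differing_options(node_a, node_b):
    """Return {option: (value_a, value_b)} for options that do not match."""
    return {
        name: (node_a[name], node_b[name])
        for name in HA_OPTIONS
        if node_a[name] != node_b[name]
    }
```

Calling differing_options(read_ha_options(text_a), read_ha_options(text_b))
with the file contents from two nodes returns an empty dict when the
listed options match.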


******** keepalived.conf configuration ********

vrrp_instance VR_1 {
    state BACKUP
    interface ha-42c56d27-10
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass bee916a2589b14dd7f
    }
    track_interface {
        ha-42c56d27-10
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-42c56d27-10
    }
    virtual_ipaddress_excluded {
        10.0.0.1/8 dev qr-8deaf807-bb
        xx.xx.xx.xx/22 dev qg-6e4ebe51-94
        xx.xx.xx.xx/32 dev qg-6e4ebe51-94
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qg-6e4ebe51-94 scope link
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qr-8deaf807-bb scope link
    }
    virtual_routes {
        0.0.0.0/0 via xx.xx.xx.xx dev qg-6e4ebe51-94
    }
}
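
For comparing the generated keepalived.conf between the two network
nodes, the top-level VRRP settings (interface, virtual_router_id,
advert_int and so on) can be pulled out with a small parser. This is an
illustrative sketch only: it handles the simple brace layout shown
above, not the full keepalived syntax, and the function name is
hypothetical.

```python
import re

# Scalar settings that appear directly inside the vrrp_instance block above.
SCALAR_KEYS = {"state", "interface", "virtual_router_id", "priority", "advert_int"}


def vrrp_settings(conf_text):
    """Return {instance_name: {setting: value}} for each vrrp_instance."""
    instances = {}
    current = None  # settings dict of the vrrp_instance being read
    depth = 0       # current brace nesting level
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        match = re.match(r"vrrp_instance\s+(\S+)\s*\{$", line)
        if match and depth == 0:
            current = instances.setdefault(match.group(1), {})
            depth = 1
        elif line.endswith("{"):
            depth += 1  # nested block such as authentication { ... }
        elif line == "}":
            depth -= 1
            if depth == 0:
                current = None
        elif current is not None and depth == 1:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in SCALAR_KEYS:
                current[parts[0]] = parts[1]
    return instances
```

Running this on the file from each node and comparing the resulting
dicts shows whether both instances use the same virtual_router_id and
advert_int, and which 'ha' interface each one is bound to.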

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1648823

Title:
  l3 agent HA communication failure

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1648823/+subscriptions

