yahoo-eng-team team mailing list archive
Message #59646
[Bug 1648823] [NEW] l3 agent HA communication failure
Public bug reported:
An OpenStack environment was built using OpenStack-Ansible (OSA) on
Mitaka with the neutron_l3_agent in HA mode. This was functioning
correctly using network namespaces for routers. Within the namespace
keepalived created an 'ha' virtual interface to track the status of the
other instance of the virtual router. This worked correctly: the 'ha'
virtual interface within the 'master' router namespace could ping the
'ha' virtual interface within the 'backup' router namespace, and when
the master went offline keepalived on the backup would successfully
transition to master and bring up the virtual IP addresses within the
network namespace virtual router.
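The reachability check described above can be scripted. The following is a minimal sketch, assuming the router namespace follows neutron's usual `qrouter-<router id>` naming; the router ID and peer address are placeholders, not values taken from this environment, and the function names are hypothetical.

```python
import subprocess


def build_ha_ping_command(router_id, ha_interface, peer_ip, count=3):
    """Build an `ip netns exec` command that pings the peer's 'ha' address."""
    namespace = "qrouter-" + router_id  # assumed neutron namespace naming
    return [
        "ip", "netns", "exec", namespace,
        "ping", "-c", str(count), "-I", ha_interface, peer_ip,
    ]


def ha_peer_reachable(router_id, ha_interface, peer_ip):
    """Return True if the ping exits 0. Needs root on the network node."""
    cmd = build_ha_ping_command(router_id, ha_interface, peer_ip)
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder router ID and peer address, for illustration only.
    print(" ".join(
        build_ha_ping_command("ROUTER_ID", "ha-42c56d27-10", "169.254.192.2")
    ))
```

Running `ha_peer_reachable` from each network node in turn shows whether the 'ha' interfaces can see each other in both directions.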
We upgraded the environment to Newton via the guide at
http://docs.openstack.org/developer/openstack-ansible/newton/upgrade-guide/manual-upgrade.html.
After this was done, the network namespace virtual routers
(specifically the 'ha' track interfaces) were no longer able to
communicate with each other, resulting in both of them transitioning to
'master' and bringing up duplicate IP addresses. This caused
intermittent connectivity to public floating IPs and also from the
routers to instances over the VXLAN network.
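The dual-master condition described above can be flagged automatically once the keepalived state of each router instance is known. This is a minimal sketch under stated assumptions: the input mapping, node names, and router IDs are hypothetical, and how the per-node states are collected is left to the operator.

```python
def find_split_brain(states_by_router):
    """Return {router_id: [nodes]} for routers with more than one master.

    states_by_router maps a router ID to {node name: keepalived state}.
    """
    conflicts = {}
    for router_id, node_states in states_by_router.items():
        masters = sorted(
            node for node, state in node_states.items()
            if state.strip().lower() == "master"
        )
        if len(masters) > 1:
            conflicts[router_id] = masters
    return conflicts


if __name__ == "__main__":
    # Hypothetical data: router-a is healthy, router-b shows the failure.
    observed = {
        "router-a": {"net1": "master", "net2": "backup"},
        "router-b": {"net1": "master", "net2": "master"},
    }
    print(find_split_brain(observed))
```

A non-empty result means at least one router has duplicate virtual IP addresses active, which matches the intermittent connectivity reported here.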
******** l3_agent.ini configuration ********
# General
[DEFAULT]
verbose = True
debug = False
# While this option is deprecated in Liberty, if we remove it then it takes
# a default value of 'br-ex', which we do not want. We therefore leave it
# in place for now and can remove it in Mitaka.
external_network_bridge =
gateway_external_network_id =
use_namespaces = True
router_delete_namespaces = True
# Drivers
interface_driver = neutron.agent.linux.interface.BridgeInterfaceDriver
# Agent mode (legacy only)
agent_mode = legacy
# Conventional failover
allow_automatic_l3agent_failover = True
# HA failover
ha_confs_path = /var/lib/neutron/ha_confs
ha_vrrp_advert_int = 2
ha_vrrp_auth_password = bee916a2589b14dd7f
ha_vrrp_auth_type = PASS
handle_internal_only_routers = False
send_arp_for_ha = 3
# Metadata
enable_metadata_proxy = True
******** keepalived.conf configuration ********
vrrp_instance VR_1 {
    state BACKUP
    interface ha-42c56d27-10
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass bee916a2589b14dd7f
    }
    track_interface {
        ha-42c56d27-10
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-42c56d27-10
    }
    virtual_ipaddress_excluded {
        10.0.0.1/8 dev qr-8deaf807-bb
        xx.xx.xx.xx/22 dev qg-6e4ebe51-94
        xx.xx.xx.xx/32 dev qg-6e4ebe51-94
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qg-6e4ebe51-94 scope link
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qr-8deaf807-bb scope link
    }
    virtual_routes {
        0.0.0.0/0 via xx.xx.xx.xx dev qg-6e4ebe51-94
    }
}
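To compare this configuration between the two network nodes, and to estimate how long a backup waits before declaring itself master, something like the following could be used. It is a rough sketch, not a full keepalived.conf parser: it assumes a single vrrp_instance per file, the function names are hypothetical, and the timer formula is the VRRPv2 one from RFC 3768, which keepalived may not follow exactly.

```python
def parse_vrrp_scalars(conf_text):
    """Collect `key value` settings found directly inside a vrrp_instance.

    Nested blocks (authentication, virtual_ipaddress, ...) are skipped.
    Assumes the file holds a single vrrp_instance block.
    """
    settings = {}
    depth = 0
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            continue
        if line == "}":
            depth -= 1
            continue
        if depth == 1:
            parts = line.split(None, 1)
            # Bare keywords such as `nopreempt` are stored as True.
            settings[parts[0]] = parts[1] if len(parts) > 1 else True
    return settings


def master_down_interval(advert_int, priority):
    """RFC 3768: 3 * advert_int + skew, where skew = (256 - priority) / 256."""
    return 3 * advert_int + (256 - priority) / 256.0


if __name__ == "__main__":
    # Trimmed copy of the instance shown in this report.
    sample = """
    vrrp_instance VR_1 {
        state BACKUP
        interface ha-42c56d27-10
        priority 50
        nopreempt
        advert_int 2
        track_interface {
            ha-42c56d27-10
        }
    }
    """
    conf = parse_vrrp_scalars(sample)
    print(conf)
    print(master_down_interval(int(conf["advert_int"]), int(conf["priority"])))
```

With advert_int 2 and priority 50 the formula gives roughly 6.8 seconds. Under that assumption, an instance that hears no advertisements on its 'ha' interface for that long would promote itself, which is consistent with both routers ending up as master.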
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1648823