Message #25722
[Bug 1403860] [NEW] L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic
Public bug reported:
In the HA routers case we place the same Neutron port on all HA router
instances. This means that they share the same MAC and IP addresses. We
configure all IP addresses in keepalived.conf so that keepalived takes
care of moving the IP addresses and configures them only on the master
instance. The MAC address, however, is present on all HA router devices
on all network nodes, and so is the IPv6 link local address that is
generated from that MAC address. This means that we have an active
(IPv6) address in multiple places in the network. Any traffic generated
from said address on a standby node will change the MAC tables of the
underlay network, causing it to think that the MAC address has moved
from the master instance to any of the standbys. This causes network
disruption.
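To illustrate why every instance ends up with the same link local address, here is a minimal sketch of the modified EUI-64 derivation (RFC 4291, Appendix A). It assumes the kernel generates the link local address from the MAC in the default EUI-64 mode; the function name and the example MAC are hypothetical and not taken from Neutron.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link local address for a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe between the two halves of the MAC to form 64 bits.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    prefix = int(ipaddress.IPv6Address("fe80::"))
    return ipaddress.IPv6Address(prefix | interface_id)


# Hypothetical MAC shared by the port on every HA router instance.
print(link_local_from_mac("fa:16:3e:aa:bb:cc"))
```

Because the derivation depends only on the MAC, every node that holds the shared port computes the identical address.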
Severity / reproduction:
Create an HA router on a setup with 3 network nodes; the router is created on all of them. Connect it to an internal and an external network, then create an instance and assign it a floating IP. Ping the floating IP.
Every two minutes, we've observed the standby nodes sending an ICMPv6 multicast listener report. From the perspective of the underlay, the MAC address of the master router's external interface has now moved, so traffic no longer reaches the correct (master) node. After 30 seconds of 100% packet loss, the client re-issues an ARP request for the IPv4 address; the master answers, which moves the MAC back and fixes the issue. This repeats every 2 minutes, with 30 seconds of packet loss each time, resulting in 75% up-time.
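The 75% figure follows from 30 seconds of loss in every 120-second cycle. A toy model of the underlay's MAC table makes the arithmetic explicit; it is only a sketch that assumes the timings reported above hold exactly, and the function and parameter names are hypothetical.

```python
def simulate_uptime(period_s: int = 120, recovery_s: int = 30,
                    duration_s: int = 3600) -> float:
    """Fraction of seconds in which the underlay maps the MAC to the master."""
    mac_location = "master"
    seconds_up = 0
    for t in range(duration_s):
        phase = t % period_s
        if phase == 0:
            # A standby sends a multicast listener report: the MAC "moves".
            mac_location = "standby"
        elif phase == recovery_s:
            # The client re-ARPs and the master answers: the MAC moves back.
            mac_location = "master"
        if mac_location == "master":
            seconds_up += 1
    return seconds_up / duration_s


print(simulate_uptime())
```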
Solutions:
The sledgehammer solution would be to shut down all NICs on standby routers and bring them up on the master instance using the keepalived notifier scripts. In the spirit of keeping these scripts as lightweight as possible, I'd like to solve this issue instead by handling the IPv6 link local address the way we handle IPv4 addresses: not configuring it on the device, but adding it as a VIP to keepalived.conf and letting keepalived configure the address on the master node only.
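As a rough sketch of the proposed approach, the snippet below renders a keepalived.conf fragment that lists a link local address as a VIP. It is not Neutron's implementation: the function name, device name, and address are hypothetical. Placing the address in `virtual_ipaddress_excluded` is also an assumption, made because keepalived expects the addresses in a `virtual_ipaddress` block to share one address family.

```python
import ipaddress


def render_link_local_vip(address: str, device: str,
                          block: str = "virtual_ipaddress_excluded") -> str:
    """Render a keepalived VIP block holding one IPv6 link local address."""
    addr = ipaddress.IPv6Address(address)
    if not addr.is_link_local:
        raise ValueError(f"{address} is not an IPv6 link local address")
    return "\n".join([
        f"    {block} {{",
        # Link local addresses need an explicit device and link scope.
        f"        {addr}/64 dev {device} scope link",
        "    }",
    ])


# Hypothetical external device name and address.
print(render_link_local_vip("fe80::f816:3eff:feaa:bbcc", "qg-example"))
```

With the address managed this way, keepalived would add it on the master and remove it on the standbys, just as it does for the IPv4 VIPs.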
** Affects: neutron
Importance: Undecided
Assignee: Assaf Muller (amuller)
Status: New
** Tags: juno-backport-potential l3-ha
** Changed in: neutron
Assignee: (unassigned) => Assaf Muller (amuller)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1403860
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1403860/+subscriptions
Follow ups
-
[Bug 1403860] Re: L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic
From: Adam Gandelman, 2015-04-10
-
[Bug 1403860] Re: L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic
From: Adam Gandelman, 2015-04-09
-
[Bug 1403860] Re: L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic
From: Thierry Carrez, 2015-02-05
-
[Bug 1403860] [NEW] L3 HA routers have IPv6 link local address on devices, periodically send traffic, moving MACs around and disrupting traffic
From: Assaf Muller, 2014-12-18