yahoo-eng-team team mailing list archive


[Bug 1667756] Re: Backup HA router sending traffic, traffic from switch interrupted

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Corey Bryant <corey.bryant@xxxxxxxxxxxxx>
Date: Mon, 06 Aug 2018 14:01:51 -0000
Reply-to: Bug 1667756 <1667756@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: neutron (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: neutron (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu)
       Status: New => Invalid

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/mitaka
       Status: New => Triaged

** Changed in: cloud-archive/mitaka
   Importance: Undecided => High

** Changed in: cloud-archive
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1667756

Title:
  Backup HA router sending traffic, traffic from switch interrupted

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Xenial:
  Triaged

Bug description:
  As outlined in https://review.openstack.org/#/c/142843/, backup HA
  routers should not send any traffic.  Any traffic will cause the
  connected switch to learn a new port for the associated source MAC
  address, since the same MAC address is in use on the primary HA router.
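
  The failure mode can be illustrated with a toy model of a learning
  switch. This is a hypothetical sketch, not neutron or vendor switch
  code: the class `LearningSwitch` and the port names are made up. The
  switch records the ingress port of each frame's source MAC and sends
  later frames for that MAC to the recorded port, so one frame from the
  backup node carrying the shared MAC is enough to redirect traffic.

```python
from typing import Dict, Optional


class LearningSwitch:
    """Toy model of source-MAC learning on an Ethernet switch (illustrative only)."""

    def __init__(self) -> None:
        # Maps each source MAC address to the port it was most recently seen on.
        self.mac_table: Dict[str, str] = {}

    def receive(self, src_mac: str, in_port: str) -> None:
        # Learning step: the latest ingress port overwrites any earlier entry.
        self.mac_table[src_mac] = in_port

    def egress_port(self, dst_mac: str) -> Optional[str]:
        # Forwarding step: a real switch floods unknown destinations; None stands in for that.
        return self.mac_table.get(dst_mac)


ROUTER_MAC = "fa:16:3e:a7:ae:63"  # MAC shared by the primary and backup HA router ports

switch = LearningSwitch()
switch.receive(ROUTER_MAC, "port-primary")  # primary router sends a frame
print(switch.egress_port(ROUTER_MAC))       # port-primary
switch.receive(ROUTER_MAC, "port-backup")   # backup emits an MLD report or NS
print(switch.egress_port(ROUTER_MAC))       # port-backup: replies now reach the backup
switch.receive(ROUTER_MAC, "port-primary")  # primary sends again later
print(switch.egress_port(ROUTER_MAC))       # port-primary
```

  The last step mirrors the recovery described below: the mapping only
  moves back once the primary router transmits again.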

  We are observing backup routers sending IPv6 RA and RS messages,
  probably in response to incoming IPv6 RA messages.  The subnets
  associated with the HA routers are not intended for IPv6 traffic.

  A typical traffic sequence is:

  Packet from external switch...
  08:81:f4:a6:dc:01 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80:52:0:136c::fe > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56

  Immediately followed by a packet from the backup HA router...
  fa:16:3e:a7:ae:63 > 33:33:ff:a7:ae:63, ethertype IPv6 (0x86dd), length 86: (hlim 1, next-header Options (0) payload length: 32) :: > ff02::1:ffa7:ae63: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ffa7:ae63

  Another packet...
  fa:16:3e:a7:ae:63 > 33:33:ff:a7:ae:63, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) :: > ff02::1:ffa7:ae63: [icmp6 sum ok] ICMP6, neighbor solicitation, length 24, who has 2620:52:0:136c:f816:3eff:fea7:ae63

  Another packet...
  fa:16:3e:a7:ae:63 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32) 
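
  Frames like these can be picked out of a capture by their Ethernet
  source address. A minimal sketch, assuming text output from
  `tcpdump -e` (which prints the link-level header as
  "src > dst, ethertype ..."); the helper `frames_from` is hypothetical
  and the capture lines below are shortened copies of the ones above.

```python
import re
from typing import Iterable, List

# Link-level header as printed by `tcpdump -e`: "<src mac> > <dst mac>, ethertype ..."
MAC = r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}"
LINK_HEADER = re.compile(rf"(?P<src>{MAC}) > (?P<dst>{MAC}),")


def frames_from(mac: str, lines: Iterable[str]) -> List[str]:
    """Return the capture lines whose Ethernet source address equals `mac`."""
    wanted = mac.lower()
    hits = []
    for line in lines:
        match = LINK_HEADER.search(line)  # leftmost match is the link-level header
        if match and match.group("src").lower() == wanted:
            hits.append(line)
    return hits


capture = [
    "08:81:f4:a6:dc:01 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: ... router advertisement",
    "fa:16:3e:a7:ae:63 > 33:33:ff:a7:ae:63, ethertype IPv6 (0x86dd), length 78: ... neighbor solicitation",
]
for line in frames_from("fa:16:3e:a7:ae:63", capture):
    print(line)  # only the frame sent by the backup router
```

  On a live system, a pcap capture filter such as
  `ether src fa:16:3e:a7:ae:63` does the same selection at capture time.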

  At this point, the switch has updated its mac table and traffic to the
  fa:16:3e:a7:ae:63 address has been redirected to the backup host.
  SSH/ping traffic resumes at a later time when the primary router node
  sends traffic with the fa:16:3e:a7:ae:63 source address.

  This problem is reproducible in our environment as follows:

  1. Deploy OSP10
  2. Create external network
  3. Create external subnet (IPv4)
  4. Create an internal network and VM
  5. Attach floating ip
  6. ssh into the VM through the FIP or ping the FIP
  7. SSH sessions will occasionally freeze, or pings will fail.
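
  For step 7, lost pings can be spotted as gaps in the icmp_seq numbers
  of the replies. A minimal sketch, assuming Linux (iputils) ping output;
  the helper `missing_replies` is hypothetical and the sample output is
  made up for illustration, using a documentation address.

```python
import re
from typing import List

# Reply lines from Linux (iputils) ping look like:
#   64 bytes from 192.0.2.10: icmp_seq=7 ttl=63 time=0.52 ms
REPLY = re.compile(r"bytes from .*?icmp_seq=(\d+)")


def missing_replies(ping_output: str) -> List[int]:
    """Return icmp_seq numbers with no reply between the first and last reply seen."""
    answered = {int(m.group(1)) for m in REPLY.finditer(ping_output)}
    if not answered:
        return []
    return [seq for seq in range(min(answered), max(answered) + 1) if seq not in answered]


sample = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=63 time=0.61 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=63 time=0.58 ms
64 bytes from 192.0.2.10: icmp_seq=5 ttl=63 time=0.66 ms
"""
print(missing_replies(sample))  # [3, 4]
```

  Only "bytes from" lines count as replies, so error lines such as
  "Destination Host Unreachable" are treated as missing replies too.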

  
  Additional info:
  Setting accept_ra=0 on the backup host routers stops the problem from
  happening.  Unfortunately, on a reboot, we lose the setting, even though
  the current sysctl files have accept_ra=0.
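
  The workaround can be scripted per router interface. A minimal sketch,
  assuming the router interfaces live in a neutron `qrouter-<router-id>`
  network namespace; `accept_ra_cmd`, `apply`, and the ROUTER_ID/PORT_ID
  placeholders are hypothetical. A value set this way is lost whenever
  the interface or namespace is recreated, so something would still have
  to re-run it after a reboot; the sketch does not solve that part.

```python
import shlex
import subprocess
from typing import List


def accept_ra_cmd(namespace: str, interface: str, value: int = 0) -> List[str]:
    """Build an `ip netns exec ... sysctl -w` command setting accept_ra on one interface."""
    if value not in (0, 1, 2):
        # Kernel semantics: 0 = ignore RAs, 1 = accept if not forwarding, 2 = accept even if forwarding.
        raise ValueError("accept_ra must be 0, 1 or 2")
    # Dotted sysctl key; interface names that themselves contain '.' are not handled here.
    key = f"net.ipv6.conf.{interface}.accept_ra"
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", f"{key}={value}"]


def apply(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; run it (root required) only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical placeholders: substitute the real router UUID and port name.
apply(accept_ra_cmd("qrouter-ROUTER_ID", "qg-PORT_ID"))
# prints: ip netns exec qrouter-ROUTER_ID sysctl -w net.ipv6.conf.qg-PORT_ID.accept_ra=0
```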

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1667756/+subscriptions

References

[Bug 1667756] [NEW] Backup HA router sending traffic, traffic from switch interrupted
From: Aaron Smith, 2017-02-24