yahoo-eng-team team mailing list archive

[Bug 1722584] Re: [SRU] Return traffic from metadata service may get dropped by hypervisor due to wrong checksum

 

This bug was fixed in the package neutron - 2:14.0.2-0ubuntu1

---------------
neutron (2:14.0.2-0ubuntu1) disco; urgency=medium

  * New upstream release for OpenStack Stein (LP: #1831754).
  * d/p/bug1826419.patch: Dropped. Fixed upstream in 14.0.2.
  * d/p/revert-iptables-tcp-checksum-fill-code.patch: Dropped. Fixed
    upstream in 14.0.2.

neutron (2:14.0.1-0ubuntu2) disco; urgency=medium

  * d/p/revert-iptables-tcp-checksum-fill-code.patch: Cherry-picked
    from upstream to revert invalid use of iptables -j CHECKSUM
    (LP: #1722584).

 -- Sahid Orentino Ferdjaoui <sahid.ferdjaoui@xxxxxxxxxxxxx>  Wed, 03 Jul 2019 16:22:58 +0200

** Changed in: neutron (Ubuntu Disco)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1722584

Title:
  [SRU] Return traffic from metadata service may get dropped by
  hypervisor due to wrong checksum

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Committed
Status in neutron source package in Cosmic:
  Fix Committed
Status in neutron source package in Disco:
  Fix Released
Status in neutron source package in Eoan:
  Fix Released

Bug description:
  [Impact]
  Code added earlier to insert checksum rules was found to cause problems with newer kernels. That patch was subsequently reverted upstream, so this request is to backport the revert to the Ubuntu archives.

  [Test Case]
  * deploy openstack (>= queens)
  * create router/network/instance (dvr=false,l3ha=false)
  * go to router ns on neutron-gateway and check that the following returns nothing
  sudo ip netns exec qrouter-<id> iptables -t mangle -S| grep '\--sport 9697 -j CHECKSUM --checksum-fill'
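
  The check above can be wrapped so that it reports an explicit pass or fail. This is only a minimal sketch: the function name has_checksum_rule is hypothetical, it assumes the rule listing is piped in on stdin, and it reuses the pattern from the test case.

```shell
#!/bin/sh
# Hypothetical helper (not part of neutron): reads `iptables -t mangle -S`
# output on stdin and exits 0 if the metadata CHECKSUM rule is present.
has_checksum_rule() {
    # -e lets the pattern start with "--" without grep treating it as an option
    grep -q -e '--sport 9697 -j CHECKSUM --checksum-fill'
}

# Usage on the neutron-gateway (needs root; <id> is the router UUID):
#   if sudo ip netns exec qrouter-<id> iptables -t mangle -S | has_checksum_rule; then
#       echo "FAIL: checksum rule still present"
#   else
#       echo "PASS: no checksum rule"
#   fi
```

  Keeping the namespace call outside the function means the same check can be run against saved rule listings as well as a live router.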

  [Regression Potential]
  Backporting the revert patch means that routers created with this patch will no longer have a checksum rule added for metadata TCP packets. The original patch added a rule that turned out not to be the fix for the root issue, and it was subsequently found to cause problems with kernels < 4.19, since it was never intended for GSO TCP packets to have their checksum verified using this type of rule. Removing the rule (by adding the revert patch) is therefore not intended to change behaviour at all. The only potential side effect is that rules that were already created will not be cleaned up (until node reboot or router recreation). In an L3HA configuration, you could end up with some router instances having the rule and some not, depending on whether they were created before or after the patch was included.

  [Other Info]
  This revert patch does not remove rules added by the original patch so manual cleanup of those old rules is required.
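
  One possible way to approach that manual cleanup is sketched here, as a dry run only. The function name checksum_rule_deletes is hypothetical, and it is an assumption that the leftover rules match the pattern used in the test case. It turns each matching "-A" line from the rule listing into the equivalent "-D" command and prints it for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of neutron): reads `iptables -t mangle -S`
# output on stdin and prints one delete command per matching CHECKSUM rule.
# It only prints the commands; nothing is removed.
checksum_rule_deletes() {
    grep -e '--sport 9697 -j CHECKSUM --checksum-fill' |
        sed -e 's/^-A /iptables -t mangle -D /'
}

# Usage on the neutron-gateway (needs root; <id> is the router UUID):
#   sudo ip netns exec qrouter-<id> iptables -t mangle -S | checksum_rule_deletes
# Review the printed commands, then run each one inside the same namespace:
#   sudo ip netns exec qrouter-<id> <printed command>
```

  Printing the commands rather than executing them leaves the operator in control of which rules are actually deleted.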

  -----------------------------------------------------------------------------
  We have a problem with the metadata service not being responsive when it is proxied in the router namespace on some of our networking nodes after upgrading to Ocata (running on CentOS 7.4 with the RDO packages).

  The instance routes traffic for 169.254.169.254 to its default gateway.
  The default gateway is an OpenStack router in a namespace on a networking node.

  - Traffic gets sent from the guest
  - to the router
  - iptables routes it to the metadata proxy service
  - The response packet gets routed back, leaving the namespace
  - The hypervisor receives the packet
  - The checksum of the packet is wrong, and the packet gets dropped before it is put on the bridge

  Based on the following bug,
  https://bugs.launchpad.net/openstack-ansible/+bug/1483603, we found
  that adding the following iptables rule in the router namespace made
  this work again:
  'iptables -t mangle -I POSTROUTING -p tcp --sport 9697 -j CHECKSUM --checksum-fill'

  (NOTE: The rule from the 1st comment to the bug did solve access to
  the metadata service, but the lack of precision introduced other
  problems with the network)

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1722584/+subscriptions

