
yahoo-eng-team team mailing list archive

[Bug 1722584] Re: Return traffic from metadata service may get dropped by hypervisor due to wrong checksum

 

** Description changed:

- We have a problem with the metadata service not being responsive, when
- the proxied in the router namespace on some of our networking nodes
- after upgrading to Ocata (Running on CentOS 7.4, with the RDO packages).
+ [Impact]
+ Prior addition of code to add checksum rules was found to cause problems with newer kernels. Patch subsequently reverted so this request is to backport those patches to the ubuntu archives.
  
+ [Test Case]
+ * deploy openstack (>= queens)
+ * create router/network/instance (dvr=false,l3ha=false)
+ * go to router ns on neutron-gateway and check that the following returns nothing
+ sudo ip netns exec qrouter-<id> iptables -t mangle -S| grep '--sport 9697 -j CHECKSUM --checksum-fill'
+ 
+ [Regression Potential]
+ None expected
+ 
+ [Other Info]
+ This revert patch does not remove rules added by the original patch so manual cleanup of those old rules is required.
+ 
+ -----------------------------------------------------------------------------
+ We have a problem with the metadata service not being responsive, when the proxied in the router namespace on some of our networking nodes after upgrading to Ocata (Running on CentOS 7.4, with the RDO packages).
  
  Instance routes traffic to 169.254.169.254 to it's default gateway.
  Default gateway is an OpenStack router in a namespace on a networking node.
  
  - Traffic gets sent from the guest,
  - to the router,
  - iptables routes it to the metadata proxy service,
  - response packet gets routed back, leaving the namespace
  - Hypervisor gets the packet in
  - Checksum of packet is wrong, and the packet gets dropped before putting it on the bridge
  
- 
- Based on the following bug https://bugs.launchpad.net/openstack-ansible/+bug/1483603, we found that adding the following iptable rule in the router namespace made this work again: 'iptables -t mangle -I POSTROUTING -p tcp --sport 9697 -j CHECKSUM --checksum-fill'
+ Based on the following bug https://bugs.launchpad.net/openstack-
+ ansible/+bug/1483603, we found that adding the following iptable rule in
+ the router namespace made this work again: 'iptables -t mangle -I
+ POSTROUTING -p tcp --sport 9697 -j CHECKSUM --checksum-fill'
  
  (NOTE: The rule from the 1st comment to the bug did solve access to the
  metadata service, but the lack of precision introduced other problems
  with the network)

** Description changed:

  [Impact]
  Prior addition of code to add checksum rules was found to cause problems with newer kernels. Patch subsequently reverted so this request is to backport those patches to the ubuntu archives.
  
  [Test Case]
  * deploy openstack (>= queens)
  * create router/network/instance (dvr=false,l3ha=false)
  * go to router ns on neutron-gateway and check that the following returns nothing
- sudo ip netns exec qrouter-<id> iptables -t mangle -S| grep '--sport 9697 -j CHECKSUM --checksum-fill'
+ sudo ip netns exec qrouter-<id> iptables -t mangle -S| grep '\--sport 9697 -j CHECKSUM --checksum-fill'
  
  [Regression Potential]
  None expected
  
  [Other Info]
  This revert patch does not remove rules added by the original patch so manual cleanup of those old rules is required.
  
  -----------------------------------------------------------------------------
  We have a problem with the metadata service not being responsive, when the proxied in the router namespace on some of our networking nodes after upgrading to Ocata (Running on CentOS 7.4, with the RDO packages).
  
  Instance routes traffic to 169.254.169.254 to it's default gateway.
  Default gateway is an OpenStack router in a namespace on a networking node.
  
  - Traffic gets sent from the guest,
  - to the router,
  - iptables routes it to the metadata proxy service,
  - response packet gets routed back, leaving the namespace
  - Hypervisor gets the packet in
  - Checksum of packet is wrong, and the packet gets dropped before putting it on the bridge
  
  Based on the following bug https://bugs.launchpad.net/openstack-
  ansible/+bug/1483603, we found that adding the following iptable rule in
  the router namespace made this work again: 'iptables -t mangle -I
  POSTROUTING -p tcp --sport 9697 -j CHECKSUM --checksum-fill'
  
  (NOTE: The rule from the 1st comment to the bug did solve access to the
  metadata service, but the lack of precision introduced other problems
  with the network)

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/rocky
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/queens
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/stein
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
       Status: New

** Summary changed:

- Return traffic from metadata service may get dropped by hypervisor due to wrong checksum
+ [SRU] Return traffic from metadata service may get dropped by hypervisor due to wrong checksum

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1722584

Title:
  [SRU] Return traffic from metadata service may get dropped by
  hypervisor due to wrong checksum

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive queens series:
  New
Status in Ubuntu Cloud Archive rocky series:
  New
Status in Ubuntu Cloud Archive stein series:
  New
Status in Ubuntu Cloud Archive train series:
  New
Status in neutron:
  Fix Released

Bug description:
  [Impact]
  A prior change that added checksum rules was found to cause problems with newer kernels. That patch was subsequently reverted, so this request is to backport the revert to the Ubuntu archives.

  [Test Case]
  * deploy openstack (>= queens)
  * create router/network/instance (dvr=false,l3ha=false)
  * go to router ns on neutron-gateway and check that the following returns nothing
  sudo ip netns exec qrouter-<id> iptables -t mangle -S| grep '\--sport 9697 -j CHECKSUM --checksum-fill'

  [Regression Potential]
  None expected

  [Other Info]
  This revert patch does not remove rules added by the original patch so manual cleanup of those old rules is required.
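
The manual cleanup mentioned above could be approached along these lines. This is only a sketch under stated assumptions: the helper name `stale_checksum_deletes` is hypothetical, the chain the original patch added the rule to is not stated in this report (so the sketch matches on the rule text alone), and it assumes stale rules show up in `iptables -t mangle -S` output as `-A <chain> ... --sport 9697 -j CHECKSUM --checksum-fill`. It prints delete commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -t mangle -S` output on stdin and print
# one delete command per stale metadata checksum rule.
# $1 is the router namespace name (e.g. qrouter-<id>); nothing is modified here.
stale_checksum_deletes() {
    # -e keeps grep from parsing the leading "--" of the pattern as an option.
    grep -e '--sport 9697 -j CHECKSUM --checksum-fill' \
        | sed -n "s|^-A |ip netns exec $1 iptables -t mangle -D |p"
}

# Usage (assumption: run as root on the gateway node):
#   ip netns exec qrouter-<id> iptables -t mangle -S | stale_checksum_deletes qrouter-<id>
```

The function works because `iptables -S` prints rules in the same syntax the command line accepts, so swapping `-A` for `-D` yields a matching delete. After running the printed commands, the grep from the test case should return nothing for that namespace.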

  -----------------------------------------------------------------------------
  We have a problem with the metadata service not being responsive when it is proxied in the router namespace on some of our networking nodes after upgrading to Ocata (running on CentOS 7.4 with the RDO packages).

  The instance routes traffic for 169.254.169.254 to its default gateway.
  The default gateway is an OpenStack router in a namespace on a networking node.

  - Traffic gets sent from the guest,
  - to the router,
  - iptables routes it to the metadata proxy service,
  - response packet gets routed back, leaving the namespace
  - Hypervisor gets the packet in
  - Checksum of packet is wrong, and the packet gets dropped before putting it on the bridge

  Based on https://bugs.launchpad.net/openstack-ansible/+bug/1483603, we
  found that adding the following iptables rule in the router namespace
  made this work again:
  'iptables -t mangle -I POSTROUTING -p tcp --sport 9697 -j CHECKSUM --checksum-fill'

  (NOTE: The rule from the 1st comment to the bug did solve access to
  the metadata service, but the lack of precision introduced other
  problems with the network)

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1722584/+subscriptions

