
yahoo-eng-team team mailing list archive

[Bug 1832021] Re: Checksum drop of metadata traffic on isolated networks with DPDK

 

** Description changed:

+ [Impact]
+ 
  When an isolated network uses provider networks for tenants (that is,
  without virtual routers: DVR or network node), metadata access occurs in
  the qdhcp ip netns rather than the qrouter netns.
  
  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True
  
  VMs on the provider tenant network are unable to access metadata because
  the packets are dropped due to bad checksums.
  
- When we added the following in the qdhcp netns, VMs regained access to
- metadata:
+ [Test Plan]
  
-  iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM
- --checksum-fill
+ 1. Create an OpenStack deployment with DPDK options enabled and 'enable-
+ local-dhcp-and-metadata: true' in neutron-openvswitch. A sample, simple
+ 3 node bundle can be found here[1].
  
- It seems this setting was recently removed from the qrouter netns [0]
- but it never existed in the qdhcp to begin with.
+ 2. Create an external flat network and subnet:
  
- [0] https://review.opendev.org/#/c/654645/
+ openstack network show dpdk_net || \
+   openstack network create --provider-network-type flat \
+                            --provider-physical-network physnet1 dpdk_net \
+                            --external
  
- Related LP Bug #1831935
- See https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1831935/comments/10
+ openstack subnet show dpdk_subnet || \
+     openstack subnet create --allocation-pool start=10.230.58.100,end=10.230.58.200 \
+                             --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
+                             --dns-nameserver 10.230.56.2 \
+                             --ip-version 4 --network dpdk_net dpdk_subnet
+ 
+ 
+ 3. Create an instance attached to that network. The instance must have a flavor that uses huge pages.
+ 
+ openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk
+ openstack flavor set m1.dpdk --property hw:mem_page_size=large
+ 
+ openstack server create --wait --image xenial --flavor m1.dpdk --key-name testkey --network dpdk_net i1
+ 
+ 4. Log into the instance host and check the instance console. The
+ instance will hang during boot and show the following message:
+ 
+ 2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional
+ path http://169.254.169.254/openstack/2015-10-15/user_data due to:
+ HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out.
+ (read timeout=10.0)
+ 
+ 5. Apply the fix on all compute nodes, restart the DHCP agents on those
+ nodes, and create the instance again.
+ 
+ 6. No errors should be shown and the instance should boot quickly.
+ 
+ 
+ [Where problems could occur]
+ 
+ * This code path is only exercised when datapath_type and ovs_use_veth
+ are set. Those settings are mostly used in DPDK environments. The core
+ of the fix is to turn off the checksum offload done by the DHCP
+ namespace interfaces. The drawback is some extra packet-processing
+ overhead for DHCP traffic, but since DHCP does not carry much data,
+ this should be a minor problem.
+ 
+ * Future changes to the syntax of the ethtool command could cause
+ regressions.
+ 
+ 
+ [Other Info]
+ 
+  * None
+ 
+ 
+ [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1832021

Title:
  Checksum drop of metadata traffic on isolated networks with DPDK

Status in OpenStack neutron-openvswitch charm:
  Fix Released
Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released

Bug description:
  [Impact]

  When an isolated network uses provider networks for tenants (that is,
  without virtual routers: DVR or network node), metadata access occurs
  in the qdhcp ip netns rather than the qrouter netns.

  The following options are set in the dhcp_agent.ini file:
  force_metadata = True
  enable_isolated_metadata = True

  VMs on the provider tenant network are unable to access metadata because
  the packets are dropped due to bad checksums.

  [Test Plan]

  1. Create an OpenStack deployment with DPDK options enabled and
  'enable-local-dhcp-and-metadata: true' in neutron-openvswitch. A
  sample, simple 3 node bundle can be found here[1].

  2. Create an external flat network and subnet:

  openstack network show dpdk_net || \
    openstack network create --provider-network-type flat \
                             --provider-physical-network physnet1 dpdk_net \
                             --external

  openstack subnet show dpdk_subnet || \
      openstack subnet create --allocation-pool start=10.230.58.100,end=10.230.58.200 \
                              --subnet-range 10.230.56.0/21 --dhcp --gateway 10.230.56.1 \
                              --dns-nameserver 10.230.56.2 \
                              --ip-version 4 --network dpdk_net dpdk_subnet

  
  3. Create an instance attached to that network. The instance must have a flavor that uses huge pages.

  openstack flavor create --ram 8192 --disk 50 --vcpus 4 m1.dpdk
  openstack flavor set m1.dpdk --property hw:mem_page_size=large

  openstack server create --wait --image xenial --flavor m1.dpdk --key-name testkey --network dpdk_net i1

  4. Log into the instance host and check the instance console. The
  instance will hang during boot and show the following message:

  2020-11-20 09:43:26,790 - openstack.py[DEBUG]: Failed reading optional
  path http://169.254.169.254/openstack/2015-10-15/user_data due to:
  HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out.
  (read timeout=10.0)

  5. Apply the fix on all compute nodes, restart the DHCP agents on those
  nodes, and create the instance again.

  6. No errors should be shown and the instance should boot quickly.
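
  One way to confirm the outcome of the test plan from inside the
  instance is to probe the metadata service directly. The sketch below
  is not part of the fix: the helper name probe_metadata and the default
  URL (the /openstack index on the address shown in the console message)
  are assumptions chosen for illustration, and it requires curl in the
  guest image.

```shell
# Hypothetical helper: report whether the metadata service answers in time.
# The 10 second limit mirrors the read timeout in the cloud-init message.
probe_metadata() {
    url="${1:-http://169.254.169.254/openstack}"
    if curl --silent --fail --max-time 10 --output /dev/null "$url"; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}
```

  Calling probe_metadata with no arguments in the guest would be
  expected to print "unreachable" on an affected deployment and
  "reachable" once the fix is applied.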

  
  [Where problems could occur]

  * This code path is only exercised when datapath_type and ovs_use_veth
  are set. Those settings are mostly used in DPDK environments. The core
  of the fix is to turn off the checksum offload done by the DHCP
  namespace interfaces. The drawback is some extra packet-processing
  overhead for DHCP traffic, but since DHCP does not carry much data,
  this should be a minor problem.

  * Future changes to the syntax of the ethtool command could cause
  regressions.
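
  For reference, turning off checksum offload by hand on a DHCP
  namespace interface could look like the sketch below. It is an
  illustration under assumptions, not the code of the fix: the helper
  name disable_csum_offload and the DRY_RUN switch are hypothetical,
  disabling both rx and tx is an assumption, and the namespace and
  interface names must be taken from the affected host (for example
  from `ip netns list`). With DRY_RUN left at its default of 1 the
  command is only printed.

```shell
# Hypothetical helper: disable rx/tx checksum offload on one interface
# inside a network namespace. Prints the command unless DRY_RUN=0.
disable_csum_offload() {
    ns="$1"    # a qdhcp namespace name taken from `ip netns list`
    dev="$2"   # interface name inside that namespace
    set -- ip netns exec "$ns" ethtool --offload "$dev" rx off tx off
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}
```

  Running `ethtool --show-offload <interface>` inside the same
  namespace afterwards lists the current rx-checksumming and
  tx-checksumming state.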


  [Other Info]

   * None


  [1] https://gist.github.com/sombrafam/e0741138773e444960eb4aeace6e3e79

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1832021/+subscriptions

