yahoo-eng-team team mailing list archive

[Bug 1920975] Re: neutron dvr should lower proxy_delay when using proxy_arp

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/782570
Committed: https://opendev.org/openstack/neutron/commit/d7f68a0ce76ffb9a93dfba167dfffba53189350d
Submitter: "Zuul (22348)"
Branch:    master

commit d7f68a0ce76ffb9a93dfba167dfffba53189350d
Author: Edward Hope-Morley <edward.hope-morley@xxxxxxxxxxxxx>
Date:   Tue Mar 23 17:18:48 2021 +0000

    Set proxy_delay to one when using proxy ARP
    
    Neutron DVR uses proxy ARP in fip namespaces to respond
    to ARP requests for instance floating IPs. In doing so
    it is susceptible to a random delay of up to 800ms (by
    default), which is added to the time taken to respond
    to ARP requests. This initial delay to ARP responses is
    entirely avoidable by changing proxy_delay from its
    default to one, which makes the delay as short as
    possible.
    
    NOTE: Setting this to zero is actually undefined and will
    cause the kernel to choose a random delay from 0 to
    U32_MAX, so it is not advised. This is gleaned from the
    following comment in __get_random_u32_below(), which is
    eventually called from pneigh_enqueue():
    
    /*
     * This function is technically undefined for ceil == 0, and in fact
     * for the non-underscored constant version in the header, we build bug
     * on that. But for the non-constant case, it's convenient to have that
     * evaluate to being a straight call to get_random_u32(), so that
     * get_random_u32_inclusive() can work over its whole range without
     * undefined behavior.
     */
    
    A kernel change will be proposed to fix this, but it
    cannot be assumed to reach distro kernels for a while.
    
    Change-Id: I0dc65b17ef436a97d0fcbd164d124ec59a1b2797
    Closes-Bug: #1920975
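
As a rough illustration of the setting described in the commit message (this is not the Neutron patch itself), the equivalent manual step could be sketched as below. The helper name set_proxy_delay and the DRY_RUN variable are hypothetical; the namespace and device names are the ones quoted in the bug description. The sketch refuses a value of zero because of the undefined behaviour noted above.

```shell
#!/bin/sh
# Sketch: set proxy_delay for one device inside a network namespace.
# set_proxy_delay and DRY_RUN are hypothetical names used for illustration.

set_proxy_delay() {
    ns=$1
    dev=$2
    delay=${3:-1}   # default to 1, the value chosen by the fix

    # Only accept plain non-negative integers.
    case $delay in
        ''|*[!0-9]*)
            echo "proxy_delay must be a positive integer" >&2
            return 1
            ;;
    esac

    # Refuse 0: per the commit message it yields a random delay up to U32_MAX.
    if [ "$delay" -lt 1 ]; then
        echo "refusing proxy_delay=$delay" >&2
        return 1
    fi

    cmd="ip netns exec $ns sysctl -w net.ipv4.neigh.$dev.proxy_delay=$delay"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"   # print only (the default), so the sketch is safe to run
    else
        $cmd          # needs root and an existing namespace
    fi
}

# Dry run: prints the command that would be executed.
set_proxy_delay fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726 fg-51f3e07b-2d
```

Setting DRY_RUN=0 makes the helper execute the command instead of printing it.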


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1920975

Title:
  neutron dvr should lower proxy_delay when using proxy_arp

Status in OpenStack Neutron Open vSwitch Charm:
  New
Status in neutron:
  Fix Released

Bug description:
  Neutron DVR uses proxy_arp in fip namespaces to respond to ARP
  requests for instance floating IPs. In doing so it is susceptible to a
  random delay of up to 800ms (by default), which is added to the time
  taken to respond to an ARP request that has to be proxied, i.e.

  # ip netns exec fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726 sysctl net.ipv4.{conf.fg-51f3e07b-2d.proxy_arp,neigh.fg-51f3e07b-2d.proxy_delay}
  net.ipv4.conf.fg-51f3e07b-2d.proxy_arp = 1
  net.ipv4.neigh.fg-51f3e07b-2d.proxy_delay = 80
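
To relate the value 80 shown above to the 800ms figure, a small conversion can help. This is a sketch that assumes the sysctl value is expressed in USER_HZ ticks and that USER_HZ is 100, which is typical but not guaranteed; proxy_delay_ms is a hypothetical helper name.

```shell
# Sketch: convert a proxy_delay sysctl value to milliseconds.
# Assumption: the value is in USER_HZ ticks and USER_HZ defaults to 100.
proxy_delay_ms() {
    ticks=$1
    user_hz=${2:-100}
    echo $(( ticks * 1000 / user_hz ))
}

proxy_delay_ms 80   # default value shown above -> 800
proxy_delay_ms 1    # value proposed by the fix -> 10
```

Under the same assumption, a setting of one bounds the added delay at about 10ms.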

  The result can be seen when, for example, you ping a VM fip and the
  first request takes significantly longer than subsequent requests:

  $ ping -c 5 10.5.150.90
  PING 10.5.150.90 (10.5.150.90) 56(84) bytes of data.
  64 bytes from 10.5.150.90: icmp_seq=1 ttl=60 time=491 ms
  64 bytes from 10.5.150.90: icmp_seq=2 ttl=60 time=1.08 ms
  64 bytes from 10.5.150.90: icmp_seq=3 ttl=60 time=1.39 ms
  64 bytes from 10.5.150.90: icmp_seq=4 ttl=60 time=1.16 ms
  64 bytes from 10.5.150.90: icmp_seq=5 ttl=60 time=1.03 ms

  --- 10.5.150.90 ping statistics ---
  5 packets transmitted, 5 received, 0% packet loss, time 4007ms
  rtt min/avg/max/mdev = 1.034/99.157/491.134/195.988 ms

  To reproduce this again, simply delete the ARP entry for the fip from
  the fip namespace of the source compute host.
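
One way the reproduction step might be scripted is sketched below. The helper repro_cmds is hypothetical and only prints the commands; it assumes that "ip neigh flush to" is an acceptable way to drop the cached entry, and that the ping is repeated from the same source as before.

```shell
# Sketch: print the commands for one reproduction round.
# repro_cmds is a hypothetical helper; it prints commands and runs nothing.
repro_cmds() {
    ns=$1
    fip=$2
    # Drop the cached neighbour entry so the next packet needs a fresh ARP reply.
    echo "ip netns exec $ns ip neigh flush to $fip"
    # Repeat the ping from the original source and watch the first reply time.
    echo "ping -c 5 $fip"
}

repro_cmds fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726 10.5.150.90
```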

  By kernel standards this behaviour is by design when using the default
  settings, but some workloads may be impacted by this initial delay,
  especially in loaded environments where the ARP caches are under
  strain and hitting gc_thresh limits.

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1920975/+subscriptions


