yahoo-eng-team team mailing list archive
Message #91005
[Bug 1920975] Re: neutron dvr should lower proxy_delay when using proxy_arp
Reviewed: https://review.opendev.org/c/openstack/neutron/+/782570
Committed: https://opendev.org/openstack/neutron/commit/d7f68a0ce76ffb9a93dfba167dfffba53189350d
Submitter: "Zuul (22348)"
Branch: master
commit d7f68a0ce76ffb9a93dfba167dfffba53189350d
Author: Edward Hope-Morley <edward.hope-morley@xxxxxxxxxxxxx>
Date: Tue Mar 23 17:18:48 2021 +0000
Set proxy_delay to one when using proxy ARP
Neutron DVR uses proxy ARP in fip namespaces to respond
to ARP requests for instance floating IPs. In doing so
it is susceptible to a random delay of up to (by
default) 800ms which is added to the time taken to
respond to ARP requests. This causes an initial delay
to ARP responses that is entirely avoidable by changing this
parameter to one, instead of the default, to make it as
short as possible.
NOTE: Setting this to zero is actually undefined and will
cause the kernel to choose a random delay from 0 to
U32_MAX so is not advised. Gleaned from this comment in
__get_random_u32_below(), which is eventually called
from pneigh_enqueue():
/*
* This function is technically undefined for ceil == 0, and in fact
* for the non-underscored constant version in the header, we build bug
* on that. But for the non-constant case, it's convenient to have that
* evaluate to being a straight call to get_random_u32(), so that
* get_random_u32_inclusive() can work over its whole range without
* undefined behavior.
*/
Will propose a kernel change to fix this but cannot
assume it will be in a distro kernel for a while.
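The ceil == 0 case described in the NOTE can be illustrated with a small Python model. This is a simplified sketch based only on the comment quoted above, not the kernel's actual algorithm; the name random_u32_below and the use of Python's random module are assumptions made for illustration.

```python
import random

U32_MAX = 2**32 - 1


def random_u32_below(ceil, rng=random):
    """Model the semantics described in the quoted kernel comment.

    For ceil > 0 the result lies in [0, ceil). For ceil == 0 the call
    degenerates to a plain 32-bit random number in [0, U32_MAX].
    """
    if ceil == 0:
        return rng.getrandbits(32)
    return rng.randrange(ceil)


# With a bound of one, zero is the only value that can be drawn.
print({random_u32_below(1) for _ in range(1000)})

# With a bound of zero, the draw is only limited by the 32-bit range.
print(max(random_u32_below(0) for _ in range(1000)) <= U32_MAX)
```

Under this model a bound of one is the smallest value that still yields a defined, minimal result, which matches the choice made in the commit.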
Change-Id: I0dc65b17ef436a97d0fcbd164d124ec59a1b2797
Closes-Bug: #1920975
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1920975
Title:
neutron dvr should lower proxy_delay when using proxy_arp
Status in OpenStack Neutron Open vSwitch Charm:
New
Status in neutron:
Fix Released
Bug description:
Neutron DVR uses proxy_arp in fip namespaces to respond to ARP
requests for instance floating IPs. In doing so it is susceptible to a
random delay of up to 800ms (by default), which is added to the time
taken to respond to an ARP request that has to be proxied, i.e.:
# ip netns exec fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726 sysctl net.ipv4.{conf.fg-51f3e07b-2d.proxy_arp,neigh.fg-51f3e07b-2d.proxy_delay}
net.ipv4.conf.fg-51f3e07b-2d.proxy_arp = 1
net.ipv4.neigh.fg-51f3e07b-2d.proxy_delay = 80
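For reference, the value 80 shown above can be related to the 800ms figure with a short conversion. This is a sketch that assumes the sysctl value is expressed in ticks of 1/100 second (USER_HZ = 100, the usual value on Linux); the function name is hypothetical.

```python
USER_HZ = 100  # assumption: the sysctl value counts ticks of 1/100 second


def proxy_delay_ms(ticks, user_hz=USER_HZ):
    """Convert a proxy_delay sysctl value to an upper bound in milliseconds."""
    return ticks * 1000 / user_hz


for ticks in (80, 1):
    print(f"proxy_delay={ticks} -> up to {proxy_delay_ms(ticks):.0f} ms")
```

Under that assumption the default of 80 gives the 800ms upper bound, while a value of 1 bounds the added delay at roughly 10ms.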
The result of this is seen when e.g. you ping a vm fip and the first
request takes significantly longer than subsequent requests:
$ ping -c 5 10.5.150.90
PING 10.5.150.90 (10.5.150.90) 56(84) bytes of data.
64 bytes from 10.5.150.90: icmp_seq=1 ttl=60 time=491 ms
64 bytes from 10.5.150.90: icmp_seq=2 ttl=60 time=1.08 ms
64 bytes from 10.5.150.90: icmp_seq=3 ttl=60 time=1.39 ms
64 bytes from 10.5.150.90: icmp_seq=4 ttl=60 time=1.16 ms
64 bytes from 10.5.150.90: icmp_seq=5 ttl=60 time=1.03 ms
--- 10.5.150.90 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 1.034/99.157/491.134/195.988 ms
To reproduce again, simply delete the ARP entry for the fip from the
fip namespace of the source compute host.
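The reproduction step can be sketched as a command line built in Python. The namespace, address and device below are placeholders borrowed from the examples in this report, and the choice of device is an assumption; substitute the values from the source compute host. The helper name is hypothetical, and the command is only printed, not executed.

```python
import shlex


def neigh_del_cmd(namespace, fip, device):
    """Build an 'ip neigh del' command to run inside a network namespace."""
    return ["ip", "netns", "exec", namespace,
            "ip", "neigh", "del", fip, "dev", device]


cmd = neigh_del_cmd(
    "fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726",  # placeholder namespace
    "10.5.150.90",                               # floating IP being pinged
    "fg-51f3e07b-2d",                            # assumed device holding the entry
)
print(shlex.join(cmd))
```

Running the printed command as root, then repeating the ping, should make the first request slow again if the delay is still in effect.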
By kernel standards this behaviour is by design with the default
settings, but some workloads may be impacted by this initial delay,
especially in loaded environments where the ARP caches are under
strain and hitting gc_thresh limits.
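As a workaround sketch, the delay can be lowered by hand with sysctl inside the fip namespace. The helper below only builds the command; its name is hypothetical and this is not how Neutron itself applies the setting. Values below one are rejected because the related commit advises against setting proxy_delay to zero.

```python
import shlex


def proxy_delay_cmd(namespace, device, value=1):
    """Build a sysctl command that sets proxy_delay for one device."""
    if value < 1:
        # The commit message for this bug advises against a value of zero.
        raise ValueError("proxy_delay must be at least 1")
    key = f"net.ipv4.neigh.{device}.proxy_delay"
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", f"{key}={value}"]


# Placeholder names taken from the sysctl output earlier in this report.
cmd = proxy_delay_cmd("fip-a297543b-9ef9-4bd5-b1ca-e85a726c1726",
                      "fg-51f3e07b-2d")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) as root. A setting applied this way is not persistent and may be lost if the namespace or device is recreated.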
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1920975/+subscriptions