yahoo-eng-team team mailing list archive
Message #79245
[Bug 1836023] Re: OVS agent "hangs" while processing trusted ports
Reviewed: https://review.opendev.org/670014
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=da539da3780188f01e18ef106dde9ca180324c2a
Submitter: Zuul
Branch: master
commit da539da3780188f01e18ef106dde9ca180324c2a
Author: Oleg Bondarev <obondarev@xxxxxxxxxxxx>
Date: Wed Jul 10 12:39:13 2019 +0400
Yield control to other greenthreads while processing trusted ports
process_trusted_ports() appeared to be greenthread unfriendly, so
if there are many trusted ports on a node, openvswitch agent may
"hang" for a significant time.
This patch adds explicit yield.
Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
Closes-Bug: #1836023
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836023
Title:
OVS agent "hangs" while processing trusted ports
Status in neutron:
Fix Released
Bug description:
Queens, ovsdb native interface.
On a loaded gateway node hosting more than 1000 ports, restarting
neutron-openvswitch-agent causes the agent, at some point, to stop
sending state reports and stop logging for a significant time that
depends on the number of ports.
In our case the gateway node hosts more than 1400 ports and the agent
hangs for ~100 seconds. Thus, if the configured agent_down_time is less
than 100 seconds, the neutron server sees the agent as down and starts
rescheduling resources.
After the agent stops hanging, it sees itself as "revived" and starts a
new full sync. This loop is almost endless.
Debugging showed the culprit is process_trusted_ports:
https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655
This function does not yield control to other greenthreads and blocks
until all trusted ports are processed. Since on gateway nodes almost
all ports are "trusted" (router and DHCP ports), process_trusted_ports
may take a significant time.
The proposal is to add eventlet.sleep(0) inside the loop in
process_trusted_ports; that fixed the issue in our environment.
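The starvation described above, and the effect of a zero-length sleep
inside the loop, can be illustrated with a minimal sketch. It is not the
neutron code: it uses the standard library's asyncio as a stand-in for
the agent's greenthreads, so "await asyncio.sleep(0)" plays the role of
the zero-length sleep proposed above. The function bodies, the
report_state task and the event log are hypothetical and exist only for
illustration.

```python
import asyncio


async def process_trusted_ports(port_ids, events, cooperative):
    """Hypothetical stand-in for the per-port loop in the firewall driver."""
    for port_id in port_ids:
        events.append(("port", port_id))  # placeholder for per-port work
        if cooperative:
            # Zero-length sleep: hand control to other scheduled tasks.
            await asyncio.sleep(0)


async def report_state(events, stop):
    """Hypothetical stand-in for the agent's periodic state report."""
    while not stop.is_set():
        events.append(("report", None))
        await asyncio.sleep(0)


async def run(cooperative, n_ports=5):
    events = []
    stop = asyncio.Event()
    reporter = asyncio.create_task(report_state(events, stop))
    await process_trusted_ports(range(n_ports), events, cooperative)
    stop.set()
    await reporter
    return events


def reports_during_processing(events):
    """Count state reports logged before the last port was processed."""
    last_port = max(i for i, (kind, _) in enumerate(events) if kind == "port")
    return sum(1 for kind, _ in events[:last_port] if kind == "report")


if __name__ == "__main__":
    for mode in (False, True):
        log = asyncio.run(run(cooperative=mode))
        print("cooperative" if mode else "blocking",
              reports_during_processing(log))
```

Without the yield, the reporting task never gets a turn while ports are
being processed; with it, reports interleave with the per-port work. The
same reasoning applies to greenthreads, which also switch only when the
running thread gives up control.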
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836023/+subscriptions