yahoo-eng-team team mailing list archive
Message #79215
[Bug 1836023] [NEW] OVS agent "hangs" while processing trusted ports
Public bug reported:
Queens, ovsdb native interface.
On a loaded gateway node hosting more than 1000 ports, restarting
neutron-openvswitch-agent causes the agent, at some point, to stop
sending state reports and stop logging for a significant time that
depends on the number of ports. In our case the gateway node hosts more
than 1400 ports and the agent hangs for ~100 seconds. If the configured
agent_down_time is less than 100 seconds, the neutron server therefore
sees the agent as down and starts rescheduling resources. Once the agent
stops hanging, it sees itself as "revived" and starts a new full sync.
This loop is almost endless.
Debugging showed that the culprit is process_trusted_ports:
https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655
This function does not yield control to other greenthreads and blocks
until all trusted ports are processed. Since almost all ports on gateway
nodes are "trusted" (router and DHCP ports), process_trusted_ports may
take a significant amount of time.
The proposal is to add a greenthread sleep(0) call (eventlet.sleep(0))
inside the loop in process_trusted_ports, so that it yields to other
greenthreads on each iteration. That fixed the issue in our environment.
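The effect of yielding inside the loop can be illustrated with a minimal
sketch. It is not the neutron code: asyncio stands in for eventlet so the
example runs with only the standard library, and await asyncio.sleep(0)
plays the role that a sleep(0) call would play in a greenthread. The
names report_state, run and the state dictionary are hypothetical, and
the per-port work is a placeholder.

```python
import asyncio


async def process_trusted_ports(ports, cooperative, state):
    """Hypothetical stand-in for the firewall loop (no real flows are set up)."""
    for port in ports:
        state["processed"].append(port)  # placeholder for per-port work
        if cooperative:
            # Zero-length sleep: give other tasks a chance to run.
            await asyncio.sleep(0)
    state["done"] = True


async def report_state(state):
    """Hypothetical heartbeat: counts beats sent while ports are in progress."""
    while not state["done"]:
        state["beats"] += 1
        await asyncio.sleep(0)


async def run(num_ports, cooperative):
    """Run the port loop next to the heartbeat and return the beat count."""
    state = {"processed": [], "beats": 0, "done": False}
    worker = asyncio.create_task(
        process_trusted_ports(range(num_ports), cooperative, state))
    heartbeat = asyncio.create_task(report_state(state))
    await asyncio.gather(worker, heartbeat)
    return state["beats"]


if __name__ == "__main__":
    print("beats without yield:", asyncio.run(run(1400, False)))
    print("beats with yield:   ", asyncio.run(run(1400, True)))
```

Without the yield the heartbeat task gets no chance to run until every
port has been processed, which mirrors the missing state reports
described above. With the yield the heartbeat is interleaved with the
port loop.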
** Affects: neutron
Importance: High
Assignee: Oleg Bondarev (obondarev)
Status: In Progress
** Tags: ovs-fw
https://bugs.launchpad.net/bugs/1836023