
yahoo-eng-team team mailing list archive

[Bug 1836023] Re: OVS agent "hangs" while processing trusted ports

 

Reviewed:  https://review.opendev.org/670014
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=da539da3780188f01e18ef106dde9ca180324c2a
Submitter: Zuul
Branch:    master

commit da539da3780188f01e18ef106dde9ca180324c2a
Author: Oleg Bondarev <obondarev@xxxxxxxxxxxx>
Date:   Wed Jul 10 12:39:13 2019 +0400

    Yield control to other greenthreads while processing trusted ports
    
    process_trusted_ports() appeared to be greenthread unfriendly, so
    if there are many trusted ports on a node, openvswitch agent may
    "hang" for a significant time.
    This patch adds explicit yield.
    
    Change-Id: I7c00812f877e2fc966bbac3060e1187ce1b809ca
    Closes-Bug: #1836023


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836023

Title:
  OVS agent "hangs" while processing trusted ports

Status in neutron:
  Fix Released

Bug description:
  Queens, ovsdb native interface.

  On a loaded gateway node hosting > 1000 ports, when
  neutron-openvswitch-agent is restarted, the agent at some point stops
  sending state reports and stops logging for a significant time,
  depending on the number of ports. In our case the gateway node hosts
  > 1400 ports and the agent hangs for ~100 seconds. Thus, if the
  configured agent_down_time is less than 100 seconds, the neutron
  server sees the agent as down and starts rescheduling resources.
  After the agent stops hanging, it sees itself as "revived" and starts
  a new full sync. This loop is almost endless.

  Debugging showed that the culprit is process_trusted_ports:
  https://github.com/openstack/neutron/blob/13.0.4/neutron/agent/linux/openvswitch_firewall/firewall.py#L655
  - this function does not yield control to other greenthreads and
  blocks until all trusted ports are processed. Since on gateway nodes
  almost all ports are "trusted" (router and dhcp ports),
  process_trusted_ports may take a significant time.

  The proposal is to add greenlet.sleep(0) inside the loop in
  process_trusted_ports; that fixed the issue in our environment.
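
  The effect of a zero-length sleep inside a long loop can be shown
  with a small, self-contained sketch. It is not neutron code: it swaps
  eventlet greenthreads for the standard library's asyncio so that it
  runs without extra packages, and the names process_trusted_ports,
  report_state, run and demo here are hypothetical stand-ins. In
  eventlet the corresponding cooperative yield is eventlet.sleep(0).

```python
import asyncio


async def process_trusted_ports(port_ids, events, cooperative):
    """Hypothetical stand-in for the per-port loop (not neutron's code)."""
    for port_id in port_ids:
        # Placeholder for the real per-port work (e.g. installing flows).
        events.append(("port", port_id))
        if cooperative:
            # Zero-length sleep: hand control back to the scheduler so
            # other tasks get a turn before the next port is handled.
            await asyncio.sleep(0)


async def report_state(events, beats):
    """Hypothetical stand-in for the agent's periodic state report."""
    for beat in range(beats):
        events.append(("report", beat))
        await asyncio.sleep(0)


async def run(cooperative):
    events = []
    # Both tasks share one event loop, like greenthreads share one OS thread.
    await asyncio.gather(
        process_trusted_ports(range(3), events, cooperative),
        report_state(events, 3),
    )
    return events


def demo(cooperative):
    """Return the order in which port work and state reports happened."""
    return asyncio.run(run(cooperative))


if __name__ == "__main__":
    print("without yield:", demo(False))
    print("with yield:   ", demo(True))
```

  Without the yield, no state report is recorded until every port has
  been processed; with it, reports are interleaved with the port work,
  which is what keeps the agent from looking dead to the server.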

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836023/+subscriptions

