yahoo-eng-team team mailing list archive

[Bug 1750777] Re: openvswitch agent eating CPU, time spent in ip_conntrack.py

 

This bug was fixed in the package neutron - 2:12.0.1-0ubuntu1.1~cloud0
---------------

 neutron (2:12.0.1-0ubuntu1.1~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:12.0.1-0ubuntu1.1) bionic; urgency=medium
 .
   * d/p/remove-race-and-simplify-conntrack-state-management.patch:
     Cherry-picked from upstream stable/queens branch to prevent
     ovs-agent from eating up CPU (LP: #1750777).
   * d/gbp.conf: Create stable/queens branch.


** Changed in: cloud-archive/queens
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1750777

Title:
  openvswitch agent eating CPU, time spent in ip_conntrack.py

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released
Status in neutron source package in Cosmic:
  Fix Released

Bug description:
  We just ran into a case where the openvswitch agent (local dev
  devstack, current master branch) eats 100% of CPU time.

  Pyflame profiling shows the time being largely spent in
  neutron.agent.linux.ip_conntrack, line 95.

  https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95

  The code around this line is:

          while True:
              pool.spawn_n(self._process_queue)

  The documentation of eventlet.spawn_n says: "The same as spawn(), but
  it’s not possible to know how the function terminated (i.e. no return
  value or exceptions). This makes execution faster. See spawn_n for
  more details."  I suspect that GreenPool.spawn_n may behave similarly.

  It seems plausible that spawn_n is returning very quickly because of
  some error, so that the agent spends all of its time spinning in a
  short-circuited while loop.
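
  The suspected failure mode can be illustrated with a minimal,
  stdlib-only sketch. It is not the neutron code and not the upstream
  fix; the function names and timings below are hypothetical. It only
  contrasts a loop whose worker returns immediately on an empty queue
  (the loop spins and burns CPU) with a loop that blocks while waiting
  for work.

```python
import queue
import time


def spin_dispatch(q, duration):
    """Hypothetical stand-in for a `while True: spawn(worker)` loop
    whose worker returns at once when there is nothing to process."""
    def process_queue():
        try:
            q.get_nowait()
        except queue.Empty:
            return  # no work: return immediately, the caller loops again

    iterations = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        process_queue()
        iterations += 1
    return iterations


def blocking_dispatch(q, duration, poll=0.05):
    """Same loop, but each pass blocks for up to `poll` seconds
    waiting for an item, so an idle queue costs almost no CPU."""
    iterations = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            q.get(timeout=poll)
        except queue.Empty:
            pass  # still idle; the wait itself throttled the loop
        iterations += 1
    return iterations


if __name__ == "__main__":
    print("spinning loop passes:", spin_dispatch(queue.Queue(), 0.2))
    print("blocking loop passes:", blocking_dispatch(queue.Queue(), 0.2))
```

  On an idle queue the first loop typically runs orders of magnitude
  more passes than the second in the same wall-clock time, which is
  consistent with the 100% CPU symptom described above.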

  SRU details for Ubuntu:
  -----------------------
  [Impact]
  We're cherry-picking a single bug-fix patch here from the upstream
  stable/queens branch as there is not currently an upstream stable
  point release available that includes this fix. We'd like to make
  sure all of our supported customers have access to this fix as there
  is a significant performance hit without it.

  [Test Case]
  The following SRU process was followed:
  https://wiki.ubuntu.com/OpenStackUpdates

  In order to avoid regression of existing consumers, the OpenStack team
  will run their continuous integration tests against the packages that
  are in -proposed. A successful run of all available tests will be
  required before the proposed packages can be let into -updates.

  The OpenStack team will be in charge of attaching the output summary
  of the executed tests. The OpenStack team members will not mark
  ‘verification-done’ until this has happened.

  [Regression Potential]
  In order to mitigate the regression potential, the results of the
  aforementioned tests are attached to this bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1750777/+subscriptions

