[Bug 1762341] Re: epoll_wait busy loop in neutron-openvswitch-agent

*** This bug is a duplicate of bug 1750777 ***
    https://bugs.launchpad.net/bugs/1750777

Thanks for the response.

Yeah, I probably should have tied those two changes closer together;
unfortunately they merged almost a week apart, and the repo is
probably pulled nightly.

I'm not sure of the quickest way to update it to point at a newer
version either.

** This bug has been marked a duplicate of bug 1750777
   openvswitch agent eating CPU, time spent in ip_conntrack.py

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1762341

Title:
  epoll_wait busy loop in neutron-openvswitch-agent

Status in neutron:
  Incomplete

Bug description:
  I'm installing a demo OpenStack environment with TripleO Quickstart
  on the Queens release, and after deploying the undercloud node,
  neutron-openvswitch-agent constantly consumes 100% CPU, apparently
  because it keeps calling epoll_wait with a timeout of 0.

  Neutron's settings are the defaults configured by TripleO Quickstart
  with:

  ./quickstart.sh -R queens -E config/environments/mysetup.yml \
    --tags all -N config/nodes/mysetup.yaml -t all \
    -p quickstart.yml os-demo-1

  and

  ./quickstart.sh -T none -I -R queens -E config/environments/mysetup.yml \
    --tags all -N config/nodes/mysetup.yaml -t all \
    -p quickstart-extras-undercloud.yml os-demo-1

  The host node is a freshly installed HP Gen8 blade server running an
  updated CentOS 7 with the default repositories plus whatever the
  Quickstart Ansible scripts set up. The undercloud node runs whatever
  CentOS 7 image is used by TripleO Quickstart commit
  505a0c5df551c4518b769f77ddc3da09c4e6e2a1.

  I have not configured any Neutron settings myself. This is 100%
  reproducible on my host if I delete all the virtual machines and run
  TripleO Quickstart again.

  I do not know what exactly triggers this behaviour, but the wait(0)
  call is in the run method in
  /usr/lib/python2.7/site-packages/eventlet/hubs/hub.py.

  I added a line of code there to raise an exception whenever wait(0)
  happens, and this is the stack trace I get:

  2018-04-09 08:14:56.840 23151 CRITICAL neutron [-] Unhandled error: Exception: Eventlet waited for 0
  2018-04-09 08:14:56.840 23151 ERROR neutron Traceback (most recent call last):
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
  2018-04-09 08:14:56.840 23151 ERROR neutron     sys.exit(main())
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
  2018-04-09 08:14:56.840 23151 ERROR neutron     agent_main.main()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main
  2018-04-09 08:14:56.840 23151 ERROR neutron     mod.main()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
  2018-04-09 08:14:56.840 23151 ERROR neutron     'neutron.plugins.ml2.drivers.openvswitch.agent.'
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 372, in run_apps
  2018-04-09 08:14:56.840 23151 ERROR neutron     app_mgr.close()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 549, in close
  2018-04-09 08:14:56.840 23151 ERROR neutron     self.uninstantiate(app_name)
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 533, in uninstantiate
  2018-04-09 08:14:56.840 23151 ERROR neutron     app.stop()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 185, in stop
  2018-04-09 08:14:56.840 23151 ERROR neutron     hub.joinall(self.threads)
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 103, in joinall
  2018-04-09 08:14:56.840 23151 ERROR neutron     t.wait()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
  2018-04-09 08:14:56.840 23151 ERROR neutron     return self._exit_event.wait()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
  2018-04-09 08:14:56.840 23151 ERROR neutron     return hubs.get_hub().switch()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
  2018-04-09 08:14:56.840 23151 ERROR neutron     return self.greenlet.switch()
  2018-04-09 08:14:56.840 23151 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 348, in run
  2018-04-09 08:14:56.840 23151 ERROR neutron     raise Exception("Eventlet waited for 0")
  2018-04-09 08:14:56.840 23151 ERROR neutron Exception: Eventlet waited for 0
  2018-04-09 08:14:56.840 23151 ERROR neutron 
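
  For anyone who wants to reproduce the check without editing the
  installed eventlet source, a roughly equivalent trick is to
  monkey-patch the hub's wait() from a small script before starting
  the agent's main loop. This is a hypothetical sketch of the same
  idea, not the exact one-line edit I made:

  # Hypothetical sketch: raise when the eventlet hub is asked to poll
  # with a zero timeout (the busy-loop symptom described above).
  from eventlet import hubs

  hub = hubs.get_hub()
  _orig_wait = hub.wait

  def wait_guard(seconds):
      # A zero timeout makes the hub call epoll_wait(..., 0) in a
      # tight loop, pegging one CPU core.
      if seconds == 0:
          raise Exception("Eventlet waited for 0")
      return _orig_wait(seconds)

  hub.wait = wait_guard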

  Just changing this wait to a non-zero timeout drops CPU usage to
  almost nothing, though I can't tell whether it impacts functionality
  in any way. It doesn't seem to, though.
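
  As a rough illustration of that workaround (again a hypothetical
  monkey-patch sketch, not the actual change I tested), clamping the
  zero timeout to a small positive value stops the spin:

  # Hypothetical sketch: force a minimum poll timeout so the hub
  # sleeps briefly instead of spinning on epoll_wait with timeout 0.
  from eventlet import hubs

  hub = hubs.get_hub()
  _orig_wait = hub.wait

  def wait_clamped(seconds):
      # 10 ms is an arbitrary floor; any small positive value avoids
      # the busy loop at the cost of slightly delayed wakeups.
      if seconds == 0:
          seconds = 0.01
      return _orig_wait(seconds)

  hub.wait = wait_clamped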

  Neutron package versions are as follows:

  [stack@undercloud hubs]$ rpm -qa | grep neutron
  python2-ironic-neutron-agent-1.0.0-0.20180220161644.deb466b.el7.centos.noarch
  openstack-neutron-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  openstack-neutron-linuxbridge-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  openstack-neutron-ml2-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  openstack-neutron-lbaas-12.0.1-0.20180328075810.268cc42.el7.centos.noarch
  python2-neutron-lib-1.13.0-0.20180211233639.dcf96cd.el7.centos.noarch
  puppet-neutron-12.4.0-0.20180329040645.502d290.el7.centos.noarch
  openstack-neutron-common-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  python-neutron-lbaas-12.0.1-0.20180328075810.268cc42.el7.centos.noarch
  openstack-neutron-l2gw-agent-12.0.2-0.20180302213951.b064078.el7.centos.noarch
  openstack-neutron-metering-agent-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  python-neutron-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  openstack-neutron-sriov-nic-agent-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
  python2-neutronclient-6.7.0-0.20180211221651.95d64ce.el7.centos.noarch
  openstack-neutron-openvswitch-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch

  Also relevant:
  python2-eventlet-0.20.1-2.el7.noarch
  python2-ryu-4.15-1.el7.noarch
  openvswitch-2.9.0-3.el7.x86_64
  python2-openvswitch-2.9.0-3.el7.noarch

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1762341/+subscriptions

