yahoo-eng-team team mailing list archive
Message #72228
[Bug 1762341] Re: epoll_wait busy loop in neutron-openvswitch-agent
*** This bug is a duplicate of bug 1750777 ***
https://bugs.launchpad.net/bugs/1750777
Thanks for the response.
Yeah, I probably should have tied those two changes closer together.
Unfortunately they merged almost a week apart, and the repo is probably
pulled nightly.
I'm not sure of the quickest way to update it to point at a newer
version either.
** This bug has been marked a duplicate of bug 1750777
openvswitch agent eating CPU, time spent in ip_conntrack.py
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1762341
Title:
epoll_wait busy loop in neutron-openvswitch-agent
Status in neutron:
Incomplete
Bug description:
I'm installing a demo OpenStack environment with TripleO Quickstart
using the Queens release. After the undercloud node is deployed,
neutron-openvswitch-agent constantly consumes 100% CPU, apparently
because it keeps calling epoll_wait with a timeout of 0.
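For illustration, the difference between polling with a zero timeout and polling with a blocking timeout can be reproduced outside of Neutron. The sketch below is not taken from the agent or from eventlet. It uses the standard-library selectors module (epoll-backed on Linux) and a hypothetical helper, count_polls, to count how often the poll call returns within a fixed interval when no file descriptor is ready.

```python
import selectors
import socket
import time


def count_polls(timeout, duration=0.2):
    """Count how many times select() returns within `duration` seconds.

    Hypothetical helper for illustration only; not part of neutron or eventlet.
    """
    sel = selectors.DefaultSelector()  # epoll-backed on Linux
    a, b = socket.socketpair()
    # Register one end for reading; nothing is ever written to the other
    # end, so the descriptor never becomes ready.
    sel.register(a, selectors.EVENT_READ)
    calls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            sel.select(timeout)  # timeout=0 returns immediately
            calls += 1
    finally:
        sel.unregister(a)
        sel.close()
        a.close()
        b.close()
    return calls


if __name__ == "__main__":
    print("timeout=0    :", count_polls(0))
    print("timeout=0.05 :", count_polls(0.05))
```

With a zero timeout the loop never blocks, so the call count is limited only by how fast the CPU can spin, which matches the symptom of one core being pinned at 100%.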
Whatever settings Neutron has are the defaults configured by TripleO
Quickstart with
  ./quickstart.sh -R queens -E config/environments/mysetup.yml --tags all -N config/nodes/mysetup.yaml -t all -p quickstart.yml os-demo-1
and
  ./quickstart.sh -T none -I -R queens -E config/environments/mysetup.yml --tags all -N config/nodes/mysetup.yaml -t all -p quickstart-extras-undercloud.yml os-demo-1
The host node is a freshly installed HP Gen8 blade server running an
updated CentOS 7 with the default repositories and whatever the
Quickstart Ansible scripts set up. The undercloud node is whatever
CentOS 7 image is used by TripleO Quickstart commit
505a0c5df551c4518b769f77ddc3da09c4e6e2a1.
I have not configured any Neutron settings myself. This is 100%
reproducible on my host if I delete all the virtual machines and run
TripleO Quickstart again.
I do not know what exactly triggers this behaviour, but the wait(0)
call is in the run method in
/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py
I added a line of code to throw an exception when wait(0) happens, and
this is the stack trace I get:
2018-04-09 08:14:56.840 23151 CRITICAL neutron [-] Unhandled error: Exception: Eventlet waited for 0
2018-04-09 08:14:56.840 23151 ERROR neutron Traceback (most recent call last):
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
2018-04-09 08:14:56.840 23151 ERROR neutron sys.exit(main())
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-04-09 08:14:56.840 23151 ERROR neutron agent_main.main()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main
2018-04-09 08:14:56.840 23151 ERROR neutron mod.main()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-04-09 08:14:56.840 23151 ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 372, in run_apps
2018-04-09 08:14:56.840 23151 ERROR neutron app_mgr.close()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 549, in close
2018-04-09 08:14:56.840 23151 ERROR neutron self.uninstantiate(app_name)
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 533, in uninstantiate
2018-04-09 08:14:56.840 23151 ERROR neutron app.stop()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 185, in stop
2018-04-09 08:14:56.840 23151 ERROR neutron hub.joinall(self.threads)
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 103, in joinall
2018-04-09 08:14:56.840 23151 ERROR neutron t.wait()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-04-09 08:14:56.840 23151 ERROR neutron return self._exit_event.wait()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2018-04-09 08:14:56.840 23151 ERROR neutron return hubs.get_hub().switch()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2018-04-09 08:14:56.840 23151 ERROR neutron return self.greenlet.switch()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 348, in run
2018-04-09 08:14:56.840 23151 ERROR neutron raise Exception("Eventlet waited for 0")
2018-04-09 08:14:56.840 23151 ERROR neutron Exception: Eventlet waited for 0
2018-04-09 08:14:56.840 23151 ERROR neutron
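The kind of instrumentation that produces a trace like the one above can be approximated with a toy event loop. This is only a sketch: ToyHub and run_once are hypothetical names and do not reproduce eventlet's actual hub code. The only detail taken from the report is the exception message.

```python
import traceback


class ToyHub:
    """Hypothetical stand-in for an event-loop hub; not eventlet's class."""

    def wait(self, seconds):
        # Debug guard: fail loudly when the loop is asked to poll
        # without blocking, so the caller's stack becomes visible.
        if seconds == 0:
            raise Exception("Eventlet waited for 0")
        return seconds


def run_once(hub, sleep_time):
    """Mimic one loop iteration: a non-positive sleep time becomes wait(0)."""
    if sleep_time > 0:
        return hub.wait(sleep_time)
    return hub.wait(0)


if __name__ == "__main__":
    try:
        run_once(ToyHub(), -0.01)  # an overdue timer leads to wait(0)
    except Exception:
        traceback.print_exc()
```

Raising from inside the wait call is a blunt tool, since it kills the process, but it shows which code path was blocked in the loop at the moment the zero-timeout poll happened.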
Changing this wait to a non-zero value drops CPU usage to almost
nothing. I can't tell whether this affects functionality in any way,
but it doesn't seem to.
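A workaround of this kind can be sketched as a floor on the computed wait. The report does not say which non-zero value was used, so MIN_WAIT below is an assumed value, and next_wait is a hypothetical helper rather than anything in eventlet.

```python
import time

MIN_WAIT = 0.001  # hypothetical floor in seconds, chosen only for illustration


def next_wait(wakeup_when, now, min_wait=MIN_WAIT):
    """Return how long to block before the next timer, never less than min_wait."""
    return max(wakeup_when - now, min_wait)


if __name__ == "__main__":
    now = time.monotonic()
    print(next_wait(now + 0.25, now))  # timer in the future: wait until it is due
    print(next_wait(now - 1.0, now))   # timer already overdue: wait the floor, not 0
```

The trade-off is that a timer that is already due is delayed by up to the floor value, and a floor only hides whatever keeps scheduling immediate wakeups instead of removing it.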
The neutron package versions are as follows:
[stack@undercloud hubs]$ rpm -qa | grep neutron
python2-ironic-neutron-agent-1.0.0-0.20180220161644.deb466b.el7.centos.noarch
openstack-neutron-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
openstack-neutron-linuxbridge-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
openstack-neutron-ml2-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
openstack-neutron-lbaas-12.0.1-0.20180328075810.268cc42.el7.centos.noarch
python2-neutron-lib-1.13.0-0.20180211233639.dcf96cd.el7.centos.noarch
puppet-neutron-12.4.0-0.20180329040645.502d290.el7.centos.noarch
openstack-neutron-common-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
python-neutron-lbaas-12.0.1-0.20180328075810.268cc42.el7.centos.noarch
openstack-neutron-l2gw-agent-12.0.2-0.20180302213951.b064078.el7.centos.noarch
openstack-neutron-metering-agent-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
python-neutron-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
openstack-neutron-sriov-nic-agent-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
python2-neutronclient-6.7.0-0.20180211221651.95d64ce.el7.centos.noarch
openstack-neutron-openvswitch-12.0.1-0.20180328231751.7e1d5b6.el7.centos.noarch
Also installed:
python2-eventlet-0.20.1-2.el7.noarch
python2-ryu-4.15-1.el7.noarch
openvswitch-2.9.0-3.el7.x86_64
python2-openvswitch-2.9.0-3.el7.noarch
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1762341/+subscriptions