yahoo-eng-team team mailing list archive
Message #64465
[Bug 1694505] Re: neutron-ovs-agent dies with return code 0 when neutron-server is down
Reviewed: https://review.openstack.org/469231
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=73701bf75b964509c7d7e8b62dba97f7cbe9c87a
Submitter: Jenkins
Branch: master
commit 73701bf75b964509c7d7e8b62dba97f7cbe9c87a
Author: Ihar Hrachyshka <ihrachys@xxxxxxxxxx>
Date: Tue May 30 19:42:16 2017 +0000
ovs: bubble up failures into main thread in native ofctl mode
When native ofctl interface is used (the default), the agent main() is
running in a separate gevent thread. Unless we explicitly request from
ryu to raise errors that may have happened in the agent app, it will
ignore them (only logging a warning message). This may interfere with
service management software like systemd that may use the return code to
decide whether to restart the dead service.
This patch makes ryu raise any uncaught errors happening inside the
agent. It also makes the agent 'wrapper' helper function not to swallow
raised exceptions on logging the error. Those two changes combined make
the agent exit with rc=1 if an exception happens inside the main()
function when in native mode.
This patch doesn't include any unit tests because those would be very
silly (like checking that we indeed pass the needed arguments to ryu).
Change-Id: Ic86b5eeae25a916c3c51f21e6820f5b0212dd5f8
Closes-Bug: #1694505
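The failure mode described in the commit message can be illustrated in isolation. The sketch below is not the neutron or ryu code: it uses the standard library's threading module instead of ryu's green threads, and the names agent_main, run_swallowing and run_propagating are hypothetical. It only shows why a wrapper that logs and drops an exception yields rc=0, and how recording the exception lets the main thread return rc=1.

```python
import sys
import threading


def agent_main():
    # Hypothetical stand-in for the agent's main(); it fails the way an
    # RPC timeout would when neutron-server is unreachable.
    raise RuntimeError("Timed out waiting for a reply")


def run_swallowing(target):
    """Run target in a worker thread, logging but dropping any failure."""
    def wrapper():
        try:
            target()
        except Exception as exc:
            print("worker died of an exception: %s" % exc, file=sys.stderr)

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()
    # The exception never reaches the caller, so the result looks like success.
    return 0


def run_propagating(target):
    """Run target in a worker thread and report its failure as rc=1."""
    errors = []

    def wrapper():
        try:
            target()
        except Exception as exc:
            print("worker died of an exception: %s" % exc, file=sys.stderr)
            # Keep the exception so the main thread can act on it.
            errors.append(exc)

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()
    return 1 if errors else 0
```

A caller would pass the result to sys.exit(), for example sys.exit(run_propagating(agent_main)), so that a service manager sees a non-zero status.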
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1694505
Title:
neutron-ovs-agent dies with return code 0 when neutron-server is down
Status in neutron:
Fix Released
Bug description:
Environment description:
- Deployment using RDO Trunk repo from master.
- Neutron based on commit c430e9b
If neutron-ovs-agent is started before neutron-server starts, it exits
with return code 0. systemd does not treat that as a failure, so the
agent is not restarted.
The following errors appear in /var/log/neutron/openvswitch-agent.log:
2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi method bulk_pull called with arguments (<neutron_lib.context.Context object at 0x75ff950>, 'Port') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
....
2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of an exception
...
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'to message ID %s' % msg_id)
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp MessagingTimeout: Timed out waiting for a reply to message ID 3874905892f543e0be9984e6504644bb
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=29502
On the systemd side, the following status is reported:
[root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
Main PID: 29042 (code=exited, status=0/SUCCESS)
May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch Agent...
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-arptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-iptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-ip6tables = 1
May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be reg...te reports.
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load neutron.openstack.common.notifier.rpc_notifier
Note the (code=exited, status=0/SUCCESS).
An easy way to reproduce this is:
1. Stop neutron-server
2. Start neutron-openvswitch-agent manually:
# /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
[root@weirdo1 neutron]# echo $?
0
Note that the return code is 0.
I'd say this is a bug in the ovs agent, which should exit with rc!=0 so that systemd restarts the service under the current "Restart=on-failure" policy. Otherwise we should change the systemd restart policy.
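The relationship between the exit status and the restart decision can be checked with a short script. This is a simplified sketch: restarts_under_on_failure is a hypothetical helper that models only the exit-status part of "Restart=on-failure" (it ignores signals, timeouts and watchdog events, and assumes SuccessExitStatus= and RestartPreventExitStatus= are unset). The child processes are trivial Python one-liners, not the real agent.

```python
import subprocess
import sys


def exit_status_of(python_source):
    """Run a Python snippet in a child process and return its exit status."""
    return subprocess.call([sys.executable, "-c", python_source])


def restarts_under_on_failure(exit_status):
    # Simplified model (assumption): only the exit status is considered.
    # A status of 0 counts as a clean exit and does not trigger a restart.
    return exit_status != 0


# A process that exits the way the agent did in this report (rc=0).
clean = exit_status_of("import sys; sys.exit(0)")
# A process that exits with rc=1, as the report argues the agent should.
failed = exit_status_of("import sys; sys.exit(1)")

print("rc=%d restart=%s" % (clean, restarts_under_on_failure(clean)))
print("rc=%d restart=%s" % (failed, restarts_under_on_failure(failed)))
```

Under this model the rc=0 case is left dead, matching the "inactive (dead)" state shown in the systemctl output above.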
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1694505/+subscriptions