yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #64391
[Bug 1694505] [NEW] neutron-ovs-agent dies with return code 0 when neutron-server is down
Public bug reported:
Environment description:
- Deployment using RDO Trunk repo from master.
- Neutron based on commit c430e9b
In neutron-ovs-agent is started before neutron-server starts, it exits
with return code 0, which is not identified by systemd as a failure so
it's not restarted.
following ERRORS appear in /var/log/neutron/openvswitch-agent.log:
2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi met
hod bulk_pull called with arguments (<neutron_lib.context.Context object at 0x75ff950>, 'Port') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
....
2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of an exception
...
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'to message ID %s' % msg_id)
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp MessagingTimeout: Timed out waiting for a reply to message ID 3874905892f543e0be9984e6504644bb
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=29502
>From systemd side, following status is reported:
[root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
Main PID: 29042 (code=exited, status=0/SUCCESS)
May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch Agent...
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-arptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-iptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-ip6tables = 1
May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be reg...te reports.
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load neutron.openstack.common.notifier.rpc_notifier
Note the (code=exited, status=0/SUCCESS)
A easy way to reproduce this is:
1. Stop neutron-server
2. Start manually neutron-openvswitch-agent:
# /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
[root@weirdo1 neutron]# echo $?
0
Note return code is 0
I'd say this is a bug in ovs agent which should exit with rc!=0 so that systemd service restart it again based on "Restart=on-failure" current policy. Otherwise we should change systemd restart policy.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1694505
Title:
neutron-ovs-agent dies with return code 0 when neutron-server is down
Status in neutron:
New
Bug description:
Environment description:
- Deployment using RDO Trunk repo from master.
- Neutron based on commit c430e9b
In neutron-ovs-agent is started before neutron-server starts, it exits
with return code 0, which is not identified by systemd as a failure so
it's not restarted.
following ERRORS appear in /var/log/neutron/openvswitch-agent.log:
2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi met
hod bulk_pull called with arguments (<neutron_lib.context.Context object at 0x75ff950>, 'Port') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
....
2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of an exception
...
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'to message ID %s' % msg_id)
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp MessagingTimeout: Timed out waiting for a reply to message ID 3874905892f543e0be9984e6504644bb
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=29502
From systemd side, following status is reported:
[root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
Main PID: 29042 (code=exited, status=0/SUCCESS)
May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch Agent...
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-arptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-iptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-ip6tables = 1
May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be reg...te reports.
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load neutron.openstack.common.notifier.rpc_notifier
Note the (code=exited, status=0/SUCCESS)
A easy way to reproduce this is:
1. Stop neutron-server
2. Start manually neutron-openvswitch-agent:
# /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
[root@weirdo1 neutron]# echo $?
0
Note return code is 0
I'd say this is a bug in ovs agent which should exit with rc!=0 so that systemd service restart it again based on "Restart=on-failure" current policy. Otherwise we should change systemd restart policy.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1694505/+subscriptions
Follow ups