yahoo-eng-team team mailing list archive
Message #64213
[Bug 1662804] Re: Agent is failing to process HA router if initialize() fails
** Changed in: cloud-archive
Status: New => Fix Released
** Summary changed:
- Agent is failing to process HA router if initialize() fails
+ [SRU] Agent is failing to process HA router if initialize() fails
** Description changed:
+ [Impact]
+
+ This patch resolves, amongst other things, a race condition between
+ router create and delete requests when using L3 HA. At the time of
+ backport this patch is already available from Ocata onwards and has been
+ verified as sufficiently minimal and safe for backport to Newton and
+ Mitaka. Essentially the error case results from an incorrectly
+ initialised router update action being executed without proper checks,
+ and this patch fixes this.
+
+ [Test Case]
+
+ * Deploy OpenStack Mitaka - http://pastebin.ubuntu.com/24637244/ - with
+ neutron-l3-agent configured to provide HA (vrrp) routers.
+
+ * Repeatedly create and delete routers in rapid succession and check
+ that the l3 agent does not go into an infinite error loop, i.e. run
+ http://pastebin.ubuntu.com/24634950/ and run tail -F
+ /var/log/neutron/neutron-l3-agent.log on all units of the l3 agent. Also
+ check that qrouter- namespaces are not stacking up. For Mitaka I
+ typically hit the error after ~20 create/deletes.
+
+ [Regression Potential]
+
+ * I do not envisage any regression potential from this patch.
+
+ ====
+
When the HA router initialize() function fails for some reason (rabbitmq
restart or no ha_port), keepalived_manager or KeepalivedInstance won't
be configured. In this case, _process_router_if_compatible fails with an
exception, and _resync_router(update) then tries to process this
router again in a loop. As initialize() is tried only once (and it failed),
every retry of _process_router_if_compatible will fail (no keepalived
manager or instance) and the router is never configured (see trace below).
2017-02-06 18:34:18.539 26120 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', 'qr-e63406e1-e7'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:101
2017-02-06 18:34:18.544 26120 DEBUG neutron.agent.linux.utils [-]
Command: ['ip', 'netns', 'exec', u'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', u'qr-e63406e1-e7']
Exit code: 0
- execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:156
+ execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:156
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self.internal_network_added(p)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if manager.get_process().active:
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '114a72fe-02ae-4b87-a2e7-70f962df0951'
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 506, in _process_router_update
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 445, in _process_router_if_compatible
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_updated_router(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 459, in _process_updated_router
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent ri.process(self)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 377, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent super(HaRouter, self).process(agent)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 362, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.logger(e)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.internal_network_added(p)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if manager.get_process().active:
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_process'
** Tags added: sts sts-sru-needed
** Patch added: "lp1662804-newton.debdiff"
https://bugs.launchpad.net/neutron/+bug/1662804/+attachment/4882150/+files/lp1662804-newton.debdiff
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1662804
Title:
[SRU] Agent is failing to process HA router if initialize() fails
Status in Ubuntu Cloud Archive:
Fix Released
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
Fix Released
Bug description:
[Impact]
This patch resolves, amongst other things, a race condition between
router create and delete requests when using L3 HA. At the time of
backport this patch is already available from Ocata onwards and has
been verified as sufficiently minimal and safe for backport to Newton
and Mitaka. Essentially the error case results from an incorrectly
initialised router update action being executed without proper checks,
and this patch fixes this.
[Test Case]
* Deploy OpenStack Mitaka - http://pastebin.ubuntu.com/24637244/ -
with neutron-l3-agent configured to provide HA (vrrp) routers.
* Repeatedly create and delete routers in rapid succession and check
that the l3 agent does not go into an infinite error loop, i.e. run
http://pastebin.ubuntu.com/24634950/ and run tail -F
/var/log/neutron/neutron-l3-agent.log on all units of the l3 agent. Also
check that qrouter- namespaces are not stacking up. For Mitaka I
typically hit the error after ~20 create/deletes.
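The create/delete churn in the test case can be sketched roughly as follows. This is not the script from the pastebin link above; the function name churn_routers, the router name prefix, and the exact "openstack router create --ha" / "openstack router delete" invocations are assumptions for illustration and may need adjusting to the client version in use. The command runner is injectable so the loop can be dry-run without a cloud.

```python
import subprocess


def churn_routers(iterations=20, run=subprocess.check_call, prefix="sru-test"):
    """Create and immediately delete `iterations` HA routers.

    `run` receives each command as an argument list; pass a stub to
    dry-run the loop. Returns the router names that were used.
    """
    names = []
    for i in range(iterations):
        name = "{}-{}".format(prefix, i)
        # Assumed CLI syntax; substitute the equivalent client call
        # for the release under test.
        run(["openstack", "router", "create", "--ha", name])
        run(["openstack", "router", "delete", name])
        names.append(name)
    return names


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    churn_routers(iterations=2, run=lambda cmd: print(" ".join(cmd)))
```

While the loop runs against a real deployment, the l3 agent log and the qrouter- namespaces still need to be watched by hand as described above.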
[Regression Potential]
* I do not envisage any regression potential from this patch.
====
When the HA router initialize() function fails for some reason (rabbitmq
restart or no ha_port), keepalived_manager or KeepalivedInstance won't
be configured. In this case, _process_router_if_compatible fails with an
exception, and _resync_router(update) then tries to process this
router again in a loop. As initialize() is tried only once (and it failed),
every retry of _process_router_if_compatible will fail (no keepalived
manager or instance) and the router is never configured (see trace below).
2017-02-06 18:34:18.539 26120 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', 'qr-e63406e1-e7'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:101
2017-02-06 18:34:18.544 26120 DEBUG neutron.agent.linux.utils [-]
Command: ['ip', 'netns', 'exec', u'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', u'qr-e63406e1-e7']
Exit code: 0
execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:156
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self.internal_network_added(p)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if manager.get_process().active:
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '114a72fe-02ae-4b87-a2e7-70f962df0951'
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 506, in _process_router_update
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 445, in _process_router_if_compatible
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_updated_router(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 459, in _process_updated_router
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent ri.process(self)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 377, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent super(HaRouter, self).process(agent)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 362, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.logger(e)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.internal_network_added(p)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if manager.get_process().active:
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_process'
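The failure loop described above can be modelled with a small toy example. This is a sketch only: FakeHaRouter, FakeKeepalivedManager and resync are hypothetical stand-ins, not neutron classes or functions, and the retry_initialize branch shows just one conceivable way out of the loop, not necessarily what the actual patch does.

```python
class FakeKeepalivedManager:
    """Hypothetical stand-in for a configured keepalived manager."""

    def get_process(self):
        return "keepalived"


class FakeHaRouter:
    """Toy router whose initialize() fails a given number of times."""

    def __init__(self, initialize_failures):
        self.keepalived_manager = None
        self._failures_left = initialize_failures

    def initialize(self):
        if self._failures_left > 0:
            self._failures_left -= 1
            raise RuntimeError("initialize() failed (e.g. no ha_port)")
        self.keepalived_manager = FakeKeepalivedManager()

    def process(self):
        # Raises AttributeError while keepalived_manager is still None,
        # like manager.get_process() in the trace.
        return self.keepalived_manager.get_process()


def resync(router, max_attempts, retry_initialize):
    """Return the 1-based attempt on which process() succeeded, or None."""
    try:
        router.initialize()
    except RuntimeError:
        pass  # the router is left half-initialised
    for attempt in range(1, max_attempts + 1):
        try:
            router.process()
            return attempt
        except AttributeError:
            if retry_initialize:
                try:
                    router.initialize()
                except RuntimeError:
                    pass
    return None


if __name__ == "__main__":
    # initialize() is tried only once: every retry of process() fails.
    print(resync(FakeHaRouter(1), max_attempts=5, retry_initialize=False))
    # Re-attempting initialize() lets a later retry succeed.
    print(resync(FakeHaRouter(1), max_attempts=5, retry_initialize=True))
```

With retry_initialize=False the model never recovers, however many attempts are allowed, which matches the behaviour reported in this bug.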
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1662804/+subscriptions