yahoo-eng-team team mailing list archive
Message #80600
[Bug 1849676] Re: DHCP agents time out during startup at 60s when there is enough agents
Reviewed: https://review.opendev.org/692160
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=aedc09917680bc77501f4b57142e73768e3d06ad
Submitter: Zuul
Branch: master
commit aedc09917680bc77501f4b57142e73768e3d06ad
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date: Wed Oct 30 14:35:37 2019 +0000
Increase timeout when waiting for dnsmasq enablement
The reported bug stems from a regression introduced in [1], which limited
the time allowed for a "dnsmasq" process to become enabled. As documented
in the bug, on older versions (Queens) running Python 2 [2] with older
versions of "ip_lib" (where most commands are not yet implemented with
Pyroute2), the time needed to spawn a "dnsmasq" process can exceed the
default 60 seconds defined in "common_utils.wait_until_true".
This patch increases the timeout to a more reasonable 300 seconds.
[1] https://review.opendev.org/#/c/643732
[2] https://bugs.python.org/issue35757
Change-Id: I2d8693145da72825876b951f2d10afe9ca28ff6e
Closes-Bug: #1849676
** Changed in: neutron
Status: In Progress => Fix Released
** Bug watch added: Python Roundup #35757
http://bugs.python.org/issue35757
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1849676
Title:
DHCP agents time out during startup at 60s when there is enough agents
Status in neutron:
Fix Released
Bug description:
The following commit introduces a 60s timeout on DHCP agent startup:
https://github.com/openstack/neutron/commit/157e09e6af758b7669fbe5a8cdb0b1969f04661a#diff-3fcbcfeebb7de79a1cb36faed9b8b091
The value cannot be adjusted from the configuration.
When there are enough network elements (e.g. ~1200 DHCP-enabled subnets in our case), nearly all DHCP startups fail with:
2019-10-09 13:21:27.826 694156 ERROR neutron.agent.linux.dhcp [-] Failed to start DHCP process for network 8b4b5496-8b35-482e-a2a3-7c352f1e343a: WaitTimeout: Timed out after 60 seconds
The timeout occurs because the operations run in sequence, with 100-300 ms
between consecutive operations, and too many agents are started at the same time:
2019-10-09 12:24:17.836 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-13648243-5659-4094-9dd7-cee58e4d46ac', 'ip', '-o', 'link', 'show', 'tap3af1ee05-2f'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-10-09 12:24:18.075 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1407a320-8fca-4dfd-a011-96a2ad41779f', 'ip', '-o', 'link', 'show', 'tap21c443e1-89'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-10-09 12:24:18.266 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1412976d-4cd0-452a-91b7-7f8c3003c722', 'ip', '-o', 'link', 'show', 'tap1cb0995e-bd'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-10-09 12:24:18.541 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1433d8a6-fa06-4544-994a-d38b01302490', 'ip', '-o', 'link', 'show', 'tap00fcd6f1-b1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-10-09 12:24:18.735 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-1447a8cf-e94c-4250-b9a8-2b13c0cf60c6', 'ip', '-o', 'link', 'show', 'tap91076dbc-83'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
2019-10-09 12:24:18.930 239392 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-14b396e1-d561-4087-990d-9b993cc08619', 'ip', '-o', 'link', 'show', 'tapdf767a28-af'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:103
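As a rough check on the 100-300 ms figure, the gaps between the six DEBUG lines above can be computed directly from their timestamps. The sketch below also extrapolates to ~1200 subnets; that step assumes one such command per DHCP-enabled subnet, run strictly in sequence, which is a simplification for illustration and not something the logs themselves establish.

```python
from datetime import datetime

# Timestamps copied from the six DEBUG lines above.
stamps = [
    "2019-10-09 12:24:17.836",
    "2019-10-09 12:24:18.075",
    "2019-10-09 12:24:18.266",
    "2019-10-09 12:24:18.541",
    "2019-10-09 12:24:18.735",
    "2019-10-09 12:24:18.930",
]

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]

# Gap in seconds between each pair of consecutive commands.
gaps = [(later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])]
mean_gap = sum(gaps) / len(gaps)

# Assumption (not taken from the logs): one command per subnet, in sequence.
SUBNETS = 1200
estimated_total = mean_gap * SUBNETS

print("gaps (s):", [round(g, 3) for g in gaps])
print("mean gap (s):", round(mean_gap, 4))
print("estimated sequential time for %d subnets (s): %.1f"
      % (SUBNETS, estimated_total))
```

Under that assumption the estimated total is well above the 60 second limit, which is consistent with nearly all startups timing out.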
The following allows the agents to start:
- common_utils.wait_until_true(self._enable)
+ common_utils.wait_until_true(self._enable, timeout=300)
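To illustrate what the timeout argument controls, here is a minimal stand-in for a "poll a predicate until it is true or a deadline passes" helper. It is a simplified sketch using only the standard library, not Neutron's actual "common_utils.wait_until_true"; the "WaitTimeout" class and "make_slow_enable" helper are defined here purely for the example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the predicate does not become true before the deadline."""


def wait_until_true(predicate, timeout=60, sleep=1):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %s seconds" % timeout)
        time.sleep(sleep)


def make_slow_enable(calls_needed):
    """Hypothetical stand-in for self._enable: succeeds on the Nth call."""
    state = {"calls": 0}

    def _enable():
        state["calls"] += 1
        return state["calls"] >= calls_needed

    return _enable


# Succeeds: the fake enable needs three polls, well inside the timeout.
wait_until_true(make_slow_enable(3), timeout=1, sleep=0.01)

# Fails: a predicate that never becomes true hits the deadline.
try:
    wait_until_true(lambda: False, timeout=0.05, sleep=0.01)
except WaitTimeout as exc:
    print("caught:", exc)
```

With a helper of this shape, raising the timeout only changes how long the caller keeps polling; it does not make the underlying operation any faster.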
A few ways to solve this issue:
- Increase the default timeout from 60s to a larger value
- Make the timeout adjustable from the DHCP agent configuration
- Work out better batch sizes for the DHCP agents started at one time, and improve startup performance
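The second option, a configurable timeout, might look roughly like the sketch below. It uses the standard library "configparser" so that it is self-contained, whereas Neutron reads its options through oslo.config. The option name "dnsmasq_enable_timeout" and the function "load_enable_timeout" are hypothetical, made up for this example, and are not existing Neutron settings.

```python
import configparser

# Fallback in seconds; matches the value used in the workaround above.
DEFAULT_ENABLE_TIMEOUT = 300


def load_enable_timeout(conf_text):
    """Return the timeout from INI text, or the fallback if it is not set.

    `dnsmasq_enable_timeout` is a hypothetical option name.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint("DEFAULT", "dnsmasq_enable_timeout",
                         fallback=DEFAULT_ENABLE_TIMEOUT)


# An operator with many subnets raises the limit in the agent config.
custom = "[DEFAULT]\ndnsmasq_enable_timeout = 600\n"
print(load_enable_timeout(custom))

# With the option absent, the built-in fallback applies.
print(load_enable_timeout("[DEFAULT]\n"))
```

The resulting value would then be passed as the timeout argument when waiting for the process to be enabled, instead of a hard-coded number.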
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1849676/+subscriptions