Message #12770
[Bug 1293083] Re: report_interval too frequent; Causing load on service, failing high CPU usage operations
** Changed in: neutron
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1293083
Title:
report_interval too frequent; Causing load on service, failing high
CPU usage operations
Status in OpenStack Neutron (virtual network service):
Fix Released
Bug description:
report_interval is how often an agent sends out a heartbeat to the
service. The Neutron service responds to these 'report_state' RPC
messages by updating the agent's heartbeat DB record. The last
heartbeat is then compared to the configured agent_down_time to
determine if the agent is up or down. The agent's status is used when
scheduling networks on DHCP and L3 agents.
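
The liveness check described above can be sketched as follows. This is a minimal illustration, not Neutron's actual code: the function name is_agent_down and the constant AGENT_DOWN_TIME are hypothetical names chosen for this example, and the 9-second value is the default quoted in this report.

```python
from datetime import datetime, timedelta

# Assumption: the default agent_down_time quoted in this bug report (seconds).
AGENT_DOWN_TIME = 9


def is_agent_down(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Return True if the last heartbeat is older than agent_down_time seconds.

    Hypothetical helper: compares the timestamp stored in the agent's
    heartbeat record against the current time.
    """
    return now - last_heartbeat > timedelta(seconds=agent_down_time)


if __name__ == "__main__":
    now = datetime(2014, 3, 16, 12, 0, 0)
    # An agent that reported 4 seconds ago is within the 9-second window.
    print(is_agent_down(now - timedelta(seconds=4), now))
    # An agent silent for 10 seconds exceeds the window and counts as down.
    print(is_agent_down(now - timedelta(seconds=10), now))
```

With a check of this shape, an agent is only considered down after it has missed more than agent_down_time / report_interval consecutive reports, so the two settings need to be tuned together.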
The defaults are 4 seconds for report_interval and 9 seconds for
agent_down_time.
On a setup with 18 agents (15 layer 2, L3, DHCP, metadata) running on
16 nodes, and a Neutron service running on a dedicated, powerful
machine, the service used 20% CPU while otherwise idle. Changing
report_interval to 28 seconds and agent_down_time to 60 seconds
dropped CPU usage to 1% and allowed bulk operations on a larger
scale (in this case, creating 30 instances at the same time with 60
ports). With the original values the operation failed (the instances
did not get IP addresses), and with the new values we were able to
boot 60 instances successfully. Side note: this flow will work better
once the Nova-Neutron race is resolved, but that's orthogonal to this
proposal.
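
As a rough back-of-the-envelope check on these numbers, the sketch below computes how many report_state messages per second 18 agents generate at each interval, and how many reports fit inside agent_down_time. The function names are hypothetical and the model is an assumption: it counts messages only and says nothing about the CPU cost of each one.

```python
def heartbeats_per_second(num_agents, report_interval):
    """Average report_state messages per second reaching the service."""
    return num_agents / report_interval


def reports_per_down_window(report_interval, agent_down_time):
    """How many report intervals fit inside one agent_down_time window."""
    return agent_down_time / report_interval


if __name__ == "__main__":
    agents = 18  # agent count from the setup described in this report

    # Default values quoted in this report: 4 s interval, 9 s down time.
    print(heartbeats_per_second(agents, 4))
    print(reports_per_down_window(4, 9))

    # Proposed values from this report: 28 s interval, 60 s down time.
    print(heartbeats_per_second(agents, 28))
    print(reports_per_down_window(28, 60))
```

Under this simple model the service handles 4.5 heartbeats per second with the defaults and roughly 0.64 per second with the proposed values, a sevenfold reduction, while the ratio of agent_down_time to report_interval stays a little above 2 in both cases.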
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1293083/+subscriptions