yahoo-eng-team team mailing list archive
Message #59585
[Bug 1648206] Re: sriov agent report_state is slow
Reviewed: https://review.openstack.org/408281
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1a2a71baf3904209679fc5448814a0e7940fe44d
Submitter: Jenkins
Branch: master
commit 1a2a71baf3904209679fc5448814a0e7940fe44d
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date: Wed Dec 7 11:33:46 2016 -0800
SRIOV: don't block report_state with device count
The device count process can be quite slow on a system with
lots of interfaces. Doing this during report_state can block
it long enough that the agent will be reported as dead and
bindings will fail.
This adjusts the logic to only update the configuration during
the normal device retrieval for the scan loop. This will leave
the report_state loop unblocked by the operation so the agent
doesn't get reported as dead (which blocks port binding).
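The decoupling described above can be sketched roughly as follows. This is a minimal illustration, not the Neutron code from the commit: the class SketchAgent, its methods scan_devices and report_state, and the count_devices callable are hypothetical names chosen for the example. The idea is that the slow device count runs only in the scan loop and is cached, while the heartbeat reads the cached value.

```python
class SketchAgent:
    """Hypothetical stand-in for the SR-IOV agent; not Neutron code."""

    def __init__(self, count_devices):
        # count_devices is assumed to be the slow call that walks all
        # interfaces (PFs and VFs) and returns how many are in use.
        self._count_devices = count_devices
        self.agent_state = {"configurations": {"devices": 0}}

    def scan_devices(self):
        # Scan loop: the only place the slow count is executed.
        # The result is cached in the agent state for later reports.
        count = self._count_devices()
        self.agent_state["configurations"]["devices"] = count
        return count

    def report_state(self):
        # Heartbeat: returns a copy of the cached configuration and
        # never calls the slow count, so it cannot be delayed by it.
        return dict(self.agent_state["configurations"])
```

With this split, a report sent before the first scan carries a stale count (0 here), which is the trade-off for keeping the heartbeat fast.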
Closes-Bug: #1648206
Change-Id: Iff45fb6617974b1eceeed238a8d9e958f874f12b
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1648206
Title:
sriov agent report_state is slow
Status in neutron:
Fix Released
Bug description:
On a system with lots of VFs and PFs we get these logs:
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 29.67 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 45.43 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 47.64 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 23.89 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 30.20 sec
Depending on the agent_down_time configuration, this can cause the Neutron server to think the agent has died.
This appears to be caused by every state report blocking on the eswitch manager to get a device count to include in the report.
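As a rough worked example of why these overruns matter, the sketch below compares each logged overrun against the down-time threshold. It assumes report_interval = 30 and agent_down_time = 75 (values not stated in this report, so treat them as assumptions), and it simplifies by treating the duration of one _report_state run as the gap between two heartbeats seen by the server. The function names are illustrative only.

```python
# Assumed settings; the bug report does not state the actual values.
REPORT_INTERVAL = 30.0   # seconds between report_state runs
AGENT_DOWN_TIME = 75.0   # seconds without a heartbeat before "dead"

# Overruns taken from the log lines in the bug description.
overruns = [29.67, 45.43, 47.64, 23.89, 30.20]


def run_duration(overrun, interval=REPORT_INTERVAL):
    # "run outlasted interval by X" means the run took interval + X.
    return interval + overrun


def exceeds_down_time(overrun, interval=REPORT_INTERVAL,
                      down_time=AGENT_DOWN_TIME):
    # Simplification: the heartbeat gap is taken to equal the run time.
    return run_duration(overrun, interval) > down_time


for o in overruns:
    status = "over" if exceeds_down_time(o) else "under"
    print(f"overrun {o:6.2f}s -> run {run_duration(o):6.2f}s ({status})")
```

Under these assumptions, the 45.43 and 47.64 second overruns correspond to runs longer than the down-time threshold, which is consistent with the agent being reported as dead.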
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1648206/+subscriptions