
yahoo-eng-team team mailing list archive

[Bug 1648206] Re: sriov agent report_state is slow

 

Reviewed:  https://review.openstack.org/408281
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1a2a71baf3904209679fc5448814a0e7940fe44d
Submitter: Jenkins
Branch:    master

commit 1a2a71baf3904209679fc5448814a0e7940fe44d
Author: Kevin Benton <kevin@xxxxxxxxxx>
Date:   Wed Dec 7 11:33:46 2016 -0800

    SRIOV: don't block report_state with device count
    
    The device count process can be quite slow on a system with
    lots of interfaces. Doing this during report_state can block
    it long enough that the agent will be reported as dead and
    bindings will fail.
    
    This adjusts the logic to only update the configuration during
    the normal device retrieval for the scan loop. This will leave
    the report_state loop unblocked by the operation so the agent
    doesn't get reported as dead (which blocks port binding).
    
    Closes-Bug: #1648206
    Change-Id: Iff45fb6617974b1eceeed238a8d9e958f874f12b
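The pattern the commit message describes can be sketched roughly as follows. This is not the Neutron patch itself: the class name, the method names, and the `count_devices` callable are hypothetical stand-ins, and the lock is only there to keep the sketch safe under plain threads. The idea is that the scan loop is the only caller that pays for the slow device count, while the state report reads the last cached value.

```python
import threading


class CachedCountAgent:
    """Hypothetical stand-in for the SR-IOV agent; not Neutron code."""

    def __init__(self, count_devices):
        # count_devices: the slow callable that enumerates devices.
        self._count_devices = count_devices
        self._lock = threading.Lock()
        # Cached configuration that the state report sends as-is.
        self.agent_state = {"configurations": {"devices": 0}}

    def scan_devices(self):
        """Scan-loop step: the only place that runs the slow count."""
        count = self._count_devices()
        with self._lock:
            self.agent_state["configurations"]["devices"] = count
        return count

    def report_state(self):
        """Heartbeat step: returns a copy of the cached state, never counts."""
        with self._lock:
            return {"configurations": dict(self.agent_state["configurations"])}
```

With this split, a slow count delays only the next scan iteration. The reported device count may lag by up to one scan interval, which appears to be the trade-off the commit accepts.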


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1648206

Title:
  sriov agent report_state is slow

Status in neutron:
  Fix Released

Bug description:
  On a system with lots of VFs and PFs we get these logs:

  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 29.67 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 45.43 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 47.64 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 23.89 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 30.20 sec

  
  Depending on the agent_down_time configuration, this can cause the Neutron server to think the agent has died.
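  As a rough illustration of why the agent_down_time setting matters here, the arithmetic can be sketched as below. The values report_interval=30 and agent_down_time=75 are assumptions for the example, not taken from this report, and the sketch assumes that "outlasted interval by X" means the run took report_interval + X seconds, so that is roughly the gap between two state reports. The function names are hypothetical.

```python
def seconds_between_reports(report_interval, overrun):
    # Assumption: a run that "outlasted interval by X" took interval + X.
    return report_interval + overrun


def looks_dead(report_interval, overrun, agent_down_time):
    # The server is assumed to treat the agent as down once the gap
    # since the last report exceeds agent_down_time.
    return seconds_between_reports(report_interval, overrun) > agent_down_time


# Overruns taken from the warnings logged above.
OVERRUNS = [29.67, 45.43, 47.64, 23.89, 30.20]

# Assumed settings; check the values in your own deployment.
REPORT_INTERVAL = 30
AGENT_DOWN_TIME = 75

flags = [looks_dead(REPORT_INTERVAL, o, AGENT_DOWN_TIME) for o in OVERRUNS]
```

  Under those assumed settings, the 45.43 and 47.64 second overruns would push the gap past 75 seconds, while the other three would not.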

  
  This appears to be caused by report_state blocking on the eswitch manager on every run to get the device count that it includes in the state report.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1648206/+subscriptions

