yahoo-eng-team team mailing list archive

[Bug 1648206] [NEW] sriov agent report_state is slow

 

Public bug reported:

On a system with many VFs and PFs, the SR-IOV agent logs warnings like these:

WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 29.67 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 45.43 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 47.64 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 23.89 sec
WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 30.20 sec


Depending on the agent_down_time configuration, this can cause the Neutron server to think the agent has died.
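
As a rough illustration of why this matters: the looping call logs how far a run exceeded its interval, so each run took roughly the interval plus the logged overrun. The sketch below assumes a report_interval of 30 seconds and an agent_down_time of 75 seconds. Neither value is stated in this report, so check your own configuration. The helper names are hypothetical.

```python
# Assumed values, not taken from this bug report; adjust to your deployment.
REPORT_INTERVAL = 30.0   # seconds between state reports (assumption)
AGENT_DOWN_TIME = 75.0   # seconds without a report before the server gives up (assumption)


def run_duration(outlasted_by, report_interval=REPORT_INTERVAL):
    """Approximate wall-clock time of one _report_state run.

    The warning reports the overrun past the interval, so the run itself
    lasted about interval + overrun.
    """
    return report_interval + outlasted_by


def may_appear_dead(outlasted_by,
                    report_interval=REPORT_INTERVAL,
                    agent_down_time=AGENT_DOWN_TIME):
    """True if a single run alone lasted longer than agent_down_time."""
    return run_duration(outlasted_by, report_interval) > agent_down_time


# Overruns copied from the warnings in this report.
for overrun in (29.67, 45.43, 47.64, 23.89, 30.20):
    print(overrun, round(run_duration(overrun), 2), may_appear_dead(overrun))
```

Under those assumed values, the 45.43 and 47.64 second overruns correspond to runs longer than 75 seconds, which is long enough for the server to mark the agent as down.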


This appears to be caused by the agent blocking on the eswitch manager on every report in order to get the device count that is included in the state report.
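
One possible way to avoid the blocking call, sketched here under stated assumptions and not necessarily the fix that will be applied in Neutron: keep the last known device count in memory and have the state report read that cached value, refreshing it elsewhere (for example from the agent's regular device scan). The class, function, and dictionary layout below are hypothetical and only illustrate the idea.

```python
import threading


class CachedDeviceCount:
    """Holds the last known device count so a state report never has to wait."""

    def __init__(self, count_devices):
        # count_devices: a callable standing in for the slow eswitch manager query.
        self._count_devices = count_devices
        self._lock = threading.Lock()
        self._count = 0

    def refresh(self):
        """Run the slow query; call this outside the report path."""
        count = self._count_devices()  # slow call happens without holding the lock
        with self._lock:
            self._count = count

    def get(self):
        """Return the cached count without touching the slow query."""
        with self._lock:
            return self._count


def build_state_report(cache):
    """Build an illustrative state report using only the cached count."""
    return {"configurations": {"devices": cache.get()}}
```

With this shape, the cost of counting devices is paid only when refresh() is called, and the report itself returns immediately with a possibly slightly stale count.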

** Affects: neutron
     Importance: Undecided
     Assignee: Kevin Benton (kevinbenton)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Kevin Benton (kevinbenton)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1648206

Title:
  sriov agent report_state is slow

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1648206/+subscriptions

