
yahoo-eng-team team mailing list archive

[Bug 1991817] [NEW] OVN metadata agent liveness system generate OVN SBDB usage peak

 

Public bug reported:

On larger deployments (150+ compute hosts), the neutron-ovn-metadata-agent
liveness system generates a CPU usage peak on the OVN Southbound DB
every agent_down_time / 2. This CPU saturation can last dozens of
seconds and introduces significant latency in OVN service responses.

The problem is that every neutron-ovn-metadata-agent responds immediately
to an event on the SB_Global table by updating the external_ids property
of its corresponding Chassis/Chassis_Private table row. This generates a
flood of OVN SBDB updates.

A similar issue can be observed with other neutron agents that use
oslo.messaging to deliver their heartbeats (such as the neutron OVS
agent), but in those cases the load generated by the liveness system is
naturally distributed over time because the agents execute at different
times.

The neutron-ovn-metadata-agent heartbeat, by contrast, does not depend on
the agent's own execution time; it is triggered by a general OVN event
that all agents receive at the same moment.

A possible solution is to spread the neutron-ovn-metadata-agent heartbeat
updates over time by postponing each agent's response by a random delay,
with the delay range not exceeding agent_down_time / 2.
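A minimal sketch of the proposed randomized delay, written in Python (the
language neutron uses). All names here (heartbeat_delay,
on_sb_global_update, peak_writes_per_second and the update_chassis_private
callback) are hypothetical and are not neutron's actual API;
threading.Timer merely stands in for whatever scheduling mechanism the
agent really uses. The simulation only counts writes per one-second bucket
for a single heartbeat round, to show how jitter flattens the burst.

```python
import random
import threading
from collections import Counter
from typing import Callable, Optional


def heartbeat_delay(agent_down_time: float,
                    rng: Optional[random.Random] = None) -> float:
    """Pick a random delay in [0, agent_down_time / 2] seconds.

    The upper bound follows the constraint proposed in the bug report.
    """
    rng = rng or random.Random()
    return rng.uniform(0, agent_down_time / 2)


def on_sb_global_update(update_chassis_private: Callable[[], None],
                        agent_down_time: float,
                        rng: Optional[random.Random] = None) -> threading.Timer:
    """Hypothetical handler for an SB_Global change event.

    Instead of writing to Chassis_Private immediately, schedule the write
    after a random delay so that many agents do not hit the SB DB at once.
    """
    timer = threading.Timer(heartbeat_delay(agent_down_time, rng),
                            update_chassis_private)
    timer.daemon = True  # do not block interpreter shutdown
    timer.start()
    return timer


def peak_writes_per_second(num_agents: int, agent_down_time: float,
                           jitter: bool, seed: int = 0) -> int:
    """Simulate one heartbeat round; return the busiest 1-second bucket."""
    rng = random.Random(seed)
    buckets: Counter = Counter()
    for _ in range(num_agents):
        # Without jitter every agent writes at t = 0 (the event time).
        t = heartbeat_delay(agent_down_time, rng) if jitter else 0.0
        buckets[int(t)] += 1
    return max(buckets.values())


if __name__ == "__main__":
    # 150 agents, with agent_down_time = 75 s assumed for illustration.
    print("no jitter:", peak_writes_per_second(150, 75, jitter=False))
    print("jitter:   ", peak_writes_per_second(150, 75, jitter=True))
```

Without jitter all 150 writes land in the same second; with jitter they
are spread over roughly agent_down_time / 2 seconds, at the cost of each
heartbeat arriving up to agent_down_time / 2 later than it would otherwise.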

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1991817
