yahoo-eng-team team mailing list archive
Message #89809
[Bug 1991817] [NEW] OVN metadata agent liveness system generate OVN SBDB usage peak
Public bug reported:
On larger deployments (150+ compute hosts), the
neutron-ovn-metadata-agent liveness system generates a CPU usage peak
on the OVN Southbound DB once per heartbeat period (agent_down_time / 2).
This CPU saturation can last dozens of seconds and introduces
significant latency in OVN service responses.
The problem is that every neutron-ovn-metadata-agent responds immediately to an event on the SB_Global table and updates the external_ids property of its corresponding Chassis/Chassis_Private record.
This generates a flood of simultaneous OVN SBDB updates.
A similar issue can be observed with other neutron agents that use
oslo.messaging to deliver their heartbeats (such as the neutron OVS
agent), but in those cases the load generated by the liveness system is
spread over time simply because each agent's heartbeat timing depends on
when that agent was started.
The neutron-ovn-metadata-agent heartbeat does not depend on the agent's
start time; it is triggered by a general OVN event.
A possible solution is to spread out the neutron-ovn-metadata-agent
heartbeat updates by delaying each agent's response by a random period
of time (with the delay not exceeding agent_down_time / 2).
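The proposed randomized delay could look roughly like the sketch below.
This is only an illustration, not the agent's actual code: the names
heartbeat_delay, schedule_heartbeat and update_chassis are hypothetical,
and the value of 75 seconds for agent_down_time in the usage lines is
assumed purely for the example.

```python
import random
import threading


def heartbeat_delay(agent_down_time, rng=random):
    """Pick a random delay in [0, agent_down_time / 2] seconds."""
    return rng.uniform(0, agent_down_time / 2)


def schedule_heartbeat(update_chassis, agent_down_time, rng=random):
    """Run update_chassis() after a random delay instead of immediately.

    update_chassis is a hypothetical callable standing in for whatever
    writes the agent's external_ids to its Chassis/Chassis_Private record.
    """
    delay = heartbeat_delay(agent_down_time, rng)
    timer = threading.Timer(delay, update_chassis)
    timer.daemon = True  # do not block interpreter shutdown
    timer.start()
    return timer, delay


if __name__ == "__main__":
    # Hypothetical usage: called from the handler of the SB_Global event.
    timer, delay = schedule_heartbeat(lambda: print("heartbeat sent"), 75)
    print("heartbeat scheduled in %.1f seconds" % delay)
    timer.join()
```

With every agent drawing its own delay, the updates triggered by one
SB_Global event would be spread across the whole agent_down_time / 2
window rather than arriving at the Southbound DB at the same moment.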
** Affects: neutron
Importance: Undecided
Status: New
** Tags: ovn
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1991817
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1991817/+subscriptions