
yahoo-eng-team team mailing list archive

[Bug 1861092] Re: [OVN] Too frequent agent health-checks causes stress on ovsdb-server

 

Reviewed:  https://review.opendev.org/704530
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=647b7f63f9dafedfa9fb6e09e3d92d66fb512f0b
Submitter: Zuul
Branch:    master

commit 647b7f63f9dafedfa9fb6e09e3d92d66fb512f0b
Author: Lucas Alvares Gomes <lucasagomes@xxxxxxxxx>
Date:   Tue Jan 28 10:46:35 2020 +0000

    [OVN] Add an interval between agents health checks
    
    This patch adds a minimum interval between agent health checks.
    
    OVN checks agent liveness by incrementing a value in the NB DB and
    waiting for it to be propagated to the SB DB, but this can be costly
    if done many times in quick succession. Therefore, a minimum interval
    between checks is being added.
    
    Closes-Bug: #1861092
    Change-Id: If1f2d97e3a3a17f6744d546b3e8903bde55e83b9
    Signed-off-by: Lucas Alvares Gomes <lucasagomes@xxxxxxxxx>
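
The idea of a minimum interval between health checks can be sketched as below. This is a minimal illustration, not the code from the neutron patch: the class name HealthCheckThrottle, the method should_run, and the 30-second interval are all hypothetical, chosen only to show the technique of skipping a check when the previous one ran too recently.

```python
import time


class HealthCheckThrottle:
    """Allow an action at most once per `min_interval` seconds.

    Hypothetical helper for illustration; not part of neutron or OVN.
    """

    def __init__(self, min_interval, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._last_run = None    # time of the last check that was allowed

    def should_run(self):
        """Return True and record the time if enough time has passed."""
        now = self._clock()
        if (self._last_run is not None
                and now - self._last_run < self._min_interval):
            return False         # too soon, skip this health check
        self._last_run = now
        return True


if __name__ == "__main__":
    throttle = HealthCheckThrottle(min_interval=30)  # assumed value
    for _ in range(3):
        if throttle.should_run():
            print("would bump nb_cfg and wait for the agents to report")
        else:
            print("skipped: last check was less than 30 seconds ago")
```

Using a monotonic clock keeps the interval stable if the wall clock is adjusted, and a caller that is throttled can fall back to the result of the previous check instead of issuing a new write.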


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1861092

Title:
  [OVN] Too frequent agent health-checks causes stress on ovsdb-server

Status in neutron:
  Fix Released

Bug description:
  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1795198

  It looks like neutron-server is pinging agents too frequently, judging
  by the logs: nb_cfg is being bumped at an irregular rate.

  
  For example, in this part of the log I could find 11 updates in less than 2 minutes:

  2020-01-27 12:23:04.247 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49008, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49007) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:05.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49009, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49008) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:32.216 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49010, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49009) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:41.248 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49011, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49010) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:23:42.183 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49012, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49011) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:09.210 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49013, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49012) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:18.252 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49014, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49013) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:19.179 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49015, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49014) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:46.205 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49016, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49015) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:55.254 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49017, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49016) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44
  2020-01-27 12:24:56.177 43567 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: SbGlobalUpdateEvent(events=('update',), table='SB_Global', conditions=None, old_conditions=None) to row=SB_Global(ipsec=False, ssl=[], nb_cfg=49018, options={'mac_prefix': 'b2:64:0d'}, external_ids={}) old=SB_Global(nb_cfg=49017) matches /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/event.py:44

  
  This triggers overly frequent writes from *all* metadata agents and ovn-controllers in the cloud, which creates a lot of traffic. At scale, this can be a problem.

  Imagine a 500-node deployment with one update every 10 seconds, as in
  the example above. That translates into 1K write transactions (1
  metadata agent + 1 ovn-controller per node) to the SB database every
  10 seconds, i.e. 100 transactions per second, each of which triggers a
  JSON-RPC update to every single client in the cloud.
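
The arithmetic in the estimate above can be reproduced with a short script. The function name sb_write_rate and its parameters are made up for this illustration; the inputs (500 nodes, 2 writers per node, one check every 10 seconds) are the figures from the description.

```python
def sb_write_rate(nodes, writers_per_node, check_interval_s):
    """Return (writes per health check, writes per second) to the SB DB.

    Each health check makes every writer (metadata agent and
    ovn-controller) commit one write transaction.
    """
    writes_per_check = nodes * writers_per_node
    writes_per_second = writes_per_check / check_interval_s
    return writes_per_check, writes_per_second


# Figures from the bug description: 500 nodes, 1 metadata agent plus
# 1 ovn-controller per node, one nb_cfg bump roughly every 10 seconds.
writes, rate = sb_write_rate(nodes=500, writers_per_node=2,
                             check_interval_s=10)
print(writes, rate)  # 1000 100.0
```

Since each of those transactions is also pushed as an update to every connected client, the notification traffic grows with the number of clients multiplied by this write rate, which is why a longer interval between checks reduces the load so much.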

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1861092/+subscriptions

