yahoo-eng-team team mailing list archive

[Bug 1992950] Re: [scale] Setting a gateway on router is killing database

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/861322
Committed: https://opendev.org/openstack/neutron/commit/c33b47edc77520abcdd7176af1f0ae921bd489b3
Submitter: "Zuul (22348)"
Branch:    master

commit c33b47edc77520abcdd7176af1f0ae921bd489b3
Author: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
Date:   Thu Oct 13 17:59:54 2022 +0200

    Do not keep gateway port when notifying for router update
    
    On router update, skip the subnet of the router's gateway port.
    This prevents a massive RPC fan-out toward a huge number of L3 agents
    (all agents in the same big public subnet).
    
    As a side effect, it saves CPU time on the database, because each L3
    agent receiving such an event then makes an RPC call (sync_routers),
    even if the router is not used/deployed on that agent.
    
    Closes-Bug: #1992950
    
    Change-Id: Iafa9d43614d528f230cf034103b54f73303ac815
    Signed-off-by: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
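
The idea in the commit message can be illustrated with a minimal sketch. It is not the actual neutron code: the function name, the `keep_gateway_port` flag, and the port dictionaries are assumptions made for illustration. Only the two `device_owner` strings are meant to match the values neutron uses for gateway and distributed router interface ports.

```python
# Hypothetical sketch; not the actual neutron implementation.
ROUTER_GATEWAY = "network:router_gateway"  # device_owner of a gateway port


def subnet_ids_on_router(router_ports, keep_gateway_port=True):
    """Return the set of subnet IDs attached to a router's ports.

    With keep_gateway_port=False, the gateway port is skipped, so the
    large external subnet does not take part in the agent lookup.
    """
    subnet_ids = set()
    for port in router_ports:
        if not keep_gateway_port and port["device_owner"] == ROUTER_GATEWAY:
            continue
        for fixed_ip in port["fixed_ips"]:
            subnet_ids.add(fixed_ip["subnet_id"])
    return subnet_ids


# Illustrative data: one gateway port and one internal DVR interface.
router_ports = [
    {"device_owner": "network:router_gateway",
     "fixed_ips": [{"subnet_id": "ext-subnet"}]},
    {"device_owner": "network:router_interface_distributed",
     "fixed_ips": [{"subnet_id": "private-subnet"}]},
]

print(subnet_ids_on_router(router_ports, keep_gateway_port=False))
```

With the gateway port skipped, only `private-subnet` is left for deciding which L3 agents to notify on a router update.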


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1992950

Title:
  [scale] Setting a gateway on router is killing database

Status in neutron:
  Fix Released

Bug description:
  Context
  =======
  OpenStack Stein (but master seems affected by this as well).
  OVS based deployment.
  L3 routers in DVR and HA mode.
  One big "external/public" network (with subnets like /21 or /22) used by both instances and router external gateways.

  Problem description
  ===================
  When adding a gateway on a router in HA+DVR mode, the neutron API may send a large number of RPC messages toward L3 agents, depending on the size of the subnet used for the gateway.

  How to reproduce
  ================
  Add a gateway on a router:

  $ openstack router set --external-gateway Ext-Net router-arnaud

  On the neutron server, the logs (at DEBUG level) show:

  Notify agent at l3_agent.hostxyz

  This line appears for every L3 agent that has a port in the Ext-Net
  subnet (which can be a huge number, around 1k).

  Each of these agents then makes another RPC call (sync_routers), which ends up on neutron-rpc with this log line:

  Sync routers for ids [abc]

  Behind each sync_routers call, a big SQL query is executed [1].

  When 1k such requests are made on each router update, the database
  is overwhelmed by the number of SQL queries.

  
  [1] https://github.com/openstack/neutron/blob/stable/stein/neutron/db/l3_dvrscheduler_db.py#L363
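
To make the fan-out described above concrete, here is a small, self-contained sketch. It is a toy model under stated assumptions, not neutron code: the `hosts_to_notify` helper, the host names, and the counts (1000 hosts with a port in the external subnet, 3 of them also in the router's private subnet) are invented for illustration.

```python
def hosts_to_notify(router_subnets, subnets_by_host):
    """Return hosts whose L3 agent has a port in one of the router's subnets."""
    return sorted(
        host for host, subnets in subnets_by_host.items()
        if subnets & router_subnets
    )


# Hypothetical deployment: 1000 hosts have a port in the external subnet,
# and 3 of them also have a port in the router's private subnet.
subnets_by_host = {f"host{i}": {"ext-subnet"} for i in range(1000)}
for i in range(3):
    subnets_by_host[f"host{i}"].add("private-subnet")

# Gateway subnet included in the lookup: every host on Ext-Net is notified.
with_gateway = hosts_to_notify({"ext-subnet", "private-subnet"}, subnets_by_host)

# Gateway subnet left out: only hosts on the private subnet are notified.
without_gateway = hosts_to_notify({"private-subnet"}, subnets_by_host)

# Each notified agent replies with one sync_routers call, so these counts
# are also the number of heavy SQL queries per router update in this model.
print(len(with_gateway), len(without_gateway))  # prints "1000 3"
```

In this model, one router update costs as many sync_routers queries as there are notified hosts, which is why the size of the external subnet drives the database load.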

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1992950/+subscriptions


