yahoo-eng-team team mailing list archive
Message #89875
[Bug 1992950] Re: [scale] Setting a gateway on router is killing database
Reviewed: https://review.opendev.org/c/openstack/neutron/+/861322
Committed: https://opendev.org/openstack/neutron/commit/c33b47edc77520abcdd7176af1f0ae921bd489b3
Submitter: "Zuul (22348)"
Branch: master
commit c33b47edc77520abcdd7176af1f0ae921bd489b3
Author: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
Date: Thu Oct 13 17:59:54 2022 +0200
Do not keep gateway port when notifying for router update
On router update, skip subnet of gateway router.
This will prevent a massive RPC call toward huge number of L3 agents
(all agents in same big public subnet ID).
As a side effect, it will save CPU time on database because L3 agent
receiving such events are then doing a RPC call (sync_routers) even if
router is not used/deployed on this agent.
Closes-Bug: #1992950
Change-Id: Iafa9d43614d528f230cf034103b54f73303ac815
Signed-off-by: Arnaud Morin <arnaud.morin@xxxxxxxxxxxx>
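The idea in the commit message above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the actual change: the function name subnet_ids_to_notify and the plain-dict port layout are hypothetical, and only the device_owner strings are values Neutron itself uses.

```python
# Hypothetical sketch of "skip the gateway port's subnet"; not the real patch.
# "network:router_gateway" is the device_owner Neutron sets on gateway ports.
DEVICE_OWNER_ROUTER_GW = "network:router_gateway"


def subnet_ids_to_notify(router_ports):
    """Collect subnet IDs from a router's ports, ignoring the gateway port."""
    subnet_ids = set()
    for port in router_ports:
        if port.get("device_owner") == DEVICE_OWNER_ROUTER_GW:
            # The gateway port lives in the large public subnet; skipping it
            # avoids notifying every L3 agent that has a port in that subnet.
            continue
        for fixed_ip in port.get("fixed_ips", []):
            subnet_ids.add(fixed_ip["subnet_id"])
    return subnet_ids


# Example ports (made-up IDs): one gateway port, one DVR interface port.
ports = [
    {"device_owner": "network:router_gateway",
     "fixed_ips": [{"subnet_id": "ext-subnet"}]},
    {"device_owner": "network:router_interface_distributed",
     "fixed_ips": [{"subnet_id": "tenant-subnet"}]},
]
print(subnet_ids_to_notify(ports))
```

With the gateway port filtered out, only the tenant subnet remains as a basis for choosing which agents to notify.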
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1992950
Title:
[scale] Setting a gateway on router is killing database
Status in neutron:
Fix Released
Bug description:
Context
=======
OpenStack Stein (but master seems affected by this as well).
OVS based deployment.
L3 routers in DVR and HA mode.
One large external (public) network, with subnets such as /21 or /22, used by both instances and router external gateways.
Problem description
===================
When a gateway is added to a router in HA+DVR mode, the neutron API may send a large number of RPC messages to L3 agents, depending on the size of the subnet used for the gateway.
How to reproduce
================
Add a gateway on a router:
$ openstack router set --external-gateway Ext-Net router-arnaud
On the neutron server, the logs (at DEBUG level) show:
Notify agent at l3_agent.hostxyz
This line appears once for every L3 agent that has a port in the Ext-Net
subnet (which can be a large number, around 1k).
Each of these agents then makes another RPC call (sync_routers), which lands on neutron-rpc and produces this log line:
Sync routers for ids [abc]
Behind each sync_routers call, a large SQL query is executed [1].
When 1k such queries run on every router update, the database is
overwhelmed by the volume of SQL requests.
[1] https://github.com/openstack/neutron/blob/stable/stein/neutron/db/l3_dvrscheduler_db.py#L363
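The fan-out described above can be estimated with a few lines of arithmetic. This is a rough sketch, assuming one sync_routers call per notified agent and one large SQL query per call; the function name and the queries_per_sync parameter are hypothetical, and real numbers depend on the deployment.

```python
# Rough fan-out estimate for a single router update (illustrative only).
def estimate_fanout(agents_in_gateway_subnet, queries_per_sync=1):
    """Return the estimated RPC and SQL load caused by one router update."""
    notifications = agents_in_gateway_subnet  # one "Notify agent" per agent
    sync_calls = notifications                # each agent calls sync_routers
    sql_queries = sync_calls * queries_per_sync
    return {
        "notifications": notifications,
        "sync_routers_calls": sync_calls,
        "sql_queries": sql_queries,
    }


# About 1k agents with a port in the public subnet, as in the report.
print(estimate_fanout(1000))
```

Under these assumptions, a single gateway change on one router translates into roughly as many large SQL queries as there are agents with a port in the public subnet.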