yahoo-eng-team team mailing list archive
Message #94654
[Bug 2083226] [NEW] [scale] Adding a public external network to a router is killing database
Public bug reported:
Context
=======
OpenStack Bobcat (master appears to be affected as well).
OVS-based deployment.
L3 routers in DVR and HA mode.
One large "external/public" network (with subnets such as /21 or /22) used by both instances and router external gateways.
Problem description
===================
When a port is added to a router in HA+DVR mode, the neutron API may send a large number of RPC messages to the L3 agents, depending on the size of the subnet used for the gateway.
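For a sense of scale, the subnet sizes mentioned in the context can be turned into address counts with Python's standard ipaddress module. This is only a rough sketch: the 10.0.0.0 prefixes are placeholders (only the /21 and /22 lengths come from this report), and usable_addresses is a hypothetical helper name.

```python
import ipaddress


def usable_addresses(cidr: str) -> int:
    """Return the number of assignable IPv4 addresses in a subnet.

    Subtracts the network and broadcast addresses, so it is meant for
    ordinary subnets such as /21 or /22, not /31 or /32 edge cases.
    """
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - 2, 0)


# Placeholder prefixes: only the prefix lengths are taken from the report.
for cidr in ("10.0.0.0/21", "10.0.0.0/22"):
    print(cidr, usable_addresses(cidr))
```

A /22 gives 1022 assignable addresses and a /21 gives 2046, so a single subnet of this size can plausibly hold ports on around a thousand hosts.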
How to reproduce
================
Add a port on a router:
$ openstack port create --network public pub
$ openstack router add port router-arnaud pub
On the neutron server, the logs (at DEBUG level) show:
Notify agent at l3_agent.hostxyz
This line appears for every L3 agent that has a port in the public
network/subnet (which can be a large number, around 1k).
Each of those agents then makes another RPC call (sync_routers), which ends up on neutron-rpc with this log line:
Sync routers for ids [abc]
Behind each sync_routers call, some heavy SQL queries are run (e.g. in
l3_dvrscheduler_db.py / _get_dvr_subnet_ids_on_host_query).
When around 1k such requests are made on every router update, the database
is overwhelmed by the number of SQL queries it has to serve.
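The fan-out described above can be sketched as simple arithmetic. This is a back-of-the-envelope model, not neutron code: estimate_sync_load is a hypothetical helper, and the number of SQL queries behind one sync_routers call is an assumed parameter that was not measured in this report.

```python
def estimate_sync_load(notified_agents: int, queries_per_sync: int) -> dict:
    """Estimate the load caused by a single router update.

    Assumes every notified L3 agent answers with exactly one
    sync_routers call, and that each call triggers a fixed
    (assumed) number of SQL queries.
    """
    sync_calls = notified_agents
    return {
        "notifications": notified_agents,
        "sync_routers_calls": sync_calls,
        "sql_queries": sync_calls * queries_per_sync,
    }


# 1000 agents comes from the report; 5 queries per sync is a made-up value.
print(estimate_sync_load(notified_agents=1000, queries_per_sync=5))
```

Whatever the real per-call cost is, the total grows linearly with the number of agents holding a port in the public network, and it is paid again on every router update.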
The DVR router is then configured by the L3 agent on all the compute nodes, but
it is never used (the public network is an external one and does not
rely on routers to be reachable).
We have two options:
- prevent adding a port from an external network to a router as an interface (an external network should be used only for router gateways), or
- stop flooding the creation of DVR routers in this situation
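The first option could look roughly like the check below. It is a self-contained illustration, not neutron's actual code: the Network class, its router_external field (standing in for the network's router:external attribute) and validate_router_interface are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Minimal stand-in for a network record (hypothetical)."""
    name: str
    router_external: bool  # stands in for the router:external attribute


def validate_router_interface(network: Network) -> None:
    """Reject router interfaces on external networks.

    An external network may still be used as a router gateway;
    only adding it as a regular router port is refused here.
    """
    if network.router_external:
        raise ValueError(
            f"network {network.name!r} is external and can only be "
            "used as a router gateway"
        )


validate_router_interface(Network("private", router_external=False))  # accepted
```

With such a check, the "openstack router add port" step from the reproducer would fail early, before any L3 agent is notified.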
Note, this is pretty much the same scenario as the one described in #1992950
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2083226
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2083226/+subscriptions