yahoo-eng-team team mailing list archive

[Bug 2083226] [NEW] [scale] Adding a public external network to a router is killing database

 

Public bug reported:

Context
=======
OpenStack Bobcat (master appears to be affected as well).
OVS based deployment.
L3 routers in DVR and HA mode.
One large external (public) network, with subnets such as /21 or /22, used by both instances and router external gateways.

Problem description
===================
When a port is added to a router in HA+DVR mode, the neutron API server may send a large number of RPC messages to the L3 agents, depending on the size of the subnet used for the gateway.

How to reproduce
================
Add a port to a router:

$ openstack port create --network public pub

$ openstack router add port router-arnaud pub

On the neutron server, the logs (at DEBUG level) show:

Notify agent at l3_agent.hostxyz

This line appears once for every L3 agent that has a port in the public
network/subnet (which can be a large number, e.g. 1k).

Each of those agents then makes another RPC call (sync_routers), which lands on neutron-rpc and produces this log line:
Sync routers for ids [abc]

Behind each sync_routers call, some heavy SQL queries are run (e.g. in
l3_dvrscheduler_db.py / _get_dvr_subnet_ids_on_host_query).

When 1k calls like this are made on each router update, the database
is overwhelmed by the number of SQL queries it has to run.
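
To get a feel for the numbers, here is a rough back-of-the-envelope
sketch. It is not neutron code: estimate_fanout and its
queries_per_sync parameter are hypothetical names, and the number of
SQL queries behind one sync_routers call is not measured here, so it
is left as an input. The sketch only assumes that each notified agent
replies with one sync_routers call and that the number of agents
cannot exceed the number of usable addresses in the subnet.

```python
import ipaddress


def estimate_fanout(cidr, agents_notified, queries_per_sync=1):
    """Rough upper bounds for a single router update (illustrative only).

    queries_per_sync is an assumed, unmeasured figure: the number of SQL
    queries run behind one sync_routers call.
    """
    net = ipaddress.ip_network(cidr)
    # Exclude the network and broadcast addresses.
    usable = net.num_addresses - 2
    # Every agent has at least one port in the subnet, so the number of
    # agents is capped by the number of usable addresses.
    agents = min(agents_notified, usable)
    return {
        "usable_addresses": usable,
        "sync_routers_calls": agents,
        "sql_queries": agents * queries_per_sync,
    }


if __name__ == "__main__":
    for cidr in ("10.0.0.0/22", "10.0.0.0/21"):
        print(cidr, estimate_fanout(cidr, agents_notified=1000))
```

A /22 leaves 1022 usable addresses and a /21 leaves 2046, so 1k agents
with a port in the public subnet is entirely plausible, and every
router update then costs at least that many sync_routers calls.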

The L3 agent then configures the DVR router on all the compute nodes,
but it is never used there (the public network is an external one and
does not rely on routers to be reachable).

We have two options:
 - prevent a port from an external network from being added to a router as an interface (an external network should be used only for router gateways), or
 - stop fanning out the creation of DVR routers in this situation
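
As an illustration of the first option, the check could look roughly
like the sketch below. This is not existing neutron code: the function
and exception names are hypothetical, and the network is represented
as a plain dict. The only thing taken from neutron is the
"router:external" network attribute, which marks a network as
external.

```python
class ExternalNetworkInterfaceNotAllowed(Exception):
    """Hypothetical error for an interface port on an external network."""


def validate_router_interface_network(network):
    """Reject a router interface whose port sits on an external network.

    `network` is a dict shaped like a neutron network API resource.
    Gateway ports are out of scope: this is meant to guard only the
    "router add port" / "router add subnet" path.
    """
    if network.get("router:external", False):
        raise ExternalNetworkInterfaceNotAllowed(
            "network %s is external; use it only as a router gateway"
            % network.get("id", "<unknown>")
        )


if __name__ == "__main__":
    # A tenant network is accepted silently.
    validate_router_interface_network({"id": "private", "router:external": False})
    # The public network is refused.
    try:
        validate_router_interface_network({"id": "public", "router:external": True})
    except ExternalNetworkInterfaceNotAllowed as exc:
        print("refused:", exc)
```

Where such a check would actually belong in neutron (API validation,
the L3 plugin, or a policy rule) is left open here.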


Note: this is pretty much the same scenario as the one described in bug #1992950.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2083226
