yahoo-eng-team team mailing list archive

[Bug 2009509] [NEW] Large number of FIPs and subnets causes slow sync_routers response

 

Public bug reported:

When a large number of subnets and FIPs are configured on a network, the
response time for neutron.api.rpc.handlers.l3_rpc.sync_routers
increases significantly.

Based on profiling data, a large amount of time is spent waiting on
_get_sync_floating_ips
(https://opendev.org/openstack/neutron/src/commit/0a214b0437874fd7f5379ec94fd07ef5d3ff4bbe/neutron/db/l3_db.py#L1879).

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...TRUNCATED...
     2    0.000    0.000   19.827    9.913 /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/db/l3_db.py:1873(_get_sync_floating_ips)
...TRUNCATED...

In the above example, the total execution time logged for sync_routers
was 26.645s.
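
A profile table of this shape can be collected with Python's standard
cProfile and pstats modules. The sketch below only illustrates that
approach and is not necessarily how the data above was gathered;
slow_sync and profile_call are hypothetical names, and slow_sync is a
stand-in for the code under investigation.

```python
import cProfile
import io
import pstats


def slow_sync(n_rows=1000):
    """Hypothetical stand-in for the code being investigated."""
    return [dict(id=i, address="10.0.0.%d" % (i % 256)) for i in range(n_rows)]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the most expensive call trees come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_sync)
    print(report)
```

The cumtime column includes time spent in callees, which is why a
function with a tottime near zero can still account for most of the run.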

Further investigation shows that the call to
l3_obj.FloatingIP.get_scoped_floating_ips inside _get_sync_floating_ips
spends a large amount of time mapping SQL output to ORM Python objects.
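
To show where per-row mapping cost can be measured, the sketch below
loads the same rows from an in-memory SQLite table twice: once as raw
tuples and once building one Python object per row. This is not
neutron's code and does not use SQLAlchemy or neutron's object layer;
the table, the FloatingIPRow class and all function names are
hypothetical.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class FloatingIPRow:
    """Hypothetical stand-in for an ORM-mapped floating IP object."""
    id: int
    floating_ip_address: str
    router_id: str


QUERY = ("SELECT id, floating_ip_address, router_id "
         "FROM floatingips ORDER BY id")


def make_db(n_rows):
    """Create an in-memory table holding n_rows illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE floatingips "
        "(id INTEGER PRIMARY KEY, floating_ip_address TEXT, router_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO floatingips VALUES (?, ?, ?)",
        ((i, "203.0.113.%d" % (i % 254 + 1), "router-%d" % (i % 4))
         for i in range(n_rows)),
    )
    return conn


def fetch_raw(conn):
    """Return rows as plain tuples, with no per-row object construction."""
    return conn.execute(QUERY).fetchall()


def fetch_mapped(conn):
    """Return one Python object per row, as an ORM layer would."""
    return [FloatingIPRow(*row) for row in conn.execute(QUERY)]


def timed(func, conn):
    """Return (rows, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    rows = func(conn)
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = make_db(1000)
    raw, t_raw = timed(fetch_raw, conn)
    mapped, t_mapped = timed(fetch_mapped, conn)
    print("raw tuples:     %d rows in %.6fs" % (len(raw), t_raw))
    print("mapped objects: %d rows in %.6fs" % (len(mapped), t_mapped))
```

A real ORM typically does more work per row than a dataclass
constructor, so the timings here say nothing about neutron itself; the
point is only that query time and mapping time can be measured
separately.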

Reproduction steps:
- Set up OpenStack with DVR enabled
- Create a network
- Attach a large number of subnets (the above has 27)
- Create a large number of FIPs and attach them to VMs (the above has around 1000 attached FIPs)
- Restart neutron_l3_agent on a compute node and observe slow response times for the get_routers() RPC

Version:
- OpenStack: Zed
- Kernel/distro: N/A

** Affects: neutron
     Importance: Undecided
     Assignee: Adam (adamoswick)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2009509

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2009509/+subscriptions


