yahoo-eng-team team mailing list archive

[Bug 1918145] [NEW] Slownesses on neutron API with many RBAC rules

 

Public bug reported:

* Summary: Slownesses on neutron API with many RBAC rules

* High level description: Sharing several networks or security groups with
projects drastically increases API response time on some routes (/networks
or /server/detail).

For quite some time we have been observing that response times are increasing
(slowly but surely) on /networks calls. We have increased the number of
Neutron workers, but in vain.

Lately, we have observed that it is getting worse (response times from 5 to 370 seconds). We ruled out possible bottlenecks one by one (our service endpoint performance, neutron API configuration, etc.).
However, we found that some DB calls take a long time. They seem to be stuck in the MariaDB database (10.3.10), so we captured some slow queries in MySQL.

An example for /server/detail:
------------------------------
http://paste.openstack.org/show/803334/

We can see that more than 2 million rows were examined, and
around 1657 returned.

An example for /networks:
-------------------------
http://paste.openstack.org/show/803337/
Rows_sent: 517  Rows_examined: 223519
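
To put the figures above in perspective, here is a minimal Python sketch that extracts the Rows_sent and Rows_examined fields from a slow query log line and computes how many rows were examined per row returned. The function name `examined_per_sent` is hypothetical and only used for illustration; the input line is the one quoted above.

```python
import re

# Matches the "Rows_sent: N  Rows_examined: M" fields found in a
# MySQL/MariaDB slow query log entry.
ROWS_RE = re.compile(r"Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)")


def examined_per_sent(log_line):
    """Return (rows_sent, rows_examined, ratio) for one slow-log line."""
    match = ROWS_RE.search(log_line)
    if match is None:
        raise ValueError("no Rows_sent/Rows_examined fields found")
    sent, examined = int(match.group(1)), int(match.group(2))
    # Avoid a division by zero for queries that return nothing.
    ratio = examined / sent if sent else float("inf")
    return sent, examined, ratio


sent, examined, ratio = examined_per_sent("Rows_sent: 517  Rows_examined: 223519")
print(f"{examined} examined / {sent} sent = about {ratio:.0f} rows examined per row returned")
```

For the /networks example this gives roughly 432 rows examined for each row returned.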

* Pre-conditions:
Database table sizes:
    -   networkrbacs 16928 rows
    -   securitygrouprbacs 1691 rows
    -   keystone.project 1713 rows

Control plane nodes are shared with some other services:
- RMQ
- MariaDB
- OpenStack APIs
- DHCP agents

It seems the code that generates those queries is based on
https://github.com/openstack/neutron-lib/blob/698e4c8daa7d43018a71122ec5b0cd5b17b55141/neutron_lib/db/model_query.py#L120
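
As a rough illustration of why the examined row count can grow so fast, the sketch below assumes that the query joins each resource to its RBAC entries with an outer join and then filters on the target project. This is an assumption about the linked code, not a copy of it. The tables are simplified stand-ins in an in-memory SQLite database, not the real neutron schema, and the helper names `build_db` and `count_rows` are hypothetical.

```python
import sqlite3


def build_db(num_networks, num_projects):
    """Create admin-owned networks, each shared with every project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE networks (id TEXT PRIMARY KEY, project_id TEXT);
        CREATE TABLE networkrbacs (
            id INTEGER PRIMARY KEY,
            object_id TEXT REFERENCES networks(id),
            target_tenant TEXT,
            action TEXT
        );
    """)
    for n in range(num_networks):
        net_id = f"net-{n}"
        conn.execute("INSERT INTO networks VALUES (?, 'admin')", (net_id,))
        # One RBAC entry per (network, project) pair.
        conn.executemany(
            "INSERT INTO networkrbacs (object_id, target_tenant, action) "
            "VALUES (?, ?, 'access_as_shared')",
            [(net_id, f"project-{p}") for p in range(num_projects)],
        )
    return conn


def count_rows(conn, project_id):
    """Return (rows produced by the join, networks visible to project_id)."""
    join = "FROM networks n LEFT OUTER JOIN networkrbacs r ON r.object_id = n.id"
    joined = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]
    visible = conn.execute(
        f"SELECT COUNT(DISTINCT n.id) {join} "
        "WHERE n.project_id = ? "
        "OR (r.action = 'access_as_shared' AND r.target_tenant IN (?, '*'))",
        (project_id, project_id),
    ).fetchone()[0]
    return joined, visible


conn = build_db(num_networks=10, num_projects=100)
joined, visible = count_rows(conn, "project-7")
print(f"join produces {joined} rows, project sees {visible} networks")
```

In this toy model the join produces networks x projects rows (1000 here) while the project only sees 10 networks, which is the same kind of gap as the one between Rows_examined and Rows_sent reported above.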

* Step-by-step reproduction steps:

- Create a lot of projects (at least 1000)
- Create a security group (SG) in the admin account
- Create fake networks (vlan, vxlan) with associated 
- Share the SG and all networks with all projects
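
The steps above can be sketched as a small Python helper that only builds the list of openstack CLI commands, without running them. All resource names (rbac-test-sg, rbac-test-net-N, rbac-test-project-N) and the function name `repro_commands` are hypothetical. Only vxlan networks are generated, since vlan networks need deployment-specific provider options, and the incomplete "with associated" part of the third step is left out. Whether `--type security_group` is accepted depends on the installed client version, so treat this as a starting point rather than a verified reproducer.

```python
def repro_commands(num_projects, num_networks):
    """Build openstack CLI commands that share one SG and N networks with P projects."""
    sg = "rbac-test-sg"
    networks = [f"rbac-test-net-{n}" for n in range(num_networks)]

    cmds = [f"openstack security group create {sg}"]
    for net in networks:
        cmds.append(f"openstack network create --provider-network-type vxlan {net}")

    for p in range(num_projects):
        project = f"rbac-test-project-{p}"
        cmds.append(f"openstack project create {project}")
        # Share the security group with this project.
        cmds.append(
            "openstack network rbac create --type security_group "
            f"--action access_as_shared --target-project {project} {sg}"
        )
        # Share every network with this project.
        for net in networks:
            cmds.append(
                "openstack network rbac create --type network "
                f"--action access_as_shared --target-project {project} {net}"
            )
    return cmds


if __name__ == "__main__":
    # Small numbers for a dry run; the report uses at least 1000 projects.
    for cmd in repro_commands(num_projects=2, num_networks=3):
        print(cmd)
```

The number of RBAC entries created grows as projects x (networks + 1), so the table sizes listed in the pre-conditions are reached quickly.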


* Expected output: lower response time, less than 5 seconds (approximately).

* Actual output: May lead to gateway timeout.

* Version:
  ** OpenStack version: Stein release for all components
  ** CentOS 7.4 with kolla containers
  ** kolla-ansible for the Stein release

* Environment: We operate all services in OpenStack except for Cinder.

* Perceived severity: Medium

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1918145
