
yahoo-eng-team team mailing list archive

[Bug 1918145] Re: Slownesses on neutron API with many RBAC rules

 

I think one of the first steps we can take is to remove the ORDER BY,
as it causes the temporary filesort that you mentioned in #9.

I may be missing something, but does ordering by UUID bring any value?

A second step would be to understand why the possible key on object_id
is not used.

There is also another point: we filter on action, but I don't think we
have an index on that column, so that may be worth investigating as
well.
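A minimal sketch of how the index and ORDER BY points above could be checked, using SQLite's EXPLAIN QUERY PLAN as a portable stand-in for MariaDB's EXPLAIN (on MariaDB one would run EXPLAIN on the captured query and look at `possible_keys`, `key`, and `Extra`, e.g. `Using filesort`). The table is a simplified, hypothetical version of networkrbacs, and the index name `idx_rbac_action_target` is made up for illustration; this is not neutron's schema or query.

```python
import sqlite3
import uuid

# Hypothetical, simplified stand-in for neutron's networkrbacs table.
SCHEMA = """
CREATE TABLE networkrbacs (
    id TEXT PRIMARY KEY,
    project_id TEXT,
    target_tenant TEXT,
    action TEXT,
    object_id TEXT
)
"""

QUERY = ("SELECT object_id FROM networkrbacs "
         "WHERE action = ? AND target_tenant IN ('*', ?)")


def build_db(n_networks=20, n_projects=100):
    """Share every fake network with every fake project."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    projects = [uuid.uuid4().hex for _ in range(n_projects)]
    networks = [str(uuid.uuid4()) for _ in range(n_networks)]
    rows = [
        (str(uuid.uuid4()), "admin", project, "access_as_shared", net)
        for net in networks
        for project in projects
    ]
    conn.executemany("INSERT INTO networkrbacs VALUES (?, ?, ?, ?, ?)", rows)
    return conn, projects


def query_plan(conn, sql, params):
    """Return the 'detail' column (last column) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn, projects = build_db()
params = ("access_as_shared", projects[0])

before = query_plan(conn, QUERY, params)  # no index: full table scan
conn.execute("CREATE INDEX idx_rbac_action_target "
             "ON networkrbacs (action, target_tenant)")
after = query_plan(conn, QUERY, params)   # index on the filtered columns
ordered = query_plan(conn, QUERY + " ORDER BY id", params)

print(before, after, ordered, sep="\n")
```

In SQLite an extra sort step shows up as `USE TEMP B-TREE FOR ORDER BY`, roughly the counterpart of MariaDB's `Using filesort`; comparing `after` with `ordered` shows what the ORDER BY alone costs in the plan.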


** Changed in: neutron
       Status: Fix Released => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1918145

Title:
  Slownesses on neutron API with many RBAC rules

Status in neutron:
  Confirmed

Bug description:
  * Summary: Slownesses on neutron API with many RBAC rules

  * High level description: Sharing several networks or security groups
  with projects drastically increases API response times on some routes
  (/networks or /server/detail).

  For quite some time we have been observing that response times are
  increasing (slowly but surely) on /networks calls. We increased the
  number of Neutron workers, but to no avail.

  Lately, we have observed that it is getting worse (response times went from 5 to 370 seconds). We ruled out possible bottlenecks one by one (our service endpoint performance, neutron API configuration, etc.).
  But we found that some DB queries take a lot of time. They seem to be stuck in the MariaDB database (10.3.10), so we captured the slow queries in MySQL.

  An example for /server/detail:
  ------------------------------
  http://paste.openstack.org/show/803334/

  We can see that more than 2 million rows were examined, and around
  1657 returned.

  An example for /networks:
  -------------------------
  http://paste.openstack.org/show/803337/
  Rows_sent: 517  Rows_examined: 223519
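  The gap between rows examined and rows sent in the two captures can be put as a single ratio. A small sketch using only the figures quoted above; "more than 2 million" is rounded down to 2,000,000, so the /server/detail ratio is a lower bound.

```python
# Figures copied from the slow-query captures quoted in the report.
# "more than 2 million" is rounded down, so that ratio is a lower bound.
captures = {
    "/server/detail": (2_000_000, 1657),
    "/networks": (223_519, 517),
}


def examined_per_sent(examined, sent):
    """Rows the database touched for every row it returned."""
    return examined / sent


ratios = {route: examined_per_sent(*c) for route, c in captures.items()}
for route, r in ratios.items():
    print(f"{route}: ~{r:.0f} rows examined per row sent")
```

  This prints roughly 1207 for /server/detail and 432 for /networks: in both cases the server reads hundreds of rows for each one it returns.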

  * Pre-conditions:
  Database table sizes:
      -   networkrbacs 16928 rows
      -   securitygrouprbacs 1691 rows
      -   keystone.project 1713 rows
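  Dividing the RBAC table sizes by the project count gives a rough sense of the sharing pattern. This is our own reading of the numbers, not something the report states: about 10 network RBAC rows and about 1 security group RBAC row per project, which is in line with the reproduction steps (one SG and a set of networks shared with every project).

```python
# Row counts copied from the table sizes listed in the report.
network_rbacs = 16_928
sg_rbacs = 1_691
projects = 1_713

net_per_project = network_rbacs / projects
sg_per_project = sg_rbacs / projects

print(f"network RBAC rows per project: {net_per_project:.2f}")  # 9.88
print(f"SG RBAC rows per project: {sg_per_project:.2f}")        # 0.99
```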

  Control plane nodes are shared with some other services:
  - RMQ
  - mariadb
  - OpenStack APIs
  - DHCP agents

  It seems the code behind these queries is based on
  https://github.com/openstack/neutron-lib/blob/698e4c8daa7d43018a71122ec5b0cd5b17b55141/neutron_lib/db/model_query.py#L120
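  To illustrate why rows examined can dwarf rows returned, here is a hypothetical reconstruction of the query shape: networks outer-joined to their RBAC entries, then filtered on ownership or a matching `access_as_shared` entry. This assumes such a join-then-filter shape; it is not taken from the captured queries or from the linked model_query.py code, and the tables are heavily simplified.

```python
import sqlite3
import uuid

# Hypothetical, simplified schema; real neutron tables have more columns
# and the real SQL is generated by SQLAlchemy, not written by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE networks (id TEXT PRIMARY KEY, project_id TEXT);
CREATE TABLE networkrbacs (
    id TEXT PRIMARY KEY, object_id TEXT, target_tenant TEXT, action TEXT
);
""")

N_NETWORKS, N_PROJECTS = 10, 50
projects = [f"project-{i}" for i in range(N_PROJECTS)]
for _ in range(N_NETWORKS):
    net_id = str(uuid.uuid4())
    conn.execute("INSERT INTO networks VALUES (?, 'admin')", (net_id,))
    # Share each admin-owned network with every project.
    conn.executemany(
        "INSERT INTO networkrbacs VALUES (?, ?, ?, 'access_as_shared')",
        [(str(uuid.uuid4()), net_id, p) for p in projects],
    )

# Visible if the project owns the network, or an RBAC entry shares it
# with that project (directly or via the '*' wildcard).
VISIBLE = """
SELECT networks.id FROM networks
LEFT OUTER JOIN networkrbacs ON networkrbacs.object_id = networks.id
WHERE networks.project_id = :p
   OR (networkrbacs.action = 'access_as_shared'
       AND networkrbacs.target_tenant IN ('*', :p))
ORDER BY networks.id
"""

# Intermediate rows produced by the join before the WHERE filter.
joined = conn.execute(
    "SELECT COUNT(*) FROM networks "
    "LEFT OUTER JOIN networkrbacs ON networkrbacs.object_id = networks.id"
).fetchone()[0]
visible = conn.execute(VISIBLE, {"p": projects[0]}).fetchall()
print(f"joined rows: {joined}, rows returned: {len(visible)}")
```

  With 10 networks each shared with 50 projects, the join produces 500 intermediate rows to return 10: the intermediate size grows with networks times shares, while the result grows only with networks. The join count is an illustration, not MariaDB's Rows_examined metric.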

  * Step-by-step reproduction steps:

  - Create a lot of projects (at least 1000)
  - Create an SG in the admin account
  - Create fake networks (vlan, vxlan) with associated
  - Share the SG and all networks with all projects
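  The reproduction steps can be scripted. A minimal sketch, assuming an admin openstacksdk connection (e.g. `openstack.connect(cloud="admin")`, where the cloud name is an assumption); the `rbac-repro` naming prefix and the `reproduce` helper are hypothetical. Provider network types (vlan, vxlan) and any associated resources are omitted, and sharing security groups through RBAC requires a neutron release that supports it.

```python
"""Hypothetical reproduction helper for the bug's reproduction steps.

Expects an admin openstacksdk connection object, for example:
    import openstack
    conn = openstack.connect(cloud="admin")  # cloud name is an assumption
"""


def reproduce(conn, n_projects=1000, n_networks=10, prefix="rbac-repro"):
    """Create projects, one SG and some networks, then share them all.

    Returns the number of RBAC policies created.
    """
    projects = [conn.identity.create_project(name=f"{prefix}-project-{i}")
                for i in range(n_projects)]
    sg = conn.network.create_security_group(name=f"{prefix}-sg")
    networks = [conn.network.create_network(name=f"{prefix}-net-{i}")
                for i in range(n_networks)]

    objects = [("security_group", sg.id)]
    objects += [("network", net.id) for net in networks]

    created = 0
    for project in projects:
        for object_type, object_id in objects:
            # One RBAC entry per (object, project) pair.
            conn.network.create_rbac_policy(
                object_type=object_type,
                object_id=object_id,
                target_project_id=project.id,
                action="access_as_shared",
            )
            created += 1
    return created
```

  With the defaults this creates 1000 × (10 + 1) = 11,000 RBAC policies, the same order of magnitude as the table sizes reported.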

  * Expected output: lower response times, approximately under 5
  seconds.

  * Actual output: requests may end in a gateway timeout.

  * Version:
    ** OpenStack Stein release for all components (neutron 14.2.0).
    ** CentOS 7.4 with kolla containers
    ** kolla-ansible for stein release

  * Environment: We operate all services in OpenStack except for Cinder.

  * Perceived severity: Medium

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1918145/+subscriptions


