yahoo-eng-team team mailing list archive

[Bug 1918145] Re: Slownesses on neutron API with many RBAC rules

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/884877
Committed: https://opendev.org/openstack/neutron/commit/e9da29d16c474822c015996cf34e40005419146a
Submitter: "Zuul (22348)"
Branch:    master

commit e9da29d16c474822c015996cf34e40005419146a
Author: Rodolfo Alonso Hernandez <ralonsoh@xxxxxxxxxx>
Date:   Sun May 28 17:28:03 2023 +0200

    Change RBAC relationship loading method to "joined"
    
    This patch changes the loading method of all RBAC relationships to
    "joined". This change ensures that the associated RBAC records are
    loaded along with the parent resource. The rationale is to be able to
    control the SQL query executed; the subquery cannot be directly
    managed by Neutron.
    
    It is very common to create the RBAC rules from one single project,
    usually the administrator project. That means all RBAC rules will
    belong to it. Before this change, the SQL subquery performed to
    retrieve the RBAC entries was this (from a network query):
    
      SELECT networks.id AS networks_id
      FROM networks LEFT OUTER JOIN networkrbacs ON networks.id =
      networkrbacs.object_id
      WHERE networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action = 'access_as_external'
        AND networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*'
        OR networks.project_id = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.action IN ('access_as_shared', 'access_as_readonly')
        AND (networkrbacs.target_project = 'bd133e2c499c4bf8aeb16206e31c3c20'
        OR networkrbacs.target_project = '*');
    
    This SQL result has a very low cardinality; that means there are many
    duplicated records. For example, with 10 external networks, 1000
    projects and 2500 RBAC rules, this query returns 1.4 million rows.
    If instead a "GROUP BY resource_id" (in this case network_id) clause
    is added, the number of rows is reduced to 10 (considering this
    project has an RBAC rule per network).
    
    In order to introduce this "GROUP BY" clause, this patch is changing
    the loading method. The clause is added in a neutron-lib patch [1].
    
    This change by itself does not improve the query performance. The
    neutron-lib patch is needed too. Although this patch does not modify
    the SQL query results, the tests added will prove that the neutron-lib
    patch does not introduce any regression.
    
    [1] https://review.opendev.org/c/openstack/neutron-lib/+/884878
    
    Closes-Bug: #1918145
    Change-Id: Ic6001bd5a57493b8befdf81a41eb0bd1c8022df3
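
The row multiplication described in the commit message can be reproduced on a small scale. The following is a minimal sketch, not the Neutron or neutron-lib code: it uses an in-memory SQLite database from the Python standard library, a reduced two-table schema, a simplified WHERE clause, and hypothetical helper names (build_db, network_ids). All RBAC rules are created by a project called "admin", which also owns the networks, as in the scenario above.

```python
import sqlite3


def build_db(n_networks=3, n_projects=50):
    """Create a toy schema: every network is shared with every project."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE networks (id TEXT PRIMARY KEY, project_id TEXT);
        CREATE TABLE networkrbacs (
            id INTEGER PRIMARY KEY,
            object_id TEXT REFERENCES networks(id),
            project_id TEXT,
            target_project TEXT,
            action TEXT
        );
    """)
    for n in range(n_networks):
        net_id = "net-%d" % n
        conn.execute("INSERT INTO networks VALUES (?, 'admin')", (net_id,))
        for p in range(n_projects):
            # One RBAC rule per (network, target project), owned by admin.
            conn.execute(
                "INSERT INTO networkrbacs "
                "(object_id, project_id, target_project, action) "
                "VALUES (?, 'admin', ?, 'access_as_shared')",
                (net_id, "project-%d" % p),
            )
    return conn


# Simplified version of the filter quoted in the commit message.
FROM_WHERE = """
    FROM networks LEFT OUTER JOIN networkrbacs
      ON networks.id = networkrbacs.object_id
    WHERE networks.project_id = :project
       OR networkrbacs.action = 'access_as_shared'
      AND (networkrbacs.target_project = :project
           OR networkrbacs.target_project = '*')
"""


def network_ids(conn, project, group_by=False):
    """Return the network IDs visible to `project`, one entry per SQL row."""
    sql = "SELECT networks.id" + FROM_WHERE
    if group_by:
        # Collapse the duplicated rows produced by the outer join.
        sql += " GROUP BY networks.id"
    return [row[0] for row in conn.execute(sql, {"project": project})]


if __name__ == "__main__":
    conn = build_db()
    print(len(network_ids(conn, "admin")))                 # 150
    print(len(network_ids(conn, "admin", group_by=True)))  # 3
```

Because the "admin" project owns every network, each of the 50 joined RBAC rows per network satisfies the first condition, so the ungrouped query returns 3 x 50 rows for only 3 distinct networks. The grouped query returns the same set of IDs with one row each.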


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1918145

Title:
  Slownesses on neutron API with many RBAC rules

Status in neutron:
  Fix Released

Bug description:
  * Summary: Slownesses on neutron API with many RBAC rules

  * High level description: Sharing several networks or security groups
  with projects drastically increases API response time on some routes
  (/networks or /server/detail).

  For quite some time we have been observing that response times are
  increasing (slowly but surely) on /networks calls. We have increased
  the number of Neutron workers, but in vain.

  Lately, we're observing that it's getting worse (response time from 5
  to 370 seconds). We ruled out possible bottlenecks one by one (our
  service endpoint performance, neutron API configuration, etc).
  But we have found that some calls in the DB take a lot of time. They
  seem to be stuck in the MariaDB database (10.3.10). So we have
  captured some slow queries in MySQL.

  An example for /server/detail:
  ---------------------------------
  http://paste.openstack.org/show/803334/

  We can see that more than 2 million rows were examined, and
  around 1657 returned.

  An example for /networks:
  ----------------------------
  http://paste.openstack.org/show/803337/
  Rows_sent: 517  Rows_examined: 223519

  * Pre-conditions:
  Database tables size:
  table:
      -   networkrbacs 16928 rows
      -   securitygrouprbacs 1691 rows
      -   keystone.project 1713 rows

  Control plane nodes are shared with some other services:
  - RMQ
  - mariadb
  - OpenStack APIs
  - DHCP agents

  The code generating those queries seems to be based on
  https://github.com/openstack/neutron-lib/blob/698e4c8daa7d43018a71122ec5b0cd5b17b55141/neutron_lib/db/model_query.py#L120

  * Step-by-step reproduction steps:

  - Create a lot of projects (at least 1000)
  - Create a SG in admin account
  - Create fake networks (vlan, vxlan) with associated
  - Share the SG and all networks with all projects

  * Expected output: lower response time, less than 5 seconds
  (approximately).

  * Actual output: May lead to gateway timeout.

  * Version:
    ** OpenStack version: Stein release for all components (neutron 14.2.0).
    ** CentOS 7.4 with kolla containers
    ** kolla-ansible for stein release

  * Environment: We operate all services in OpenStack except for Cinder.

  * Perceived severity: Medium

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1918145/+subscriptions


