← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1786226] [NEW] Use sqlalchemy baked query

 

Public bug reported:

I am running rally scenario test create_and_list_ports on a 3 controller
setup(each controller have 8 CPUs i.e 4 cores*2 HTs) with (function call
trace enabled on neutron server processes) a concurrency of 8 for 400
iterations.

Average time taken for create port is 7.207 seconds(when 400 ports are created) and the function call trace  for this run is at http://paste.openstack.org/show/727718/ and rally results are 
+---------------------------------------------------------------------------------------------------------------------------+
|                                                   Response Times (sec)                                                    |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_network | 2.085     | 2.491        | 3.01         | 3.29         | 7.558     | 2.611     | 100.0%  | 400   |
| neutron.create_port    | 5.69      | 6.878        | 7.755        | 9.394        | 17.0      | 7.207     | 100.0%  | 400   |
| neutron.list_ports     | 0.72      | 5.552        | 9.123        | 9.599        | 11.165    | 5.559     | 100.0%  | 400   |
| total                  | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
|  -> duration           | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
|  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+


Michael Bayer (zzzeek) has analysed this callgraph and had some
suggestions. One suggestion is to use baked query i.e
https://review.openstack.org/#/c/430973/2

This is his analysis - "But looking at the profile I see here, it is
clear that the vast majority of time is spent doing lots and lots of
small queries, and all of the mechanics involved with turning them into
SQL strings and invoking them.   SQLAlchemy has a very effective
optimization for this but it must be coded into Neutron.

Here is the total time spent for Query to convert its state into SQL:

148029/356073   15.232    0.000 4583.820    0.013 /usr/lib64/python2.7
/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

that's 4583 seconds spent in Query compilation, which if Neutron were
modified  to use baked queries, would be vastly reduced.  I demonstrated
the beginning of this work in 2017 here:
https://review.openstack.org/#/c/430973/1  , which illustrates how to
first start to create a base query method in neutron that other
functions can begin to make use of.  As more queries start using the
baked form, this 4500 seconds number will begin to drop."


I have restored his patch https://review.openstack.org/#/c/430973/2 , with this the average time taken to create port is 5.196 seconds (when 400 ports are created), and the function call trace  for this run is at http://paste.openstack.org/show/727719/ also total time spent on Query compilation (Query._compile_context) is only 1675 seconds.

83696/169062    7.308    0.000 1675.140    0.010 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)
 
Rally results for this run are

+---------------------------------------------------------------------------------------------------------------------------+
|                                                   Response Times (sec)                                                    |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
| neutron.create_network | 2.03      | 2.435        | 2.872        | 3.42         | 8.301     | 2.631     | 100.0%  | 400   |
| neutron.create_port    | 4.196     | 4.996        | 5.664        | 6.5          | 12.303    | 5.196     | 100.0%  | 400   |
| neutron.list_ports     | 0.592     | 5.432        | 8.97         | 9.475        | 10.1      | 5.373     | 100.0%  | 400   |
| total                  | 7.165     | 13.265       | 16.544       | 17.296       | 25.041    | 13.201    | 100.0%  | 400   |
|  -> duration           | 7.165     | 13.265       | 16.544       | 17.296       | 25.041    | 13.201    | 100.0%  | 400   |
|  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
+------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

In https://review.openstack.org/#/c/430973/2 we are using baked query for only get_by_id i.e
https://review.openstack.org/#/c/430973/2/neutron/db/_model_query.py@225

While creating port, plugin will call get_network() which internally
calls get_by_id.

The cumulative time taken for 4800 ncalls of get_network is 3249 seconds

4800    0.325    0.000 3249.537    0.677 /usr/lib/python2.7/site-
packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

but when get_network is using baked query, this has come down to 1075
seconds.

4800    0.321    0.000 1075.695    0.224 /usr/lib/python2.7/site-
packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

If we enhance any other neutron DB methods to use baked query, then we
can improve the neutron performance further.

** Affects: neutron
     Importance: Wishlist
     Assignee: venkata anil (anil-venkata)
         Status: Confirmed

** Attachment added: "rally_result_with_baked_query.pdf"
   https://bugs.launchpad.net/bugs/1786226/+attachment/5173292/+files/rally_result_with_baked_query.pdf

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1786226

Title:
  Use sqlalchemy baked query

Status in neutron:
  Confirmed

Bug description:
  I am running rally scenario test create_and_list_ports on a 3
  controller setup(each controller have 8 CPUs i.e 4 cores*2 HTs) with
  (function call trace enabled on neutron server processes) a
  concurrency of 8 for 400 iterations.

  Average time taken for create port is 7.207 seconds(when 400 ports are created) and the function call trace  for this run is at http://paste.openstack.org/show/727718/ and rally results are 
  +---------------------------------------------------------------------------------------------------------------------------+
  |                                                   Response Times (sec)                                                    |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | neutron.create_network | 2.085     | 2.491        | 3.01         | 3.29         | 7.558     | 2.611     | 100.0%  | 400   |
  | neutron.create_port    | 5.69      | 6.878        | 7.755        | 9.394        | 17.0      | 7.207     | 100.0%  | 400   |
  | neutron.list_ports     | 0.72      | 5.552        | 9.123        | 9.599        | 11.165    | 5.559     | 100.0%  | 400   |
  | total                  | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> duration           | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+


  Michael Bayer (zzzeek) has analysed this callgraph and had some
  suggestions. One suggestion is to use baked query i.e
  https://review.openstack.org/#/c/430973/2

  This is his analysis - "But looking at the profile I see here, it is
  clear that the vast majority of time is spent doing lots and lots of
  small queries, and all of the mechanics involved with turning them
  into SQL strings and invoking them.   SQLAlchemy has a very effective
  optimization for this but it must be coded into Neutron.

  Here is the total time spent for Query to convert its state into SQL:

  148029/356073   15.232    0.000 4583.820    0.013 /usr/lib64/python2.7
  /site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

  that's 4583 seconds spent in Query compilation, which if Neutron were
  modified  to use baked queries, would be vastly reduced.  I
  demonstrated the beginning of this work in 2017 here:
  https://review.openstack.org/#/c/430973/1  , which illustrates how to
  first start to create a base query method in neutron that other
  functions can begin to make use of.  As more queries start using the
  baked form, this 4500 seconds number will begin to drop."

  
  I have restored his patch https://review.openstack.org/#/c/430973/2 , with this the average time taken to create port is 5.196 seconds (when 400 ports are created), and the function call trace  for this run is at http://paste.openstack.org/show/727719/ also total time spent on Query compilation (Query._compile_context) is only 1675 seconds.

  83696/169062    7.308    0.000 1675.140    0.010 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)
   
  Rally results for this run are

  +---------------------------------------------------------------------------------------------------------------------------+
  |                                                   Response Times (sec)                                                    |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | neutron.create_network | 2.03      | 2.435        | 2.872        | 3.42         | 8.301     | 2.631     | 100.0%  | 400   |
  | neutron.create_port    | 4.196     | 4.996        | 5.664        | 6.5          | 12.303    | 5.196     | 100.0%  | 400   |
  | neutron.list_ports     | 0.592     | 5.432        | 8.97         | 9.475        | 10.1      | 5.373     | 100.0%  | 400   |
  | total                  | 7.165     | 13.265       | 16.544       | 17.296       | 25.041    | 13.201    | 100.0%  | 400   |
  |  -> duration           | 7.165     | 13.265       | 16.544       | 17.296       | 25.041    | 13.201    | 100.0%  | 400   |
  |  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

  In https://review.openstack.org/#/c/430973/2 we are using baked query for only get_by_id i.e
  https://review.openstack.org/#/c/430973/2/neutron/db/_model_query.py@225

  While creating port, plugin will call get_network() which internally
  calls get_by_id.

  The cumulative time taken for 4800 ncalls of get_network is 3249
  seconds

  4800    0.325    0.000 3249.537    0.677 /usr/lib/python2.7/site-
  packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

  but when get_network is using baked query, this has come down to 1075
  seconds.

  4800    0.321    0.000 1075.695    0.224 /usr/lib/python2.7/site-
  packages/neutron/plugins/ml2/plugin.py:928(Ml2Plugin.get_network)

  If we enhance any other neutron DB methods to use baked query, then we
  can improve the neutron performance further.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1786226/+subscriptions


Follow ups