yahoo-eng-team team mailing list archive

[Bug 1750890] [NEW] Neutron db performance at scale

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Leon Zachery <lzachery@xxxxxxxxx>
Date: Wed, 21 Feb 2018 19:18:34 -0000
Reply-to: Bug 1750890 <1750890@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

OpenStack Neutron (like the rest of OpenStack) relies on SQLAlchemy and
its ORM for database support. From our observations, Neutron does not
use the ORM models directly; instead, it adds an additional model layer
above SQLAlchemy and builds these models by hand from a number of
underlying DB models. At large scale, the resulting increase in the
number of queries caused us significant performance problems. <Scale
numbers to be added here in the future.>

For ports, the problem starts here:
https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_common.py#L202-L219
The base dict is built from a single DB query row, and then processing
all extensions (the default behaviour) issues a sequential series of
additional queries per row to augment the dict. In our opinion this is
a performance problem: it is the classic N+1 query anti-pattern and
fundamentally does not scale (an alternative would be a single “joined”
query across the active extensions). The following code illustrates the
kind of workaround this approach leads to:
https://github.com/openstack/neutron/blob/master/neutron/db/_utils.py#L95-L107
Instead of using native SQL to select only the requested fields, the
whole result set has to be iterated in Python to filter out fields,
which again is an anti-pattern when processing DB objects.
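To make the difference concrete, here is a minimal sketch using Python's
stdlib sqlite3 rather than SQLAlchemy. The schema (a "ports" table plus
one "port_ext" extension table), the CountingConn wrapper and all
function names are hypothetical and do not reflect Neutron's real code.
It contrasts (a) one query for base rows plus one extra query per row,
followed by field filtering in Python, with (b) a single LEFT JOIN that
also projects only the requested columns in SQL.

```python
import sqlite3


def make_db(n_ports):
    """Build a throwaway in-memory DB with a hypothetical ports schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ports (id INTEGER PRIMARY KEY, name TEXT, mac TEXT);
        CREATE TABLE port_ext (port_id INTEGER PRIMARY KEY REFERENCES ports(id),
                               security_group TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO ports VALUES (?, ?, ?)",
        [(i, f"port-{i}", f"fa:16:3e:00:00:{i:02x}") for i in range(n_ports)],
    )
    conn.executemany(
        "INSERT INTO port_ext VALUES (?, ?)",
        [(i, f"sg-{i % 3}") for i in range(n_ports)],
    )
    return conn


class CountingConn:
    """Wraps a connection and counts every SQL statement issued through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def list_ports_n_plus_1(db):
    """N+1 pattern: one query for the base rows, then one extra query per row."""
    ports = []
    base_rows = db.execute("SELECT id, name, mac FROM ports ORDER BY id").fetchall()
    for pid, name, mac in base_rows:
        port = {"id": pid, "name": name, "mac": mac}
        # The per-row "extension" lookup: this is the N in N+1.
        row = db.execute(
            "SELECT security_group FROM port_ext WHERE port_id = ?", (pid,)
        ).fetchone()
        port["security_group"] = row[0] if row else None
        ports.append(port)
    return ports


def filter_fields(ports, fields):
    """Post-hoc field filtering in Python: every column was already fetched."""
    return [{k: v for k, v in p.items() if k in fields} for p in ports]


# Whitelist mapping API field names to SQL columns, so that caller input
# is never interpolated into the SQL string.
COLUMNS = {
    "id": "p.id",
    "name": "p.name",
    "mac": "p.mac",
    "security_group": "e.security_group",
}


def list_ports_joined(db, fields):
    """One LEFT JOIN query that also projects only the requested columns."""
    sql = (
        "SELECT " + ", ".join(COLUMNS[f] for f in fields)
        + " FROM ports p LEFT JOIN port_ext e ON e.port_id = p.id ORDER BY p.id"
    )
    return [dict(zip(fields, row)) for row in db.execute(sql).fetchall()]


if __name__ == "__main__":
    db = CountingConn(make_db(100))
    slow = filter_fields(list_ports_n_plus_1(db), {"id", "security_group"})
    print("N+1 queries:", db.queries)  # 101
    db.queries = 0
    fast = list_ports_joined(db, ["id", "security_group"])
    print("joined queries:", db.queries)  # 1
    assert slow == fast
```

Both paths return the same data, but the query count of the first grows
linearly with the number of ports (and would multiply with each extra
extension), while the second stays at one. In SQLAlchemy's ORM the
equivalent effect is usually obtained with eager-loading options such as
joinedload() or selectinload() and column projection via load_only().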

With respect to LBaaS support, we removed the intermediate model layer
with this (and a couple of earlier) commit(s):
https://github.com/sapcc/neutron-lbaas/commit/f71867fbf6c8a27df43aaff6046948dce60f3081
This is only an interim change, but after implementing it, LBaaS API
requests that previously took 1-5 minutes or more, and grew slower as
the number of objects increased, consistently completed in under a
second.

Version:
This likely affects all versions, but our testing has been done on Mitaka and later.

** Affects: neutron
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1750890

Title:
Neutron db performance at scale

Status in neutron:
New

Follow ups

[Bug 1750890] Re: Neutron db performance at scale
From: Ihar Hrachyshka, 2018-03-15