yahoo-eng-team team mailing list archive
Message #74185
[Bug 1786055] [NEW] performance degradation in placement with large number of resource providers
Public bug reported:
Using today's master, there is a big performance degradation in GET
/allocation_candidates when there is a large number of resource
providers (in my tests 1000, each with the same inventory as described
in [1]). The request takes around 17s when querying all three resource
classes with
http://127.0.0.1:8081/allocation_candidates?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10
Using a limit does not make any difference; the cost is in generating
the original data.
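Something like the following stdlib-only snippet is enough to reproduce the
measurement against a placement loaded with the providers from [1] (the
x-auth-token value and the microversion header are assumptions about a
local noauth/devstack-style setup, not part of the report):
-=-=-
import time
import urllib.request

URL = ('http://127.0.0.1:8081/allocation_candidates'
       '?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10')
# Placeholder headers for a local test deployment.
req = urllib.request.Request(URL, headers={
    'x-auth-token': 'admin',
    'openstack-api-version': 'placement latest',
    'accept': 'application/json',
})
start = time.time()
with urllib.request.urlopen(req) as resp:
    body = resp.read()
print('status=%s bytes=%d elapsed=%.1fs'
      % (resp.status, len(body), time.time() - start))
-=-=-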
I did some advanced LOG.debug-based benchmarking to determine three
places where things are a problem, and maybe even fixed the worst one.
See the diff below. The two main culprits are
ResourceProvider.get_by_uuid calls made while looping over the full set
of providers. These can be replaced either by reusing data we already
have from earlier queries, or by changing the code so we make single
queries instead of one per provider.
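To make the "single queries" option concrete: the idea is to fetch every
provider we will need in one IN() query up front and then look results up
from a dict, instead of one get_by_uuid round trip per provider. A minimal,
self-contained sketch of that pattern (plain sqlite3 with an illustrative
schema, not the real placement models or API):
-=-=-
import sqlite3

def fetch_providers_by_uuids(conn, uuids):
    """Return {uuid: (id, uuid, name)} with a single round trip."""
    uuids = list(uuids)
    placeholders = ','.join('?' for _ in uuids)
    rows = conn.execute(
        'SELECT id, uuid, name FROM resource_providers '
        'WHERE uuid IN (%s)' % placeholders, uuids).fetchall()
    return {row[1]: row for row in rows}

# Build a toy table with 1000 providers, then resolve a handful of uuids in
# one query and reuse the map wherever get_by_uuid would have been called.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE resource_providers (id INTEGER, uuid TEXT, name TEXT)')
conn.executemany('INSERT INTO resource_providers VALUES (?, ?, ?)',
                 [(i, 'uuid-%d' % i, 'rp-%d' % i) for i in range(1000)])
providers = fetch_providers_by_uuids(conn, {'uuid-1', 'uuid-500', 'uuid-999'})
print(providers['uuid-500'])
-=-=-
Whether it ends up as raw SQL or a batched object load, the point is the
same: one query for N providers rather than N queries.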
In the diff I've already changed one of them (the second chunk) to use
the data that _build_provider_summaries is already getting (functional
tests still pass with this change).
The third chunk is slow simply because we have a big loop, but I
suspect there is some duplication that can be avoided. I have not
investigated that closely (yet).
-=-=-
diff --git a/nova/api/openstack/placement/objects/resource_provider.py b/nova/api/openstack/placement/objects/resource_provider.py
index 851f9719e4..e6c894b8fe 100644
--- a/nova/api/openstack/placement/objects/resource_provider.py
+++ b/nova/api/openstack/placement/objects/resource_provider.py
@@ -3233,6 +3233,8 @@ def _build_provider_summaries(context, usages, prov_traits):
         if not summary:
             summary = ProviderSummary(
                 context,
+                # This is _expensive_ when there are a large number of rps.
+                # Building the objects differently may be better.
                 resource_provider=ResourceProvider.get_by_uuid(context,
                                                                uuid=rp_uuid),
                 resources=[],
@@ -3519,8 +3521,7 @@ def _alloc_candidates_multiple_providers(ctx, requested_resources,
         rp_uuid = rp_summary.resource_provider.uuid
         tree_dict[root_id][rc_id].append(
             AllocationRequestResource(
-                ctx, resource_provider=ResourceProvider.get_by_uuid(ctx,
-                                                                    rp_uuid),
+                ctx, resource_provider=rp_summary.resource_provider,
                 resource_class=_RC_CACHE.string_from_id(rc_id),
                 amount=requested_resources[rc_id]))
@@ -3535,6 +3536,8 @@ def _alloc_candidates_multiple_providers(ctx, requested_resources,
     alloc_prov_ids = []
 
     # Let's look into each tree
+    # With many resource providers this takes a long time, but each trip
+    # through the loop is not too bad.
     for root_id, alloc_dict in tree_dict.items():
         # Get request_groups, which is a list of lists of
         # AllocationRequestResource(ARR) per requested resource class(rc).
-=-=-
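As an aside, the kind of LOG.debug-based benchmarking referred to above can
be as simple as wrapping the suspect sections in a timer and reading the
debug log; a generic sketch of that sort of instrumentation (the logger
name and labels here are arbitrary, not the exact instrumentation used):
-=-=-
import contextlib
import logging
import time

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)

@contextlib.contextmanager
def timed(label):
    # Log how long the wrapped block took; coarse, but enough to find
    # which sections dominate the request time.
    start = time.time()
    yield
    LOG.debug('%s took %.3fs', label, time.time() - start)

# Usage: wrap a suspect block (e.g. building the ProviderSummary objects or
# the per-tree loop) and compare the timings in the debug output.
with timed('stand-in for the expensive section'):
    time.sleep(0.01)
-=-=-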
[1] https://github.com/cdent/placeload/blob/master/placeload/__init__.py#L23
** Affects: nova
Importance: High
Status: Confirmed
** Tags: placement
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1786055
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1786055/+subscriptions