yahoo-eng-team team mailing list archive

[Bug 1787977] [NEW] Inefficient multi-cell instance list

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Mon, 20 Aug 2018 15:39:53 -0000
Reply-to: Bug 1787977 <1787977@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

This is based on some performance and scale testing done by Huawei,
reported in this dev ML thread:

http://lists.openstack.org/pipermail/openstack-dev/2018-August/133363.html

In that scenario, they have 10 cells with 10000 instances in each cell.
They then run through a few GET /servers/detail scenarios with multiple
cells and varying limits.

The thread discussion pointed out that they were wasting time pulling
1000 records (the default [api]/max_limit) from each of the 10 cells,
10000 records in total, and then throwing away 9000 of those results.
The DB query time per cell was small, but the SQLAlchemy/ORM/Python
processing of the results was chewing up the time.
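
To make the arithmetic concrete, here is a minimal sketch of the
unbatched behavior described above. It is not nova's code: fake_cell,
naive_list and the integer sort keys are hypothetical stand-ins for the
per-cell databases and the merge-sort step, used only to count records.

```python
import heapq
from itertools import islice

NUM_CELLS = 10
INSTANCES_PER_CELL = 10000
LIMIT = 1000  # the default [api]/max_limit mentioned above


def fake_cell(cell_id):
    # Hypothetical stand-in for one cell database: a list of
    # (sort_key, cell_id) records, already sorted by sort_key.
    return [(i * NUM_CELLS + cell_id, cell_id)
            for i in range(INSTANCES_PER_CELL)]


def naive_list(cells, limit):
    # Pull `limit` records from every cell, merge the sorted results,
    # and keep only the first `limit` of them.
    per_cell = [cell[:limit] for cell in cells]
    fetched = sum(len(rows) for rows in per_cell)
    page = list(islice(heapq.merge(*per_cell), limit))
    return page, fetched


cells = [fake_cell(c) for c in range(NUM_CELLS)]
page, fetched = naive_list(cells, LIMIT)
discarded = fetched - len(page)
print(fetched, len(page), discarded)  # prints: 10000 1000 9000
```

Every one of the 10000 fetched records still has to be turned into an
object before 9000 of them are dropped, which is where the time goes.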

Dan Smith has a series of changes here:

https://review.openstack.org/#/q/topic:batched-inst-list+(status:open+OR+status:merged)

These changes allow us to batch the DB queries per cell. When the limit
is distributed across the 10 cells, e.g. 1000 / 10 = 100 batch size per
cell, this cuts the time spent about in half (around 11 sec to around
6 sec).
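
The batching idea can be sketched as follows. This is an illustration
under stated assumptions, not the code in the review series: FakeCell,
batched_rows and batched_list are hypothetical names, and a plain offset
stands in for whatever pagination the real queries use. Each cell is
read lazily in batches of limit / number-of-cells, and the merge stops
as soon as the limit is reached.

```python
import heapq
from itertools import islice

NUM_CELLS = 10
INSTANCES_PER_CELL = 10000
LIMIT = 1000


class FakeCell:
    """Hypothetical stand-in for one cell database."""

    def __init__(self, cell_id):
        # Sorted keys, interleaved across cells.
        self.rows = [i * NUM_CELLS + cell_id
                     for i in range(INSTANCES_PER_CELL)]
        self.fetched = 0  # how many records this cell has returned

    def query(self, offset, limit):
        batch = self.rows[offset:offset + limit]
        self.fetched += len(batch)
        return batch


def batched_rows(cell, batch_size):
    # Yield records from one cell, querying a new batch only when the
    # previous one has been fully consumed.
    offset = 0
    while True:
        batch = cell.query(offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += len(batch)


def batched_list(cells, limit):
    # Distribute the limit across the cells, e.g. 1000 / 10 = 100.
    batch_size = max(limit // len(cells), 1)
    streams = [batched_rows(cell, batch_size) for cell in cells]
    # heapq.merge consumes the streams lazily; islice stops at `limit`.
    return list(islice(heapq.merge(*streams), limit))


cells = [FakeCell(c) for c in range(NUM_CELLS)]
page = batched_list(cells, LIMIT)
total_fetched = sum(cell.fetched for cell in cells)
print(len(page), total_fetched)
```

In this toy setup the total number of records fetched stays well below
the 10000 of the unbatched approach, at the cost of possibly issuing
more than one query per cell.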

This is clearly a performance issue for which we have a fix, and we
arguably should backport the fix.

Note this is less of an issue for deployments that leverage the
[api]/instance_list_per_project_cells option (like CERN):

https://docs.openstack.org/nova/latest/configuration/config.html#api.instance_list_per_project_cells

** Affects: nova
     Importance: Medium
     Assignee: Dan Smith (danms)
         Status: Triaged

** Affects: nova/queens
     Importance: Undecided
         Status: New

** Affects: nova/rocky
     Importance: Undecided
         Status: New


** Tags: api cells performance

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1787977

Title:
  Inefficient multi-cell instance list

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1787977/+subscriptions

Follow ups

[Bug 1787977] Re: Inefficient multi-cell instance list
From: OpenStack Infra, 2018-08-28