
yahoo-eng-team team mailing list archive

[Bug 1787977] Re: Inefficient multi-cell instance list

 

Reviewed:  https://review.openstack.org/593131
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c3a77f80b1863e114109af9c32ea01b205c1a735
Submitter: Zuul
Branch:    master

commit c3a77f80b1863e114109af9c32ea01b205c1a735
Author: Dan Smith <dansmith@xxxxxxxxxx>
Date:   Fri Aug 17 07:56:05 2018 -0700

    Make instance_list perform per-cell batching
    
    This makes the instance_list module support batching across cells
    with a couple of different strategies, and with room to add more
    in the future.
    
    Before this change, an instance list with limit 1000 to a
    deployment with 10 cells would generate a query to each cell
    database with the same limit. Thus, that API request could end
    up processing up to 10,000 instance records despite only
    returning 1000 to the user (because of the limit).
    
    This uses the batch functionality in the base code added in
    Iaa4759822e70b39bd735104d03d4deec988d35a1
    by providing a couple of strategies by which the batch size
    per cell can be determined. These should provide a lot of gain
    in the short term, and we can extend them with other strategies
    as we identify some with additional benefits.
    
    Closes-Bug: #1787977
    Change-Id: Ie3a5f5dc49f8d9a4b96f1e97f8a6ea0b5738b768


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1787977

Title:
  Inefficient multi-cell instance list

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New

Bug description:
  This is based on some performance and scale testing done by Huawei,
  reported in this dev ML thread:

  http://lists.openstack.org/pipermail/openstack-dev/2018-August/133363.html

  In that scenario, they have 10 cells with 10000 instances in each
  cell. They then run through a few GET /servers/detail scenarios with
  multiple cells and varying limits.

  The thread discussion pointed out that they were wasting time pulling
  1000 records (the default [api]/max_limit) from each of the 10 cells
  and then throwing away 9000 of those results. The DB query time per
  cell was small, but the SQLAlchemy/ORM/Python processing of the extra
  rows was chewing up the time.

  Dan Smith has a series of changes here:

  https://review.openstack.org/#/q/topic:batched-inst-list+(status:open+OR+status:merged)

  These changes allow us to batch the DB queries per cell. When the
  limit is distributed across the 10 cells, e.g. 1000 / 10 = 100 batch
  size per cell, this cuts the time spent roughly in half (from around
  11 sec to around 6 sec).
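
  To illustrate why splitting the limit across cells helps, here is a
  minimal sketch. It is not nova's actual implementation: the function
  names (distributed_batch_size, iter_cell, list_instances) and the
  in-memory sorted lists standing in for cell databases are assumptions
  made for illustration only. It merges per-cell result streams and
  counts how many rows each approach loads.

```python
import heapq
import itertools
import math


def distributed_batch_size(limit, num_cells):
    """Split the overall limit evenly across cells, rounding up."""
    return max(1, math.ceil(limit / num_cells))


def iter_cell(records, batch_size, stats):
    """Yield records from one simulated cell, one batch at a time.

    `records` is a list already sorted by the sort key. Each slice stands
    in for one paginated DB query; `stats` counts the rows actually loaded.
    """
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        stats["fetched"] += len(batch)
        yield from batch


def list_instances(cells, limit, batch_size):
    """Merge the sorted per-cell streams and return the first `limit` rows."""
    stats = {"fetched": 0}
    streams = [iter_cell(cell, batch_size, stats) for cell in cells]
    merged = heapq.merge(*streams)
    result = list(itertools.islice(merged, limit))
    return result, stats["fetched"]


# Hypothetical data: 10 cells with 10000 records each, interleaved so that
# every cell contributes equally to the head of the sorted result.
cells = [list(range(i, 100000, 10)) for i in range(10)]
limit = 1000

# Old behaviour: every cell is queried with the full limit.
unbatched_result, unbatched_fetched = list_instances(cells, limit, limit)

# Batched behaviour: each cell is queried 1000 / 10 = 100 rows at a time.
batch = distributed_batch_size(limit, len(cells))
batched_result, batched_fetched = list_instances(cells, limit, batch)

print(unbatched_fetched, batched_fetched)
```

  Both calls return the same 1000 records, but the unbatched call loads
  all 10 x 1000 rows up front, while the batched call only fetches a
  further batch from a cell when the merge actually needs more rows from
  it. With less evenly distributed data, a cell that dominates the head
  of the sort order would simply be asked for more batches.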

  This is clearly a performance issue for which we have a fix, and we
  arguably should backport the fix.

  Note this is less of an issue for deployments that leverage the
  [api]/instance_list_per_project_cells option (like CERN):

  https://docs.openstack.org/nova/latest/configuration/config.html#api.instance_list_per_project_cells

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1787977/+subscriptions

