
yahoo-eng-team team mailing list archive

[Bug 1820283] [NEW] Scheduler Evolution in nova - the doc needs updating


Public bug reported:

There are several things in this doc that could be more detailed and/or
updated since they were written.  Some notes:

- The "Cross Project Affinity" section should probably talk about the
[cinder]cross_az_attach config option (and limitations with it) and
things like neutron routed networks. It should also mention that long-
term affinity should be modeled using placement.
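As a concrete anchor for that section, the cross-AZ attach behavior is controlled by an existing nova config option; a minimal nova.conf fragment might look like this (the value shown is illustrative, the default is True):

```ini
[cinder]
# When False, nova refuses to attach a volume whose availability zone
# does not match the instance's AZ - this is the main limitation the
# doc section should call out.
cross_az_attach = False
```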

- The "Accessing Aggregates in Filters and Weights" section could maybe
be removed since the host aggregate information is cached in the
scheduler, which is mentioned later in "Updating the Scheduler about
other data".

- The "Project Scale issues" section could probably be removed or re-
written to mention that the scheduler should depend more and more on
placement, which is its own project now.

- The "Fixing the Scheduler DB model" section could also probably be
removed; the compute_node/services table relationship is mostly
decoupled now (though some links remain in the data model), and the
section again talks about filters hitting the DB for aggregates, which
are actually cached (or should be).

- "Versioning Scheduler Placement Interfaces" could probably be removed
since we are no longer passing unversioned dicts over RPC to the
scheduler, we use RequestSpec and Selection objects now.

- The "Sending host and node stats to the scheduler" section is wrong
in two ways. First, with the cells v2 split MQ architecture we do not
want the computes broadcasting to the scheduler; the computes send
updates to placement, which the scheduler consumes, and that is more
scalable than fanout over RPC. Second, we are no longer working on
splitting the scheduler out so that it does not need its own database
(i.e. Gantt is dead).

- The "Updating the Scheduler about other data" section talks about
sending information to cinder and neutron, but that is not really what
will happen. We use placement as a centralized location to store
information from different services, which the scheduler then consumes;
a concrete example is the minimum guaranteed bandwidth-based scheduling
work in Stein.

- The "Resource Tracker" section talks about the "recent" work for NUMA,
which is no longer recent. This section already references the initial
resource providers blueprint, so it should probably just be re-written
to mention placement as the place to model complex resource inventory
relations like NUMA.

- The "Parallelism and Concurrency" section really should talk about
using placement for atomic resource allocation claims during
scheduling and eventually removing the resource claim process in the
ResourceTracker/compute service code, though that still needs some
things modeled in placement, like NUMA and PCI devices. A two-phase
commit is not going to happen in nova as the doc suggests. The mention
of cells v2 scaling is also out of date: CERN is running cells v2 with
14K+ nodes, although known issues with scheduler performance across
multiple cells have forced CERN to severely limit their placement
allocation candidate results. So for the cells v2 material, we should
focus on identifying the known issues and the efforts involved in
solving them (again mostly related to placement).
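For reference on the candidate-limiting workaround mentioned above, this is a sketch of the nova.conf tuning involved, assuming the [scheduler]max_placement_results option is the knob being used (the value shown is illustrative, not a recommendation; the default is 1000):

```ini
[scheduler]
# Cap the number of allocation candidates placement returns to the
# scheduler per request. Large multi-cell deployments set this low to
# work around scheduler performance issues, at the cost of possibly
# missing otherwise viable hosts in other cells.
max_placement_results = 100
```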

-----------------------------------
Release: 18.1.0.dev1612 on 2017-06-28 12:23:19
SHA: 5ca858eaa72acd0513e27a4c9518980b769f5d6e
Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/reference/scheduler-evolution.rst
URL: https://docs.openstack.org/nova/latest/reference/scheduler-evolution.html

** Affects: nova
     Importance: Medium
         Status: Confirmed


** Tags: doc scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1820283

Title:
  Scheduler Evolution in nova - the doc needs updating

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  There are several things in this doc that could be more detailed
  and/or updated since they were written.  Some notes:

  - The "Cross Project Affinity" section should probably talk about the
  [cinder]cross_az_attach config option (and limitations with it) and
  things like neutron routed networks. It should also mention that long-
  term affinity should be modeled using placement.

  - The "Accessing Aggregates in Filters and Weights" section could
  maybe be removed since the host aggregate information is cached in the
  scheduler, which is mentioned later in "Updating the Scheduler about
  other data".

  - The "Project Scale issues" section could probably be removed or re-
  written to mention that the scheduler should depend more and more on
  placement which is it's own project now.

  - The "Fixing the Scheduler DB model" section is also probably
  something that could be removed; I think the compute_node/services
  table stuff is mostly decoupled although there are links yet in the
  data model, and again this section is talking about filters hitting
  the DB for aggregates which are actually cached (or should be).

  - "Versioning Scheduler Placement Interfaces" could probably be
  removed since we are no longer passing unversioned dicts over RPC to
  the scheduler, we use RequestSpec and Selection objects now.

  - The "Sending host and node stats to the scheduler" section is a bit
  wrong in two ways: with a cells v2 split MQ architecture, we don't
  actually want the computes broadcasting to the scheduler, the computes
  are sending updates to placement which the scheduler uses (and is more
  scalable than the scheduler using RPC), the other thing is we aren't
  working on splitting the scheduler out so it doesn't need its own
  database (i.e. Gantt is dead).

  - The "Updating the Scheduler about other data" section talks about
  sending information to cinder and neutron, but that's not really what
  will happen, we use placement as a centralized location to store
  information from different services and then the scheduler uses that,
  a concrete example being the minimum guaranteed bandwidth-based
  scheduling work in Stein.

  - The "Resource Tracker" section talks about the "recent" work for
  NUMA which is not recent anymore. This section already references the
  initial resource providers blueprint so it should probably just be re-
  written to mention placement as the place to model complex resource
  inventory relations like NUMA.

  - The "Parallelism and Concurrency" section really should be talking
  about using placement for atomic resource allocation claims during
  scheduling and eventually removing the resource claim process in the
  ResourceTracker/compute service code, but that still needs some things
  modeled in placement like NUMA and PCI devices. A two-phase commit
  isn't going to happen in nova as the doc mentions. And the mention
  about cells v2 scaling is also out of date - CERN is running cells v2
  with 14K+ nodes (although there are known issues with scheduler
  performance with multiple cells that have caused CERN to have to
  severely limit their placement allocation candidate results). So for
  the cells v2 stuff, we should focus on identifying the known issues
  and efforts involved in solving them (mostly again related to
  placement).

  -----------------------------------
  Release: 18.1.0.dev1612 on 2017-06-28 12:23:19
  SHA: 5ca858eaa72acd0513e27a4c9518980b769f5d6e
  Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/reference/scheduler-evolution.rst
  URL: https://docs.openstack.org/nova/latest/reference/scheduler-evolution.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1820283/+subscriptions