yahoo-eng-team team mailing list archive

[Bug 1711184] [NEW] scheduler selects the same ironic node several times

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Pavlo Shchelokovskyy <1711184@xxxxxxxxxxxxxxxxxx>
Date: Wed, 16 Aug 2017 17:41:09 -0000
Reply-to: Bug 1711184 <1711184@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Public bug reported:

Observed on the ironic multinode grenade job (using the Ocata scheduler).

Ironic always returns its nodes in the same relative order (by internal
DB id). Quite often (and in DevStack always, by default) ironic nodes
are identical, so the filter scheduler gives them the same weight. As a
result, during concurrent scheduling requests the weighed_hosts list is
always in the same order and is always consumed from the start.

This leads to the first node being selected often enough to exceed the
default number of retries: when it is stolen by another concurrent
request (which also always picks the first one from the list), the
losing request has to be rescheduled.

See log examples from the same gate job [0-2] and the resulting failure
[3] (ServerActionsTestJSON test failure). Notice how the weighed hosts
list is always in the same order, and the scheduler retries 3 times on
nodes that are already occupied by another parallel request, always
picking whichever node is currently first.

This could be fixed by increasing the host_subset_size config option
from its default value of 1, which would bring some randomness to the
choice of the first element. While fine (and actually recommended) for
the baremetal-only case, this choice is a bit suboptimal in a
mixed-hypervisor deployment (virtual+ironic computes), as it makes the
scheduling logic for virtual computes less ideal.
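
To illustrate the effect of host_subset_size, here is a minimal sketch of
the selection step. It is not the nova code: pick_host, the (name, weight)
tuples and the host names are hypothetical, and it only assumes that the
scheduler chooses randomly among the best host_subset_size candidates.

```python
import random


def pick_host(weighed_hosts, host_subset_size=1, rng=random):
    """Pick one host at random from the best `host_subset_size` candidates.

    `weighed_hosts` is a list of (name, weight) pairs sorted best-first.
    This is a hypothetical helper, not a nova function.
    """
    if not weighed_hosts:
        raise ValueError("no candidate hosts")
    subset = weighed_hosts[:max(1, host_subset_size)]
    return rng.choice(subset)


# Three identical ironic nodes, always returned in the same order.
nodes = [("node-0", 1.0), ("node-1", 1.0), ("node-2", 1.0)]

# With a subset size of 1 every request gets the first node.
print({pick_host(nodes, 1)[0] for _ in range(100)})

# With a larger subset the requests spread over the first three nodes.
print({pick_host(nodes, 3)[0] for _ in range(100)})

# Downside in a mixed deployment: hosts with different weights are
# treated as equally good, so the second-best host may be chosen.
mixed = [("vm-a", 3.0), ("vm-b", 2.0), ("vm-c", 1.0)]
print(pick_host(mixed, 2)[0])
```

The last call shows why a larger subset is less ideal for virtual
computes: vm-b can be returned even though vm-a has a strictly higher
weight.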

Instead, it might be better to always randomize the order of the leading
hosts in the weighed_hosts list that share an identical (and maximal)
weight, as those should be equally good candidates to schedule to. This
would decrease the chances of collisions and rescheduling, definitely
for ironic nodes, but also to some small extent for standard compute
hosts.
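
A rough sketch of that idea, under the assumption that the weighed hosts
are available as (name, weight) pairs sorted best-first. The function
name shuffle_top_ties and the sample data are hypothetical and only
illustrate the proposal; this is not an actual patch.

```python
import random


def shuffle_top_ties(weighed_hosts, rng=random):
    """Return a copy with the leading run of maximal-weight hosts shuffled.

    Hosts whose weight is lower than the maximum keep their positions.
    """
    if not weighed_hosts:
        return []
    best = weighed_hosts[0][1]
    n_tied = 0
    for _, weight in weighed_hosts:
        # Identical hosts get exactly the same weight, so a plain
        # equality check is enough for this sketch.
        if weight != best:
            break
        n_tied += 1
    head = weighed_hosts[:n_tied]
    rng.shuffle(head)
    return head + weighed_hosts[n_tied:]


# Three identical ironic nodes followed by two lower-weighted hosts.
hosts = [("node-0", 2.0), ("node-1", 2.0), ("node-2", 2.0),
         ("vm-a", 1.5), ("vm-b", 1.0)]

# Only the three tied nodes change places; vm-a and vm-b stay put.
print(shuffle_top_ties(hosts))
```

Unlike a larger host_subset_size, this never prefers a host with a lower
weight over one with a higher weight, so the ordering for virtual
computes with distinct weights is unaffected.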


[0] http://logs.openstack.org/12/493812/2/check/gate-grenade-dsvm-ironic-multinode-multitenant-ubuntu-xenial/8d3f840/logs/old/screen-n-sch.txt.gz#_2017-08-15_13_27_29_410

[1] http://logs.openstack.org/12/493812/2/check/gate-grenade-dsvm-ironic-multinode-multitenant-ubuntu-xenial/8d3f840/logs/old/screen-n-sch.txt.gz#_2017-08-15_13_27_31_839

[2] http://logs.openstack.org/12/493812/2/check/gate-grenade-dsvm-ironic-multinode-multitenant-ubuntu-xenial/8d3f840/logs/old/screen-n-sch.txt.gz#_2017-08-15_13_27_34_244

[3] http://logs.openstack.org/12/493812/2/check/gate-grenade-dsvm-ironic-multinode-multitenant-ubuntu-xenial/8d3f840/logs/testr_results.html.gz

** Affects: nova
     Importance: Undecided
     Assignee: Pavlo Shchelokovskyy (pshchelo)
         Status: In Progress


** Tags: ironic scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1711184

Follow ups

[Bug 1711184] Re: scheduler selects the same ironic node several times
From: OpenStack Infra, 2017-11-22