yahoo-eng-team team mailing list archive

[Bug 1670319] [NEW] Reschedule of failed instance doesn't happen when scheduler places two instances on the same ironic node

 

Public bug reported:

There is a known bug, https://bugs.launchpad.net/tripleo/+bug/1341420, caused by the design of nova's scheduling and resource claiming. In short, the scheduler may place different instances on the same ironic node, and the second instance will always fail because the resource claim is made on the nova-compute side.
The second instance should be rescheduled once it has failed, but that does not happen.
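
To make the failure mode concrete, here is a toy model of the race. It is not nova code: Node, BuildRequest, schedule() and build() are hypothetical names, and the retry field only stands in for the retry info that the compute log below reports as missing. It is a sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import Optional


class ClaimFailed(Exception):
    """Raised when a node is already claimed by another instance."""


@dataclass
class Node:
    uuid: str
    claimed_by: Optional[str] = None


@dataclass
class BuildRequest:
    instance: str
    # Hypothetical stand-in for the retry info carried by a build request.
    retry: Optional[dict] = None


def schedule(nodes_view):
    """Pick the first node that the scheduler's (possibly stale) view shows as free."""
    return next(n for n in nodes_view if n["free"])["uuid"]


def build(request, node):
    """Claim the node on the compute side and decide what to do on failure."""
    try:
        if node.claimed_by is not None:
            raise ClaimFailed(node.uuid)
        node.claimed_by = request.instance
        return "ACTIVE"
    except ClaimFailed:
        # Without retry info the request cannot go back to the scheduler.
        return "ERROR" if request.retry is None else "RESCHEDULE"


node = Node("c4d5e326-7ad3-4c25-bfe5-3cab211a723e")
# The claim happens later, on the compute side, so the scheduler's view
# still shows the node as free for both requests.
stale_view = [{"uuid": node.uuid, "free": True}]

picks = [schedule(stale_view), schedule(stale_view)]
results = [build(BuildRequest("instance-1"), node),
           build(BuildRequest("instance-2"), node)]
print(picks, results)
```

In this model both requests pick the same node, the first claim succeeds, and the second request ends in ERROR because it carries no retry info. With retry info present, the same failed claim would return RESCHEDULE instead.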

1. The first instance is placed on c4d5e326-7ad3-4c25-bfe5-3cab211a723e
http://logs.openstack.org/71/441271/1/gate/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial/b0037ad/logs/screen-n-sch.txt.gz#_2017-03-06_09_08_08_343

2017-03-06 09:08:08.343 20337 DEBUG nova.scheduler.filter_scheduler [req-d7c167ea-4bd9-40fe-bfa6-452695a40fa9 tempest-ServersTestJSON-207710543 tempest-ServersTestJSON-207710543] Selected host: WeighedHost [host: (ubuntu-xenial-2-node-osic-cloud1-s3500-7711232-456798, c4d5e326-7ad3-4c25-bfe5-3cab211a723e) ram: 384MB disk: 10240MB io_ops: 0 instances: 0, weight: 2.0] _schedule /opt/stack/new/nova/nova/scheduler/filter_scheduler.py:126

2. The second instance is placed on the same node, c4d5e326-7ad3-4c25-bfe5-3cab211a723e
http://logs.openstack.org/71/441271/1/gate/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial/b0037ad/logs/screen-n-sch.txt.gz#_2017-03-06_09_08_08_421
2017-03-06 09:08:08.421 20337 DEBUG nova.scheduler.filter_scheduler [req-f903ab7f-7525-4567-82f7-8bf2f2b53c86 tempest-ServerActionsTestJSON-1730451988 tempest-ServerActionsTestJSON-1730451988] Selected host: WeighedHost [host: (ubuntu-xenial-2-node-osic-cloud1-s3500-7711232-456798, c4d5e326-7ad3-4c25-bfe5-3cab211a723e) ram: 384MB disk: 10240MB io_ops: 0 instances: 0, weight: 2.0] _schedule 

3. nova-compute doesn't reschedule the failed instance

http://logs.openstack.org/71/441271/1/gate/gate-tempest-dsvm-ironic-ipa-wholedisk-agent_ipmitool-tinyipa-multinode-ubuntu-xenial/b0037ad/logs/subnode-2/screen-n-cpu.txt.gz#_2017-03-06_09_08_09_137

2017-03-06 09:08:09.137 31801 DEBUG nova.compute.manager [req-f903ab7f-7525-4567-82f7-8bf2f2b53c86 tempest-ServerActionsTestJSON-1730451988 tempest-ServerActionsTestJSON-1730451988] [instance: bef43a32-f310-4ef4-8264-c7bc064856b1] Retry info not present, will not reschedule _do_build_and_run_instance /opt/stack/new/nova/nova/compute/manager.py:1788
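
For anyone checking other gate runs for the same symptom, a small script along these lines could flag nodes that the scheduler selected for more than one request. It is only a sketch: double_placements() is a hypothetical helper, the regular expression is inferred from the two "Selected host" lines quoted in this report, and it assumes each log entry sits on a single line, as in the original screen-n-sch.txt log.

```python
import re
from collections import defaultdict

# Captures the request id and the node UUID from a "Selected host" line.
# Pattern inferred from the two scheduler lines in this report (assumption).
SELECTED = re.compile(
    r"\[(req-[0-9a-f-]+) .*?"
    r"Selected host: WeighedHost \[host: \([^,]+, ([0-9a-f-]+)\)"
)


def double_placements(lines):
    """Return {node_uuid: [request ids]} for nodes selected more than once."""
    by_node = defaultdict(list)
    for line in lines:
        match = SELECTED.search(line)
        if match:
            req_id, node_uuid = match.groups()
            by_node[node_uuid].append(req_id)
    return {node: reqs for node, reqs in by_node.items() if len(reqs) > 1}


# The two scheduler entries from this report, each as a single line.
sample = [
    "2017-03-06 09:08:08.343 20337 DEBUG nova.scheduler.filter_scheduler "
    "[req-d7c167ea-4bd9-40fe-bfa6-452695a40fa9 "
    "tempest-ServersTestJSON-207710543 tempest-ServersTestJSON-207710543] "
    "Selected host: WeighedHost [host: "
    "(ubuntu-xenial-2-node-osic-cloud1-s3500-7711232-456798, "
    "c4d5e326-7ad3-4c25-bfe5-3cab211a723e) ram: 384MB disk: 10240MB "
    "io_ops: 0 instances: 0, weight: 2.0]",
    "2017-03-06 09:08:08.421 20337 DEBUG nova.scheduler.filter_scheduler "
    "[req-f903ab7f-7525-4567-82f7-8bf2f2b53c86 "
    "tempest-ServerActionsTestJSON-1730451988 "
    "tempest-ServerActionsTestJSON-1730451988] "
    "Selected host: WeighedHost [host: "
    "(ubuntu-xenial-2-node-osic-cloud1-s3500-7711232-456798, "
    "c4d5e326-7ad3-4c25-bfe5-3cab211a723e) ram: 384MB disk: 10240MB "
    "io_ops: 0 instances: 0, weight: 2.0]",
]

duplicates = double_placements(sample)
print(duplicates)
```

This only shows that a node was selected more than once within one log; whether those selections actually raced each other still has to be checked against the timestamps.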

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1670319

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1670319/+subscriptions