
yahoo-eng-team team mailing list archive

[Bug 1714248] Re: Compute node HA for ironic doesn't work due to the name duplication of Resource Provider

 

This started breaking the ironic multinode CI; see the investigation in
https://bugs.launchpad.net/ironic/+bug/1737395

** Also affects: ironic
   Importance: Undecided
       Status: New

** Changed in: ironic
       Status: New => Triaged

** Changed in: ironic
   Importance: Undecided => Critical

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1714248

Title:
  Compute node HA for ironic doesn't work due to the name duplication of
  Resource Provider

Status in Ironic:
  Triaged
Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Description
  ===========
  In an environment with multiple compute nodes that use the ironic driver,
  when one compute node goes down, another compute node cannot take over its ironic nodes.

  Steps to reproduce
  ==================
  1. Start multiple compute nodes with the ironic driver.
  2. Register one node with ironic.
  3. Stop the compute node that manages the ironic node.
  4. Create an instance.

  Expected result
  ===============
  The instance is created.

  Actual result
  =============
  Instance creation fails.

  Environment
  ===========
  1. Exact version of OpenStack you are running.
  openstack-nova-scheduler-15.0.6-2.el7.noarch
  openstack-nova-console-15.0.6-2.el7.noarch
  python2-novaclient-7.1.0-1.el7.noarch
  openstack-nova-common-15.0.6-2.el7.noarch
  openstack-nova-serialproxy-15.0.6-2.el7.noarch
  openstack-nova-placement-api-15.0.6-2.el7.noarch
  python-nova-15.0.6-2.el7.noarch
  openstack-nova-novncproxy-15.0.6-2.el7.noarch
  openstack-nova-api-15.0.6-2.el7.noarch
  openstack-nova-conductor-15.0.6-2.el7.noarch

  2. Which hypervisor did you use?
  ironic

  Details
  =======
  When a nova-compute goes down, another nova-compute takes over the
  ironic nodes managed by the failed nova-compute by rebalancing the
  hash ring. The active nova-compute then tries to create a new resource
  provider with a new ComputeNode object UUID and the hypervisor name
  (the ironic node UUID) [1][2][3]. The creation fails with a 409
  Conflict because a resource provider with the same name, created by
  the failed nova-compute, already exists.
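
  The failure mode described above can be illustrated with a toy model.
  This is only a sketch under stated assumptions: ToyPlacement, owner()
  and simulate() are hypothetical names, the store is an in-memory dict
  rather than the placement API, and the host selection uses simple
  rendezvous hashing as a stand-in for the hash ring nova actually uses.

```python
import hashlib
import uuid


class Conflict(Exception):
    """Stands in for an HTTP 409 response from the toy store."""


class ToyPlacement:
    """Hypothetical in-memory store that enforces unique provider names."""

    def __init__(self):
        self.by_uuid = {}  # resource provider UUID -> provider name

    def create_provider(self, rp_uuid, name):
        if rp_uuid in self.by_uuid or name in self.by_uuid.values():
            raise Conflict(f"409: provider named {name!r} already exists")
        self.by_uuid[rp_uuid] = name


def owner(node_uuid, hosts):
    """Pick a host for a node (rendezvous hashing, not nova's hash ring)."""
    return max(
        hosts,
        key=lambda h: hashlib.sha256(f"{h}:{node_uuid}".encode()).hexdigest(),
    )


def simulate():
    placement = ToyPlacement()
    ironic_node = str(uuid.uuid4())  # the hypervisor name is the ironic node UUID
    hosts = ["compute01", "compute02"]  # illustrative host names

    # The first owner registers a provider under its own fresh UUID.
    first = owner(ironic_node, hosts)
    placement.create_provider(str(uuid.uuid4()), ironic_node)

    # The first owner goes down; the node is rebalanced to a survivor.
    survivors = [h for h in hosts if h != first]
    second = owner(ironic_node, survivors)

    # The survivor uses a new UUID but the same name, so creation conflicts.
    try:
        placement.create_provider(str(uuid.uuid4()), ironic_node)
    except Conflict as exc:
        return first, second, str(exc)
    return first, second, None
```

  Calling simulate() returns the two hosts and the conflict message. The
  stale provider created by the first host is still the only one stored
  for that name.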

  When a new instance is requested, the scheduler gets only the old
  resource provider for the ironic node [4]. As a result, the ironic
  node is not selected:

  WARNING nova.scheduler.filters.compute_filter [req-a37d68b5-7ab1-4254-8698-502304607a90 7b55e61a07304f9cab1544260dcd3e41 e21242f450d948d7af2650ac9365ee36 - - -] (compute02, 8904aeeb-a35b-4ba3-848a-73269fdde4d3) ram: 4096MB disk: 849920MB io_ops: 0 instances: 0 has not been heard from in a while
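
  A rough sketch of why the node drops out of scheduling follows. It is
  not the nova ComputeFilter code: is_up(), filter_candidates() and the
  60-second threshold are assumptions made for illustration, and
  compute01 is a hypothetical name for the surviving host. The point is
  that the only candidate for the ironic node is still tied to the
  stopped host, so a liveness check removes it even though another
  compute service is healthy.

```python
from datetime import datetime, timedelta

# Illustrative threshold only; the real value is deployment configuration.
DOWN_AFTER = timedelta(seconds=60)


def is_up(last_heartbeat, now):
    """Treat a service as up if it reported within the threshold."""
    return now - last_heartbeat <= DOWN_AFTER


def filter_candidates(candidates, heartbeats, now):
    """Keep only (node, host) pairs whose host service is still up."""
    return [(node, host) for node, host in candidates
            if is_up(heartbeats[host], now)]


now = datetime(2017, 12, 1, 12, 0, 0)
heartbeats = {
    "compute01": now - timedelta(seconds=5),   # survivor, still reporting
    "compute02": now - timedelta(minutes=10),  # stopped host
}
# Only the stale record exists, and it still points at compute02.
candidates = [("8904aeeb-a35b-4ba3-848a-73269fdde4d3", "compute02")]

selected = filter_candidates(candidates, heartbeats, now)
```

  Here selected is empty, which corresponds to the scheduler finding no
  valid host for the request.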

  [1] https://github.com/openstack/nova/blob/stable/ocata/nova/compute/resource_tracker.py#L464
  [2] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L630
  [3] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L410
  [4] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filter_scheduler.py#L183

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1714248/+subscriptions

