yahoo-eng-team team mailing list archive

[Bug 1750450] Re: ironic: n-cpu fails to recover after losing connection to ironic-api and placement-api

 

Reviewed:  https://review.openstack.org/545479
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=acab8b0067b9ac90ed8c27daf04cfb4f926aa41a
Submitter: Zuul
Branch:    master

commit acab8b0067b9ac90ed8c27daf04cfb4f926aa41a
Author: Jim Rollenhagen <jim@xxxxxxxxxxxxxxxxxx>
Date:   Fri Mar 16 16:33:20 2018 +0000

    ironic: stop lying to the RT when ironic is down
    
    Returning an empty list of nodes can cause all sorts of crazy behavior,
    so we instead bubble up a VirtDriverNotReady exception, which the compute
    manager will ignore.
    
    Change-Id: Ib0ec1012b74e9a9e74c8879f3feed5f9332b711f
    Related-Bug: #1744139
    Closes-Bug: #1750450


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750450

Title:
  ironic: n-cpu fails to recover after losing connection to ironic-api
  and placement-api

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  When the ironic API goes down, the ironic virt driver returns [] from
  get_available_nodes() instead of reporting an error. When the resource
  tracker sees this empty list, it immediately attempts to delete all
  of the compute node records and resource providers for those nodes.

  If placement is also down at this time, the resource providers will
  not be properly deleted.

  When ironic-api and placement-api return, nova will see nodes, create
  compute_node records for them, and try to create new resource
  providers (as they are new compute_node records). This will fail with
  a name conflict, and the nodes will be unusable.
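
  The sequence above can be illustrated with a toy model. This is only a
  sketch, not nova or placement code: ToyPlacement, sync_nodes, and
  run_outage_scenario are hypothetical names, and the model assumes that
  a provider's name stays the same while its UUID is regenerated with
  each new compute_node record.

```python
import uuid


class NameConflict(Exception):
    """Raised by the toy placement service when a provider name is taken."""


class ToyPlacement:
    """Hypothetical stand-in for placement-api; not the real service."""

    def __init__(self):
        self.up = True
        self.providers = {}  # provider name -> provider UUID

    def create(self, name, provider_uuid):
        if not self.up:
            raise ConnectionError("placement-api is down")
        if name in self.providers and self.providers[name] != provider_uuid:
            raise NameConflict(name)
        self.providers[name] = provider_uuid

    def delete(self, name):
        if not self.up:
            raise ConnectionError("placement-api is down")
        self.providers.pop(name, None)


def sync_nodes(reported_nodes, compute_nodes, placement):
    """Toy resource tracker pass; returns the nodes left unusable."""
    # Delete records for every node the driver no longer reports.
    for name in list(compute_nodes):
        if name not in reported_nodes:
            del compute_nodes[name]
            try:
                placement.delete(name)
            except ConnectionError:
                pass  # the provider is orphaned in placement

    # Create records (with fresh UUIDs) for newly reported nodes.
    unusable = []
    for name in reported_nodes:
        if name not in compute_nodes:
            compute_nodes[name] = str(uuid.uuid4())
            try:
                placement.create(name, compute_nodes[name])
            except NameConflict:
                unusable.append(name)
    return unusable


def run_outage_scenario():
    placement = ToyPlacement()
    compute_nodes = {}
    sync_nodes(["node-1"], compute_nodes, placement)  # normal operation
    placement.up = False
    sync_nodes([], compute_nodes, placement)  # both APIs down, driver says []
    placement.up = True
    return sync_nodes(["node-1"], compute_nodes, placement)  # APIs return


print(run_outage_scenario())  # prints ['node-1']
```

  In this model the final pass reports node-1 as unusable because the
  orphaned provider still holds the name under the old UUID.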

  This is easy to fix, by raising an exception in get_available_nodes,
  instead of lying to the resource tracker and returning []. However,
  this causes nova-compute to fail to start if ironic-api is not
  available.
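
  A minimal sketch of the shape of that fix, under stated assumptions:
  VirtDriverNotReady below is a local stand-in for the exception named in
  the commit message, while ToyIronicDriver, list_nodes, and
  update_available_resource are hypothetical illustrations rather than
  nova's actual driver or compute manager code.

```python
class VirtDriverNotReady(Exception):
    """Local stand-in for the exception named in the commit message."""


class ToyIronicDriver:
    def __init__(self, list_nodes):
        # list_nodes is a hypothetical callable that returns node names
        # and raises ConnectionError when ironic-api is unreachable.
        self._list_nodes = list_nodes

    def get_available_nodes(self):
        try:
            return list(self._list_nodes())
        except ConnectionError as exc:
            # Signal "unknown" instead of claiming there are no nodes.
            raise VirtDriverNotReady("ironic-api is unreachable") from exc


def update_available_resource(driver, compute_nodes):
    """Toy periodic task; returns False when the pass was skipped."""
    try:
        reported = set(driver.get_available_nodes())
    except VirtDriverNotReady:
        # Ignore the error and leave existing records untouched.
        return False
    compute_nodes.intersection_update(reported)  # drop stale records
    compute_nodes.update(reported)  # add newly reported nodes
    return True
```

  With this shape, an outage leaves the existing records in place until
  the next pass, so nothing is deleted and recreated on the basis of an
  empty list.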

  This may be fine, but it deserves a larger discussion. We've added
  these hacks over the years for some reason; we should look at the
  bigger picture and decide how we want to handle these cases.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750450/+subscriptions

