
yahoo-eng-team team mailing list archive

[Bug 1477490] [NEW] Ironic: Deleting while spawning can leave orphan ACTIVE nodes in Ironic


Public bug reported:

The Ironic nova driver won't try to delete the instance in Ironic if the
node's provision state is DEPLOYING [1]; this is known to fail with the
current Ironic code because the installation simply cannot be aborted at
the DEPLOYING stage.

But the Ironic nova driver keeps going and tries to clean up the
deployment environment (without telling Ironic to unprovision the
instance) [2], and that fails as well. The code that cleans up the
instance keeps retrying [3], because a state transition is in progress
and the node cannot be updated. If the retries have not timed out by the
time the node finishes deploying, the destroy() method of the Nova
driver succeeds in cleaning up the deployment environment and the Nova
instance is deleted, but the Ironic node remains marked ACTIVE in Ironic
and is now orphaned, because there is no Nova instance associated with
it [4].

The good news is that, since Nova cleans up the networking, the
instance won't be accessible.
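
The retry behavior visible in the logs at [3] can be illustrated with a
minimal sketch; the function and exception names below are illustrative,
not nova's actual code:

```python
import time


class Conflict(Exception):
    """Stand-in for the HTTP 409 error ironicclient raises while a
    state transition is in progress (illustrative, not the real class)."""


def call_with_retries(func, attempts=6, interval=0):
    # Retry the node update until it succeeds or the attempts run out,
    # mirroring the "Attempt 3 of 6" pattern in the logs above.
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Conflict:
            if attempt == attempts:
                raise
            time.sleep(interval)


# Simulate a node locked by a deployment for the first three attempts.
state = {'calls': 0}


def update_node():
    state['calls'] += 1
    if state['calls'] <= 3:
        raise Conflict('Node can not be updated while a state '
                       'transition is in progress.')
    return 'updated'


result = call_with_retries(update_node)
```

If the deployment outlasts all six attempts, the final Conflict
propagates; if it finishes in between (as simulated here), the update
eventually succeeds, which is exactly the window that orphans the node.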

WORKAROUND:

Unprovision the node using the Ironic API directly:

$ ironic node-set-provision-state <node uuid> deleted

PROPOSED FIX:

IMO the Ironic nova driver should try to tell Ironic to delete the
instance even when the node's provision state is DEPLOYING. If that
fails, the nova delete command will fail, saying it cannot delete the
instance, which is fine until this gets resolved in Ironic (there is
work in progress to allow aborting a deployment at any stage).
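
A minimal sketch of the proposed destroy() flow, assuming a
python-ironicclient-style client. Everything except set_provision_state()
is illustrative (the exception class, helper, and state constants are
hypothetical stand-ins, not nova's actual code):

```python
DEPLOYING = 'deploying'
ACTIVE = 'active'


class InstanceTerminationFailure(Exception):
    """Hypothetical stand-in for the error nova would raise."""


def cleanup_deploy(node, instance):
    # Illustrative placeholder for the network/ports teardown.
    node.cleaned = True


def destroy(ironic, node, instance):
    # Always ask Ironic to unprovision first, even mid-deploy, so the
    # node can never be left ACTIVE with no Nova instance behind it.
    if node.provision_state in (ACTIVE, DEPLOYING):
        try:
            ironic.node.set_provision_state(node.uuid, 'deleted')
        except Exception as exc:
            # Surface the failure: the nova delete fails loudly instead
            # of silently cleaning up around a live deployment.
            raise InstanceTerminationFailure(
                'Failed to unprovision node %s: %s' % (node.uuid, exc))
    cleanup_deploy(node, instance)
```

The key design point is ordering: the unprovision call happens before
any local cleanup, so a 409 from Ironic aborts the delete instead of
leaving an orphaned ACTIVE node.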


[1] https://github.com/openstack/nova/blob/6a24bbeecd8a6d6d3135a10f4917b071896d14ee/nova/virt/ironic/driver.py#L865-L868

[2]
https://github.com/openstack/nova/blob/6a24bbeecd8a6d6d3135a10f4917b071896d14ee/nova/virt/ironic/driver.py#L871

[3] From the nova-compute logs

{"error_message": "{\"debuginfo\": null, \"faultcode\": \"Client\", \"faultstring\": \"Node d240ae0d-1844-48f0-adcf-b70680a1b6ce can not be updated while a state transition is in progress.\"}"}
 from (pid=6672) log_http_response /usr/local/lib/python2.7/dist-packages/ironicclient/common/http.py:260
2015-07-23 11:07:40.358 WARNING ironicclient.common.http [req-24b39fe8-435d-4869-970f-53f64b3512a8 demo demo] Request returned failure status.
2015-07-23 11:07:40.358 WARNING ironicclient.common.http [req-24b39fe8-435d-4869-970f-53f64b3512a8 demo demo] Error contacting Ironic server: Node d240ae0d-1844-48f0-adcf-b70680a1b6ce can not be updated while a state transition is in progress. (HTTP 409). Attempt 3 of 6

[4] http://paste.openstack.org/show/403569/

** Affects: nova
     Importance: Undecided
     Assignee: Lucas Alvares Gomes (lucasagomes)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)

** Summary changed:

- Ironic: Deleting while spawnming can leave orphan ACTIVE nodes in Ironic
+ Ironic: Deleting while spawning can leave orphan ACTIVE nodes in Ironic

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1477490

Title:
  Ironic: Deleting while spawning can leave orphan ACTIVE nodes in
  Ironic

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1477490/+subscriptions

