yahoo-eng-team team mailing list archive

[Bug 1148614] Re: Reboot with bad volume fails ungracefully

 

** Changed in: nova
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1148614

Title:
  Reboot with bad volume fails ungracefully

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  If a user has an instance that has a Cinder volume attached and, for
  whatever reason, that volume becomes inoperable, a subsequent reboot
  operation may cause the instance to go into a permanent halted state.

  This affects the `xenapi` driver for sure; it's unknown whether a
  similar issue exists in the other virt-drivers.

  Steps to replicate:

  1. Build an instance
  2. Attach a cinder-volume (using lvm+iscsi driver)
  3. Sever the iscsi connection: killall -s9 tgtd on the cinder volume server
  4. Reboot instance
  5. Verify that instance goes to halted and can't be started

  Proposed solution:

  The proposed solution has a few different steps:

  1. Detect that reboot failed due to bad-volumes being attached
  2. Detect exactly which volumes are bad
  3. Detach these volumes in the virt-layer so that the VM operation can be retried
  4. Raise an exception to notify the compute-manager layer that a driver operation had the *side-effect* of detaching a set of 'bad' volumes so that any compute level cleanups (destroy BDM, Cinder volume detach) can be made
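
  A minimal sketch of how the four steps might fit together. Every name
  here (`RebootFailed`, `BadVolumesDetached`, `FakeDriver`,
  `find_bad_volumes`, `manager_reboot`) is hypothetical and chosen for
  illustration; none of this is the actual nova or xenapi code, and
  retrying the reboot inside the driver-level helper is an assumption.

```python
class RebootFailed(Exception):
    """Hypothetical: the driver could not reboot the instance."""


class BadVolumesDetached(Exception):
    """Hypothetical: a driver operation detached 'bad' volumes as a side effect."""

    def __init__(self, volume_ids):
        super().__init__("Bad volumes detached: %s" % ", ".join(volume_ids))
        self.volume_ids = list(volume_ids)


class FakeDriver:
    """Stand-in for a virt driver, used only to make the sketch runnable."""

    def __init__(self, attached, bad):
        self.attached = set(attached)
        self.bad = set(bad)

    def reboot(self, instance):
        # The reboot fails while any bad volume is still attached.
        if self.attached & self.bad:
            raise RebootFailed(instance)

    def find_bad_volumes(self, instance):
        return sorted(self.attached & self.bad)

    def detach_volume(self, instance, volume_id):
        self.attached.discard(volume_id)


def reboot_detaching_bad_volumes(driver, instance):
    try:
        driver.reboot(instance)
        return
    except RebootFailed:
        # Steps 1 and 2: decide whether bad volumes caused the failure,
        # and which ones they are.
        bad = driver.find_bad_volumes(instance)
        if not bad:
            raise  # failure was not caused by bad volumes
    # Step 3: detach the bad volumes in the virt layer, then retry.
    for volume_id in bad:
        driver.detach_volume(instance, volume_id)
    driver.reboot(instance)
    # Step 4: tell the caller about the side effect.
    raise BadVolumesDetached(bad)


def manager_reboot(driver, instance, cleanup_volume):
    """Compute-manager side: run compute-level cleanups for detached volumes."""
    try:
        reboot_detaching_bad_volumes(driver, instance)
    except BadVolumesDetached as exc:
        for volume_id in exc.volume_ids:
            cleanup_volume(volume_id)  # e.g. destroy BDM, Cinder detach
```

  Using an exception for step 4 keeps the driver's normal return path
  unchanged, and only callers that care about the side effect need to
  catch it.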

  Note:

  The current method of detecting which volume is 'bad' indirectly makes
  use of a 120 sec timeout within the XenServer code. An upstream patch
  from Citrix that lets us 'fail-fast' here would speed up error
  recovery dramatically.

  For example, on a given network, we might want to say that a
  connection hung for more than 10 secs is inaccessible rather than
  having to wait a full two minutes.
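
  To illustrate the fail-fast idea only, here is a small sketch of a
  probe with a configurable timeout. It is a plain TCP connect check
  written for this note, not the XenServer mechanism, and the function
  name `target_reachable` is hypothetical. A successful TCP connect
  does not prove the volume itself is healthy.

```python
import socket


def target_reachable(host, port=3260, timeout=10.0):
    """Return True if a TCP connection to the target opens within `timeout` seconds.

    3260 is the standard iSCSI port. A hung or refused connection counts
    as inaccessible.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

  With a 10 sec timeout, a hung connection is reported as inaccessible
  after roughly 10 secs instead of the full two minutes.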

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1148614/+subscriptions