yahoo-eng-team team mailing list archive

[Bug 1488111] Re: Boot from volumes that fail in initialize_connection are not rescheduled

 

I wouldn't say that we won't ever fix this, since I've wondered why we
don't reschedule on volume failures like we do with networking failures,
but it's not a high priority.

** Tags removed: liberty-backport-potential

** No longer affects: nova/liberty

** Changed in: nova
       Status: Won't Fix => Opinion

** Changed in: nova
       Status: Opinion => Confirmed

** Changed in: nova
   Importance: Low => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1488111

Title:
  Boot from volumes that fail in initialize_connection are not
  rescheduled

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Version: OpenStack Liberty

  Boot from volumes that fail in volume initialize_connection are not
  rescheduled.  Initialize connection failures can be very host-specific
  and in many cases the boot would succeed if the instance build was
  rescheduled to another host.

  The instance is not rescheduled because initialize_connection is
  called via this call stack:
  nova.compute.manager _build_resources
  nova.compute.manager _prep_block_device
  nova.virt.block_device attach_block_devices
  nova.virt.block_device.DriverVolumeBlockDevice.attach

  When this fails, an exception is raised which lands in this block:
  https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1740
  That block raises an InvalidBDM exception, which is caught by this block:
  https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2110

  This in turn raises a BuildAbortException, which prevents the instance
  from being rescheduled by landing the flow in this block:
  https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2004
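
  The chain of exception translations described above can be modelled
  with a toy sketch. This is not nova's code: the exception classes are
  local stand-ins for the nova exceptions of the same name, and the
  functions attach_volume, prep_block_device, build_resources and
  build_instance, along with the host names, are hypothetical
  simplifications of the real call stack.

```python
# Toy model of the exception flow in the bug description.
# All classes and functions are simplified stand-ins, not nova's code.


class InvalidBDM(Exception):
    """Stand-in for nova's InvalidBDM exception."""


class BuildAbortException(Exception):
    """Stand-in for nova's BuildAbortException."""


def attach_volume(host):
    # Assumption: initialize_connection fails only on one specific host.
    if host == "bad-host":
        raise RuntimeError("initialize_connection failed on %s" % host)


def prep_block_device(host):
    try:
        attach_volume(host)
    except Exception as exc:
        # Every attach failure is collapsed into InvalidBDM, so the
        # host-specific nature of the failure is lost here.
        raise InvalidBDM(str(exc))


def build_resources(host):
    try:
        prep_block_device(host)
    except InvalidBDM as exc:
        # InvalidBDM is translated into a build abort.
        raise BuildAbortException(str(exc))


def build_instance(host):
    try:
        build_resources(host)
    except BuildAbortException:
        # An abort ends the build; no other host is tried.
        return "error"
    return "active"
```

  In this model build_instance("bad-host") ends in "error" even though
  another host would have worked, which mirrors the behaviour reported
  here.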

  To fix this we likely need a different exception raised from
  nova.virt.block_device.DriverVolumeBlockDevice.attach when the failure
  is in initialize_connection, and then need to work back up the stack
  to ensure that when this different exception is raised, a
  BuildAbortException is not raised, so the reschedule can happen.
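
  One possible shape for that fix, as a minimal sketch only. The
  exception name VolumeConnectionFailed, the functions attach_volume,
  build_resources and schedule_and_build, and the host names are all
  hypothetical; BuildAbortException and RescheduledException are local
  stand-ins for the nova exceptions of the same name. The real change
  would have to be made in nova's compute manager and block device
  code.

```python
# Sketch of the proposed fix: a dedicated exception for
# initialize_connection failures that is mapped to a reschedule
# instead of a build abort. All names are illustrative assumptions.


class BuildAbortException(Exception):
    """Stand-in for nova's BuildAbortException."""


class RescheduledException(Exception):
    """Stand-in for nova's RescheduledException."""


class VolumeConnectionFailed(Exception):
    """Hypothetical: raised only when initialize_connection fails."""


def attach_volume(host, bad_hosts):
    # Assumption: initialize_connection fails on the hosts in bad_hosts.
    if host in bad_hosts:
        raise VolumeConnectionFailed("initialize_connection failed on %s" % host)


def build_resources(host, bad_hosts):
    try:
        attach_volume(host, bad_hosts)
    except VolumeConnectionFailed as exc:
        # Host-specific failure: ask for a reschedule.
        raise RescheduledException(str(exc))
    except Exception as exc:
        # Any other block device failure still aborts the build.
        raise BuildAbortException(str(exc))


def schedule_and_build(hosts, bad_hosts):
    """Return the host the build landed on, or None if it failed."""
    for host in hosts:
        try:
            build_resources(host, bad_hosts)
        except RescheduledException:
            continue  # try the next candidate host
        except BuildAbortException:
            return None
        return host
    return None
```

  With hosts ["h1", "h2"] and a connection failure only on "h1", this
  sketch lands the build on "h2" rather than failing outright.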

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1488111/+subscriptions
