yahoo-eng-team team mailing list archive
Message #68529
[Bug 1488111] Re: Boot from volumes that fail in initialize_connection are not rescheduled
I wouldn't say that we won't ever fix this, since I've wondered why we
don't reschedule on volume failures like we do with networking failures,
but it's not a high priority.
** Tags removed: liberty-backport-potential
** No longer affects: nova/liberty
** Changed in: nova
Status: Won't Fix => Opinion
** Changed in: nova
Status: Opinion => Confirmed
** Changed in: nova
Importance: Low => Wishlist
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1488111
Title:
Boot from volumes that fail in initialize_connection are not
rescheduled
Status in OpenStack Compute (nova):
Confirmed
Bug description:
Version: OpenStack Liberty
Boot-from-volume instances whose volume initialize_connection call fails
are not rescheduled. Initialize connection failures can be very
host-specific, and in many cases the boot would succeed if the instance
build were rescheduled to another host.
The instance is not rescheduled because initialize_connection is called down this stack:
nova.compute.manager _build_resources
nova.compute.manager _prep_block_device
nova.virt.block_device attach_block_devices
nova.virt.block_device.DriverVolumeBlockDevice.attach
When this fails, the resulting exception lands in this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1740
which raises an InvalidBDM exception that is caught by this block:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2110
That block in turn raises a BuildAbortException, which lands the flow in this block, where the instance is not rescheduled:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2004
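A minimal, self-contained Python model of the flow described above may help make it concrete. Every class and function here is a simplified, hypothetical stand-in that only borrows names from the report (RescheduledException is likewise used as a stand-in for the exception that triggers a reschedule). None of it is nova's actual code, and the real methods take many more arguments.

```python
# Simplified model of the exception flow in this bug report.
# All classes and functions are hypothetical stand-ins named after the
# report; they are NOT nova's real implementations.


class InvalidBDM(Exception):
    """Stand-in for the block device mapping error."""


class BuildAbortException(Exception):
    """Stand-in for the exception that ends the build without a reschedule."""


class RescheduledException(Exception):
    """Stand-in for the exception that sends the build to another host."""


def attach(host):
    # Models DriverVolumeBlockDevice.attach: initialize_connection fails here.
    raise RuntimeError(f"initialize_connection failed on {host}")


def prep_block_device(host):
    # First block: any attach failure is wrapped as InvalidBDM.
    try:
        attach(host)
    except Exception as exc:
        raise InvalidBDM(str(exc)) from exc


def build_resources(host):
    # Second block: InvalidBDM is converted into BuildAbortException.
    try:
        prep_block_device(host)
    except InvalidBDM as exc:
        raise BuildAbortException(str(exc)) from exc


def do_build_and_run_instance(host):
    # Third block: only RescheduledException would lead to a reschedule.
    try:
        build_resources(host)
    except RescheduledException:
        return "rescheduled"
    except BuildAbortException:
        return "aborted"
    return "active"


print(do_build_and_run_instance("compute-1"))  # prints "aborted"
```

Because the host-specific failure is wrapped first as InvalidBDM and then as BuildAbortException, the outermost handler never sees a reschedule request.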
To fix this, we likely need
nova.virt.block_device.DriverVolumeBlockDevice.attach to raise a
different exception when the failure is in initialize_connection, and
then work back up the stack to ensure that this exception is not
converted into a BuildAbortException, so the reschedule can happen.
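A rough sketch of the proposed direction, again using simplified stand-ins rather than nova's real code. VolumeConnectionFailed is a hypothetical exception name (the report does not name one), BAD_HOSTS just simulates a host-specific initialize_connection failure, and schedule_and_build is a toy loop standing in for the scheduler picking another host. The point it illustrates: the distinct exception passes through the InvalidBDM wrapping untouched and is turned into a reschedule instead of a BuildAbortException.

```python
# Sketch of the proposed fix. VolumeConnectionFailed, BAD_HOSTS and
# schedule_and_build are hypothetical; the other names are simplified
# stand-ins named after the report, not nova's real code.


class InvalidBDM(Exception):
    pass


class BuildAbortException(Exception):
    pass


class RescheduledException(Exception):
    pass


class VolumeConnectionFailed(Exception):
    """Hypothetical: raised only when initialize_connection fails."""


BAD_HOSTS = {"compute-1"}  # simulated hosts where the connection fails


def initialize_connection(host):
    # Stand-in for the volume initialize_connection call.
    if host in BAD_HOSTS:
        raise RuntimeError(f"initialize_connection failed on {host}")
    return {"driver_volume_type": "iscsi"}


def attach(host):
    # Raise a distinct exception type for initialize_connection failures.
    try:
        return initialize_connection(host)
    except RuntimeError as exc:
        raise VolumeConnectionFailed(str(exc)) from exc


def prep_block_device(host):
    try:
        return attach(host)
    except VolumeConnectionFailed:
        raise  # let the host-specific failure propagate unchanged
    except Exception as exc:
        raise InvalidBDM(str(exc)) from exc


def build_resources(host):
    try:
        prep_block_device(host)
    except VolumeConnectionFailed as exc:
        # Host-specific failure: request a reschedule instead of aborting.
        raise RescheduledException(str(exc)) from exc
    except InvalidBDM as exc:
        raise BuildAbortException(str(exc)) from exc


def schedule_and_build(hosts):
    # Try hosts in order; move on only when a reschedule is requested.
    for host in hosts:
        try:
            build_resources(host)
            return host
        except RescheduledException:
            continue
        except BuildAbortException:
            return None
    return None


print(schedule_and_build(["compute-1", "compute-2"]))  # prints "compute-2"
```

With compute-1 failing initialize_connection, the build moves on to compute-2 instead of being aborted; any other attach failure would still end in BuildAbortException.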
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1488111/+subscriptions