yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #75378
[Bug 1784353] Re: Rescheduled boot from volume instances fail due to the premature removal of their attachments
Reviewed: https://review.openstack.org/587071
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=41452a5c6adb8cae54eef24803f4adc468131b34
Submitter: Zuul
Branch: master
commit 41452a5c6adb8cae54eef24803f4adc468131b34
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date: Mon Jul 30 13:41:35 2018 +0100
conductor: Recreate volume attachments during a reschedule
When an instance with attached volumes fails to spawn, cleanup code
within the compute manager (_shutdown_instance called from
_build_resources) will delete the volume attachments referenced by
the bdms in Cinder. As a result we should check and if necessary
recreate these volume attachments when rescheduling an instance.
Note that there are a few different ways to fix this bug by
making changes to the compute manager code, either by not deleting
the volume attachment on failure before rescheduling [1] or by
performing the get/create check during each build after the
reschedule [2].
The problem with *not* cleaning up the attachments is if we don't
reschedule, then we've left orphaned "reserved" volumes in Cinder
(or we have to add special logic to tell compute when to cleanup
attachments).
The problem with checking the existence of the attachment on every
new host we build on is that we'd be needlessly checking that for
initial creates even if we don't ever need to reschedule, unless
again we have special logic against that (like checking to see if
we've rescheduled at all).
Also, in either case that involves changes to the compute means that
older computes might not have the fix.
So ultimately it seems that the best way to handle this is:
1. Only deal with this on reschedules.
2. Let the cell conductor orchestrate it since it's already dealing
with the reschedule. Then the compute logic doesn't need to change.
[1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
[2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667
Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
Closes-bug: #1784353
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784353
Title:
Rescheduled boot from volume instances fail due to the premature
removal of their attachments
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
In Progress
Status in OpenStack Compute (nova) rocky series:
In Progress
Bug description:
Description
===========
This is caused by the cleanup code within the compute layer (_shutdown_instance) removing all volume attachments associated with an instance with no attempt being made to recreate these ahead of the instance being rescheduled.
Steps to reproduce
==================
- Attempt to boot an instance with volumes attached.
- Ensure spawn() fails, for example by stopping the l2 network agent services on the compute host.
Expected result
===============
The instance is reschedule to another compute host and boots correctly.
Actual result
=============
The instance fails to boot on all hosts that is rescheduled to due to a missing volume attachment.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
bf497cc47497d3a5603bf60de652054ac5ae1993
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
What's the version of that?
Libvirt + KVM, however this shouldn't matter.
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
N/A
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] Traceback (most recent call last):
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1579, in _prep_block_device
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] wait_func=self._await_block_device_map_created)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 837, in attach_block_devices
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] _log_and_attach(device)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 834, in _log_and_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] bdm.attach(*attach_args, **attach_kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in wrapped
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] ret_val = method(obj, context, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 617, in attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] virt_driver, do_driver_attach)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] return f(*args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 614, in _do_locked_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] self._do_attach(*args, **_kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 599, in _do_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] do_driver_attach)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 513, in _volume_attach
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] self['mount_device'])['connection_info']
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 379, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] res = method(self, ctx, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 418, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] attachment_id=attachment_id))
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 450, in _reraise
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] six.reraise(type(desired_exc), desired_exc, sys.exc_info()[2])
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 415, in wrapper
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] res = method(self, ctx, attachment_id, *args, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 824, in attachment_update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] 'code': getattr(ex, 'code', None)})
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] self.force_reraise()
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] six.reraise(self.type_, self.value, self.tb)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 814, in attachment_update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] attachment_id, _connector)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/cinderclient/v3/attachments.py", line 67, in update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] resp = self._update('/attachments/%s' % id, body)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/cinderclient/base.py", line 344, in _update
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] resp, body = self.api.client.put(url, body=body, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 206, in put
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] return self._cs_request(url, 'PUT', **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 191, in _cs_request
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] return self.request(url, method, **kwargs)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 177, in request
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] raise exceptions.from_response(resp, body)
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] VolumeAttachmentNotFound: Volume attachment 11 [details]d518a9-16d4-4ccb-9487-ec2b35834945 could not be found.
2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784353/+subscriptions
References