yahoo-eng-team team mailing list archive

[Bug 1784353] Re: Rescheduled boot from volume instances fail due to the premature removal of their attachments

 

Reviewed:  https://review.openstack.org/587071
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=41452a5c6adb8cae54eef24803f4adc468131b34
Submitter: Zuul
Branch:    master

commit 41452a5c6adb8cae54eef24803f4adc468131b34
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date:   Mon Jul 30 13:41:35 2018 +0100

    conductor: Recreate volume attachments during a reschedule
    
    When an instance with attached volumes fails to spawn, cleanup code
    within the compute manager (_shutdown_instance called from
    _build_resources) will delete the volume attachments referenced by
    the bdms in Cinder. As a result we should check and if necessary
    recreate these volume attachments when rescheduling an instance.
    
    Note that there are a few different ways to fix this bug by
    making changes to the compute manager code, either by not deleting
    the volume attachment on failure before rescheduling [1] or by
    performing the get/create check during each build after the
    reschedule [2].
    
    The problem with *not* cleaning up the attachments is if we don't
    reschedule, then we've left orphaned "reserved" volumes in Cinder
    (or we have to add special logic to tell compute when to cleanup
    attachments).
    
    The problem with checking the existence of the attachment on every
    new host we build on is that we'd be needlessly checking that for
    initial creates even if we don't ever need to reschedule, unless
    again we have special logic against that (like checking to see if
    we've rescheduled at all).
    
    Also, either approach involves changes to the compute service, which
    means that older computes might not have the fix.
    
    So ultimately it seems that the best way to handle this is:
    
    1. Only deal with this on reschedules.
    2. Let the cell conductor orchestrate it since it's already dealing
       with the reschedule. Then the compute logic doesn't need to change.
    
    [1] https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1631
    [2] https://review.openstack.org/#/c/587071/4/nova/compute/manager.py@1667
    
    Change-Id: I739c06bd02336bf720cddacb21f48e7857378487
    Closes-bug: #1784353
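
A minimal sketch of the reschedule-time check the commit message describes, assuming an in-memory stand-in for Cinder. The names FakeVolumeAPI, BlockDeviceMapping, AttachmentNotFound and ensure_attachments_on_reschedule are hypothetical and chosen for illustration; this is not the code from the nova change itself.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


class AttachmentNotFound(Exception):
    """Hypothetical stand-in for the volume service's 'not found' error."""


@dataclass
class FakeVolumeAPI:
    """In-memory stand-in for the volume service (not a real Cinder client)."""
    attachments: dict = field(default_factory=dict)

    def attachment_get(self, attachment_id):
        try:
            return self.attachments[attachment_id]
        except KeyError:
            raise AttachmentNotFound(attachment_id)

    def attachment_create(self, volume_id, instance_uuid):
        attachment_id = str(uuid.uuid4())
        record = {"id": attachment_id,
                  "volume_id": volume_id,
                  "instance_uuid": instance_uuid}
        self.attachments[attachment_id] = record
        return record

    def attachment_delete(self, attachment_id):
        self.attachments.pop(attachment_id, None)


@dataclass
class BlockDeviceMapping:
    """Simplified bdm: only the fields this sketch needs."""
    volume_id: str
    attachment_id: Optional[str] = None


def ensure_attachments_on_reschedule(volume_api, instance_uuid, bdms):
    """Recreate any attachment that compute cleanup has already deleted.

    Intended to run only on a reschedule, before the next build attempt.
    Returns the volume ids whose attachments were recreated.
    """
    recreated = []
    for bdm in bdms:
        if bdm.attachment_id is None:
            # No attachment was recorded for this bdm; nothing to check.
            continue
        try:
            volume_api.attachment_get(bdm.attachment_id)
        except AttachmentNotFound:
            # The failed build's cleanup removed it, so create a new one
            # and point the bdm at it.
            new = volume_api.attachment_create(bdm.volume_id, instance_uuid)
            bdm.attachment_id = new["id"]
            recreated.append(bdm.volume_id)
    return recreated
```

Because the check runs only when rescheduling, an initial build that succeeds on the first host pays no extra lookup, and the compute-side cleanup can keep deleting attachments on failure.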


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784353

Title:
  Rescheduled boot from volume instances fail due to the premature
  removal of their attachments

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress

Bug description:
  Description
  ===========
  This is caused by the cleanup code within the compute layer (_shutdown_instance) removing all volume attachments associated with an instance with no attempt being made to recreate these ahead of the instance being rescheduled.

  Steps to reproduce
  ==================
  - Attempt to boot an instance with volumes attached.
  - Ensure spawn() fails, for example by stopping the L2 network agent services on the compute host.

  Expected result
  ===============
  The instance is rescheduled to another compute host and boots correctly.

  Actual result
  =============
  The instance fails to boot on every host that it is rescheduled to, due to a missing volume attachment.
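
  The failure sequence can be illustrated with a toy model. This is a sketch under stated assumptions: ToyVolumeService, build_on_host and schedule are hypothetical names, and the model only mimics the ordering of events (attachment update, failed spawn, cleanup, reschedule), not nova's real code paths.

```python
class AttachmentNotFound(Exception):
    """Hypothetical error raised when an attachment id is unknown."""


class ToyVolumeService:
    """Tracks which attachment ids currently exist."""

    def __init__(self):
        self._attachments = set()

    def create(self, attachment_id):
        self._attachments.add(attachment_id)

    def delete(self, attachment_id):
        self._attachments.discard(attachment_id)

    def update(self, attachment_id, host):
        # Mirrors the step that fails in the traceback: updating an
        # attachment that no longer exists.
        if attachment_id not in self._attachments:
            raise AttachmentNotFound(attachment_id)
        return {"id": attachment_id, "host": host}


def build_on_host(volumes, attachment_id, host, spawn_ok):
    """One build attempt: update the attachment, then try to spawn."""
    volumes.update(attachment_id, host)
    if not spawn_ok:
        # Cleanup after a failed spawn deletes the attachment.
        volumes.delete(attachment_id)
        return False
    return True


def schedule(volumes, attachment_id, hosts):
    """Try each (host, spawn_ok) pair in order, with no recreate step."""
    for host, spawn_ok in hosts:
        if build_on_host(volumes, attachment_id, host, spawn_ok):
            return host
    return None


if __name__ == "__main__":
    volumes = ToyVolumeService()
    volumes.create("att-1")
    try:
        schedule(volumes, "att-1", [("host-a", False), ("host-b", True)])
    except AttachmentNotFound as exc:
        print("reschedule failed, attachment missing:", exc)
```

  In this model the second host would have spawned successfully, but the build never gets that far because the first host's cleanup already removed the attachment.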

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     bf497cc47497d3a5603bf60de652054ac5ae1993

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     Libvirt + KVM, however this shouldn't matter.

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     N/A

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] Traceback (most recent call last):  
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1579, in _prep_block_device
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     wait_func=self._await_block_device_map_created)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 837, in attach_block_devices
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     _log_and_attach(device)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 834, in _log_and_attach
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     bdm.attach(*attach_args, **attach_kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 46, in wrapped
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     ret_val = method(obj, context, *args, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 617, in attach
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     virt_driver, do_driver_attach)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return f(*args, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 614, in _do_locked_attach
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self._do_attach(*args, **_kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 599, in _do_attach
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     do_driver_attach)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/virt/block_device.py", line 513, in _volume_attach
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self['mount_device'])['connection_info']
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 379, in wrapper
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     res = method(self, ctx, *args, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 418, in wrapper
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     attachment_id=attachment_id))
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 450, in _reraise
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     six.reraise(type(desired_exc), desired_exc, sys.exc_info()[2])
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 415, in wrapper
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     res = method(self, ctx, attachment_id, *args, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 824, in attachment_update
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     'code': getattr(ex, 'code', None)})
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     self.force_reraise()
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     six.reraise(self.type_, self.value, self.tb)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 814, in attachment_update
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     attachment_id, _connector)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/v3/attachments.py", line 67, in update
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     resp = self._update('/attachments/%s' % id, body)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/base.py", line 344, in _update
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     resp, body = self.api.client.put(url, body=body, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 206, in put
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return self._cs_request(url, 'PUT', **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 191, in _cs_request
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     return self.request(url, method, **kwargs)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]   File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 177, in request
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]     raise exceptions.from_response(resp, body)
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1] VolumeAttachmentNotFound: Volume attachment 11d518a9-16d4-4ccb-9487-ec2b35834945 could not be found.
      2018-07-04 15:19:43.191 1 ERROR nova.compute.manager [instance: d48c9894-2ba2-4752-bae5-36c437933ff1]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784353/+subscriptions

