
yahoo-eng-team team mailing list archive

[Bug 1814245] Re: _disconnect_volume incorrectly called for multiattach volumes during post_live_migration

 

Reviewed:  https://review.openstack.org/551302
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b626c0dc7b113365002e743e6de2aeb40121fc81
Submitter: Zuul
Branch:    master

commit b626c0dc7b113365002e743e6de2aeb40121fc81
Author: Matthew Booth <mbooth@xxxxxxxxxx>
Date:   Fri Mar 9 14:41:49 2018 +0000

    Avoid redundant initialize_connection on source post live migration
    
    During live migration we update bdm.connection_info for attached volumes
    in pre_live_migration to reflect the new connection on the destination
    node. This means that after migration completes the BDM no longer has a
    reference to the original connection_info to do the detach on the source
    host. To address this, change I3dfb75eb added a second call to
    initialize_connection on the source host to re-fetch the source host
    connection_info before calling disconnect.
    
    Unfortunately the cinder driver interface does not strictly require that
    multiple calls to initialize_connection will return consistent results.
    Although they normally do in practice, there is at least one cinder
    driver (delliscsi) which doesn't. This results in a failure to
    disconnect on the source host post migration.
    
    This change avoids the issue entirely by fetching the BDMs prior to
    modification on the destination node. As well as working round this
    specific issue, it also avoids a redundant cinder call in all cases.
    
    Note that this massively simplifies post_live_migration in the libvirt
    driver. The complexity removed was concerned with reconstructing the
    original connection_info. This required considering the cinder v2 and v3
    use cases, and reconstructing the multipath_id which was written to
    connection_info by the libvirt fibrechannel volume connector on
    connection. These things are not necessary when we just use the original
    data unmodified.
    
    Other drivers affected are Xenapi and HyperV. Xenapi doesn't touch
    volumes in post_live_migration, so is unaffected. HyperV did not
    previously account for differences in connection_info between source and
    destination, so was likely previously broken. This change should fix it.
    
    Closes-Bug: #1754716
    Closes-Bug: #1814245
    Change-Id: I0390c9ff51f49b063f736ca6ef868a4fa782ede5
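The approach described in the commit message above can be illustrated with a small standalone sketch. This is not nova's code: the dict-based BDMs, the function names and the IQN values are hypothetical stand-ins. The only point it makes is that a copy taken before pre_live_migration rewrites connection_info still describes the source host connection afterwards, so no second initialize_connection call is needed.

```python
import copy


def pre_live_migration(bdms, dest_connection_info):
    """Hypothetical stand-in: point each BDM at the destination connection."""
    for bdm in bdms:
        bdm["connection_info"] = dest_connection_info[bdm["volume_id"]]


def live_migrate(bdms, dest_connection_info):
    """Snapshot the source BDMs before they are modified, then update them."""
    # Take the copy first: after pre_live_migration the originals only
    # describe the destination connection.
    source_bdms = copy.deepcopy(bdms)
    pre_live_migration(bdms, dest_connection_info)
    return source_bdms


# Illustrative data only; real connection_info is driver specific.
bdms = [{"volume_id": "vol-1",
         "connection_info": {"data": {"target_iqn": "iqn.source"}}}]
dest_info = {"vol-1": {"data": {"target_iqn": "iqn.dest"}}}

source_bdms = live_migrate(bdms, dest_info)

# The snapshot is what a source-side disconnect would use post migration.
source_iqn = source_bdms[0]["connection_info"]["data"]["target_iqn"]
current_iqn = bdms[0]["connection_info"]["data"]["target_iqn"]
```

Here source_iqn keeps the value recorded on the source host while current_iqn reflects the destination, which is why the source-side cleanup can use the original data unmodified.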


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1814245

Title:
  _disconnect_volume incorrectly called for multiattach volumes during
  post_live_migration

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  Description
  ===========

  Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f introduced a simple check to
  ensure disconnect_volume is only called when detaching a multi-attach
  volume from the final instance using it on a given host.

  That change, however, doesn't take live migration (LM) into account,
  specifically the call to _disconnect_volume during post_live_migration
  on the source at the end of the migration. At this point the migrated
  instance has already moved, so the call to
  objects.InstanceList.get_uuids_by_host returns only one local instance
  using the volume instead of two, allowing disconnect_volume to be
  called.

  Depending on the backend being used, this call can succeed, removing
  the connection to the volume for the remaining instance, or os-brick
  can fail in situations where it needs to flush I/O etc. from the
  in-use connection.
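The failure mode described above can be sketched as follows. This is a simplified illustration rather than nova's implementation: should_disconnect and the instance names are hypothetical, and the list passed as uuids_on_host stands in for what objects.InstanceList.get_uuids_by_host would return for the source host.

```python
def should_disconnect(attached_instance_uuids, uuids_on_host):
    """Return True when at most one local instance still uses the volume.

    Hypothetical simplification of the host-level multiattach check.
    """
    local_users = [u for u in uuids_on_host if u in attached_instance_uuids]
    return len(local_users) <= 1


# Both instances are attached to the same multiattach volume.
attached = {"testvm1", "testvm2"}

# Ordinary detach while both instances are on the host: the check sees
# two local users and keeps the host connection.
before_migration = should_disconnect(attached, ["testvm1", "testvm2"])

# post_live_migration on the source: testvm1 has already moved, so only
# testvm2 is listed for the host and the check allows the disconnect,
# even though testvm2 is still using the connection.
after_migration = should_disconnect(attached, ["testvm2"])
```

In this sketch before_migration is False and after_migration is True. The second result is the problem case: the count of local users cannot distinguish "the last user is leaving" from "one user has already left and another remains".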

  
  Steps to reproduce
  ==================

  * Launch two instances attached to the same multiattach volume on the same host.
  * Live migrate one of these instances to another host.

  Expected result
  ===============

  No calls to disconnect_volume are made and the remaining instance on
  the host is still able to access the multi-attach volume.

  Actual result
  =============

  A call to disconnect_volume is made and either the remaining instance
  is unable to access the volume *or* the LM fails because os-brick
  cannot disconnect the in-use volume on the host.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     master

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)

     Libvirt + KVM

  
  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     LVM/iSCSI with multipath enabled reproduces the os-brick failure.

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

  # nova show testvm2
  [..]
      | fault                                | {"message": "Unexpected error while running command.                                                                      |
      |                                      | Command: multipath -f 360014054a424982306a4a659007f73b2                                                                   |
      |                                      | Exit code: 1                                                                                                              |
      |                                      | Stdout: u'Jan 28 16:09:29 | 360014054a424982306a4a659007f73b2: map in use\                                                |
      |                                      | Jan 28 16:09:29 | failed to remove multipath map 360014054a424982306a4a", "code": 500, "details": "                       |
      |                                      |   File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 202, in decorated_function                      |
      |                                      |     return function(self, context, *args, **kwargs)                                                                       |
      |                                      |   File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 6299, in _post_live_migration                   |
      |                                      |     migrate_data)                                                                                                         |
      |                                      |   File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 7744, in post_live_migration                |
      |                                      |     self._disconnect_volume(context, connection_info, instance)                                                           |
      |                                      |   File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 1287, in _disconnect_volume                 |
      |                                      |     vol_driver.disconnect_volume(connection_info, instance)                                                               |
      |                                      |   File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py\", line 74, in disconnect_volume              |
      |                                      |     self.connector.disconnect_volume(connection_info['data'], None)                                                       |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/utils.py\", line 150, in trace_logging_wrapper                         |
      |                                      |     result = f(*args, **kwargs)                                                                                           |
      |                                      |   File \"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py\", line 274, in inner                             |
      |                                      |     return f(*args, **kwargs)                                                                                             |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 848, in disconnect_volume        |
      |                                      |     ignore_errors=ignore_errors)                                                                                          |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 885, in _cleanup_connection      |
      |                                      |     force, exc)                                                                                                           |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 219, in remove_connection               |
      |                                      |     self.flush_multipath_device(multipath_name)                                                                           |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 275, in flush_multipath_device          |
      |                                      |     root_helper=self._root_helper)                                                                                        |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/executor.py\", line 52, in _execute                                    |
      |                                      |     result = self.__execute(*args, **kwargs)                                                                              |
      |                                      |   File \"/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py\", line 169, in execute                         |
      |                                      |     return execute_root(*cmd, **kwargs)                                                                                   |
      |                                      |   File \"/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py\", line 207, in _wrap                              |
      |                                      |     return self.channel.remote_call(name, args, kwargs)                                                                   |
      |                                      |   File \"/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py\", line 202, in remote_call                              |
      |                                      |     raise exc_type(*result[2])                                                                                            |
      |                                      | ", "created": "2019-01-28T07:10:09Z"}

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1814245/+subscriptions

