[Bug 1814245] Re: _disconnect_volume incorrectly called for multiattach volumes during post_live_migration
Reviewed: https://review.openstack.org/551302
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b626c0dc7b113365002e743e6de2aeb40121fc81
Submitter: Zuul
Branch: master
commit b626c0dc7b113365002e743e6de2aeb40121fc81
Author: Matthew Booth <mbooth@xxxxxxxxxx>
Date: Fri Mar 9 14:41:49 2018 +0000
Avoid redundant initialize_connection on source post live migration
During live migration we update bdm.connection_info for attached volumes
in pre_live_migration to reflect the new connection on the destination
node. This means that after migration completes the BDM no longer has a
reference to the original connection_info to do the detach on the source
host. To address this, change I3dfb75eb added a second call to
initialize_connection on the source host to re-fetch the source host
connection_info before calling disconnect.
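For illustration, here is a minimal, hypothetical sketch of the pattern
that I3dfb75eb introduced and that this commit removes; the class and
callable names below are stand-ins, not Nova's actual interfaces:

    # Illustrative stand-in for the Cinder volume API; a real driver builds
    # connection_info for the given connector, and some drivers do not
    # return identical results on repeated calls.
    class FakeVolumeAPI(object):
        def initialize_connection(self, context, volume_id, connector):
            return {'driver_volume_type': 'iscsi',
                    'data': {'target_iqn': 'iqn.example:%s' % volume_id,
                             'host': connector['host']}}

    def post_live_migration_old(context, volume_api, bdms, source_connector,
                                disconnect_volume):
        # Old approach: bdm.connection_info was already rewritten for the
        # destination in pre_live_migration, so the source host's
        # connection_info is re-fetched from Cinder before disconnecting.
        for bdm in bdms:
            source_info = volume_api.initialize_connection(
                context, bdm['volume_id'], source_connector)
            disconnect_volume(source_info)

    if __name__ == '__main__':
        post_live_migration_old(
            context=None,
            volume_api=FakeVolumeAPI(),
            bdms=[{'volume_id': 'vol-1'}],
            source_connector={'host': 'source-host'},
            disconnect_volume=lambda info: print('disconnect', info['data']))

If the driver returns different connection_info on that second call, the
disconnect targets the wrong device, which is exactly the failure mode
described next.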
Unfortunately the cinder driver interface does not strictly require that
multiple calls to initialize_connection will return consistent results.
Although they normally do in practice, there is at least one cinder
driver (delliscsi) which doesn't. This results in a failure to
disconnect on the source host post migration.
This change avoids the issue entirely by fetching the BDMs prior to
modification on the destination node. As well as working around this
specific issue, it also avoids a redundant cinder call in all cases.
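As a rough, hypothetical sketch of the flow after this change (the
helper callables are illustrative, not Nova's real method names), the
essential point is simply to copy the BDMs before pre_live_migration
rewrites them:

    import copy

    def live_migrate(context, instance, get_bdms, pre_live_migration,
                     do_migration, post_live_migration):
        # Snapshot the BDMs while connection_info still describes the
        # attachment on the source host.
        source_bdms = copy.deepcopy(get_bdms(context, instance))

        # pre_live_migration updates bdm.connection_info in the database to
        # reflect the new connection on the destination node.
        pre_live_migration(context, instance)
        do_migration(context, instance)

        # Source-side cleanup receives the unmodified copies and disconnects
        # using the original connection_info directly; no second
        # initialize_connection call to Cinder is needed.
        post_live_migration(context, instance, source_bdms)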
Note that this massively simplifies post_live_migration in the libvirt
driver. The complexity removed was concerned with reconstructing the
original connection_info. This required considering the cinder v2 and v3
use cases, and reconstructing the multipath_id which was written to
connection_info by the libvirt fibrechannel volume connector on
connection. These things are not necessary when we just use the original
data unmodified.
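For illustration only (the helper name and dict layout are assumptions,
not the removed code verbatim), that reconstruction amounted to patching
the freshly fetched connection_info with state that only existed in the
original one, such as the multipath_id:

    def rebuild_source_connection_info(fresh_info, original_info):
        # The libvirt fibrechannel volume connector writes multipath_id into
        # connection_info['data'] at connect time; a connection_info freshly
        # fetched from Cinder lacks it, so it had to be copied across before
        # the disconnect could find the multipath device.
        multipath_id = original_info.get('data', {}).get('multipath_id')
        if multipath_id:
            fresh_info.setdefault('data', {})['multipath_id'] = multipath_id
        return fresh_info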
Other drivers affected are Xenapi and HyperV. Xenapi doesn't touch
volumes in post_live_migration, so is unaffected. HyperV did not
previously account for differences in connection_info between source and
destination, so was likely previously broken. This change should fix it.
Closes-Bug: #1754716
Closes-Bug: #1814245
Change-Id: I0390c9ff51f49b063f736ca6ef868a4fa782ede5
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1814245
Title:
_disconnect_volume incorrectly called for multiattach volumes during
post_live_migration
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
Triaged
Status in OpenStack Compute (nova) rocky series:
Triaged
Bug description:
Description
===========
Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f introduced a simple check to
ensure disconnect_volume is only called when detaching a multi-attach
volume from the final instance using it on a given host.
That change, however, doesn't take live migration (LM) into account,
and more specifically the call to _disconnect_volume during
post_live_migration at the end of the migration on the source host. At
this point the original instance has already moved, so the call to
objects.InstanceList.get_uuids_by_host only returns one local instance
using the volume instead of two, allowing disconnect_volume to be
called.
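A rough Python sketch of that guard and of why it misfires here; the
parameter names and the volume/attachment layout are illustrative
assumptions rather than Nova's exact code:

    def should_disconnect(context, host, volume, get_uuids_by_host):
        # Only disconnect the host connection when no other instance on
        # this host still uses the (multi-attach) volume.
        if not volume.get('multiattach'):
            return True
        host_instances = set(get_uuids_by_host(context, host))
        local_attachments = [a for a in volume.get('attachments', [])
                             if a['instance_uuid'] in host_instances]
        # During post_live_migration the migrated instance is already
        # recorded on the destination host, so it drops out of
        # host_instances; the remaining attachment then looks like the last
        # local one and the guard wrongly allows disconnect_volume.
        return len(local_attachments) <= 1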
Depending on the backend being used, this disconnect_volume call can
either succeed, removing the connection to the volume out from under
the remaining instance, or os-brick can fail in situations where it
needs to flush I/O etc. from the in-use connection.
Steps to reproduce
==================
* Launch two instances attached to the same multiattach volume on the same host.
* Live migrate (LM) one of these instances to another host.
Expected result
===============
No calls to disconnect_volume are made and the remaining instance on
the host is still able to access the multi-attach volume.
Actual result
=============
A call to disconnect_volume is made and the remaining instance is
unable to access the volume *or* the LM fails due to os-brick failures
to disconnect the in-use volume on the host.
Environment
===========
1. Exact version of OpenStack you are running. See the following
list for all releases: http://docs.openstack.org/releases/
master
2. Which hypervisor did you use?
(For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
Libvirt + KVM
3. Which storage type did you use?
(For example: Ceph, LVM, GPFS, ...)
What's the version of that?
LVM/iSCSI with multipath enabled reproduces the os-brick failure.
4. Which networking type did you use?
(For example: nova-network, Neutron with OpenVSwitch, ...)
N/A
Logs & Configs
==============
# nova show testvm2
[..]
| fault | {"message": "Unexpected error while running command. |
| | Command: multipath -f 360014054a424982306a4a659007f73b2 |
| | Exit code: 1 |
| | Stdout: u'Jan 28 16:09:29 | 360014054a424982306a4a659007f73b2: map in use\ |
| | Jan 28 16:09:29 | failed to remove multipath map 360014054a424982306a4a", "code": 500, "details": " |
| | File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 202, in decorated_function |
| | return function(self, context, *args, **kwargs) |
| | File \"/usr/lib/python2.7/site-packages/nova/compute/manager.py\", line 6299, in _post_live_migration |
| | migrate_data) |
| | File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 7744, in post_live_migration |
| | self._disconnect_volume(context, connection_info, instance) |
| | File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py\", line 1287, in _disconnect_volume |
| | vol_driver.disconnect_volume(connection_info, instance) |
| | File \"/usr/lib/python2.7/site-packages/nova/virt/libvirt/volume/iscsi.py\", line 74, in disconnect_volume |
| | self.connector.disconnect_volume(connection_info['data'], None) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/utils.py\", line 150, in trace_logging_wrapper |
| | result = f(*args, **kwargs) |
| | File \"/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py\", line 274, in inner |
| | return f(*args, **kwargs) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 848, in disconnect_volume |
| | ignore_errors=ignore_errors) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py\", line 885, in _cleanup_connection |
| | force, exc) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 219, in remove_connection |
| | self.flush_multipath_device(multipath_name) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 275, in flush_multipath_device |
| | root_helper=self._root_helper) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/executor.py\", line 52, in _execute |
| | result = self.__execute(*args, **kwargs) |
| | File \"/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py\", line 169, in execute |
| | return execute_root(*cmd, **kwargs) |
| | File \"/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py\", line 207, in _wrap |
| | return self.channel.remote_call(name, args, kwargs) |
| | File \"/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py\", line 202, in remote_call |
| | raise exc_type(*result[2]) |
| | ", "created": "2019-01-28T07:10:09Z"}
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1814245/+subscriptions