yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #63897
[Bug 1421550] Re: Creating VM image fails under the race condition with detaching volume
Reviewed: https://review.openstack.org/383859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4aa39c44a4b08ee4e05548d5c258e795089b2bdd
Submitter: Jenkins
Branch: master
commit 4aa39c44a4b08ee4e05548d5c258e795089b2bdd
Author: Matthew Booth <mbooth@xxxxxxxxxx>
Date: Fri Oct 7 19:14:38 2016 +0100
libvirt: Fix races with nfs volume mount/umount
A single nfs export typically contains multiple volumes. We were
handling this in the libvirt driver by:
1. On mount, we 'ensure' the mount is available, so we don't fail if
another instance already has it mounted.
2. On umount, we trap and ignore 'device is busy' so we don't fail if
another instance is already using it.
Unfortunately, while this works for serial mounts and unmounts, there
are multiple failure cases when volumes from the same export are
mounted and unmounted simultaneously. It causes an error if an
instance is stopped: as the qemu process is not actively using the
mountpoint it will not prevent an unmount for another volume on the
same mountpoint from succeeding. It will not be possible to restart
the instance, because its mountpoint will not be mounted.
To fix this, we create a singleton manager object, which tracks mounts
and umount requests per export, and calls the real mount/umount only
when required. It uses per-export locks to allow concurrency while
avoiding races. Because we now expect to know the state of the host at
all times, we no longer need to execute speculative mount/umount
commands.
As we track attachments (a mapping from volume to instance) rather
than volumes, we also gracefully support multi-attach.
This change implements this for nfs, but the solution is intended to
be extended to all LibvirtBaseFileSystemVolumeDrivers.
Closes-Bug: #1421550
Change-Id: I3155984d76df06371a6c45f633aa448168a96d64
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1421550
Title:
Creating VM image fails under the race condition with detaching volume
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Environment:
nova 2014.2.1
cinder 2014.2.1
Ubuntu 14.04 LTS
Cinder volumes whose backend is NFS are used.
There are two 'ACTIVE' VM instances on the same compute node.
Creating VM image(VM snapshot) fails under the race condition with detaching volume for the other VM instance.
In creating VM image, starting VM instance fails(remains 'SHUTOFF' state) and the VM image is deleted.
nova-compute's log is as follows:
---------------------------------------------------------------------------------------------------------------
2015-01-26 10:28:47,000.744 11535 ERROR nova.virt.libvirt.driver [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] Error launching a defined domain with XML: <domain type='kvm'>
(snipped...)
2015-01-26 10:28:47,000.767 11535 DEBUG nova.compute.manager [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Cleaning up image e8c3255c-b4e8-4324-addb-365c5d7b1868 decorated_function /usr/lib/python2.7/dist-packages/nova/compute/manager.py:373
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Traceback (most recent call last):
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 369, in decorated_function
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3027, in snapshot_instance
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] task_states.IMAGE_SNAPSHOT)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3058, in _snapshot_instance
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] update_task_state)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1733, in snapshot
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] new_dom = self._create_domain(domain=virt_dom)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4338, in _create_domain
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] LOG.error(err)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 82, in __exit__
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] six.reraise(self.type_, self.value, self.tb)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4329, in _create_domain
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] domain.createWithFlags(launch_flags)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] rv = execute(f, *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] six.reraise(c, e, tb)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] rv = meth(*args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] libvirtError: Failed to open file '/var/lib/nova/mnt/339b9d35866664794a8155657a049127/volume-27f59813-76f0-4b56-ba42-75922537c36c': No such file or directory
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]
---------------------------------------------------------------------------------------------------------------
In detaching volume, umounting NFS is performed without checking whether the other VM instance is being attached volumes or not.
So if the other VM is stopped, umounting NFS succeeds.
If there are no processes using the NFS directory, the NFS directory is umounted.
nova/virt/libvirt/volumes.py(2014.2.1):
---------------------------------------------------------------------------------------------------------------
class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver):
(snipped...)
def disconnect_volume(self, connection_info, disk_dev):
"""Disconnect the volume."""
export = connection_info['data']['export']
mount_path = os.path.join(CONF.libvirt.nfs_mount_point_base,
utils.get_hash_str(export))
try:
utils.execute('umount', mount_path, run_as_root=True)
except processutils.ProcessExecutionError as exc:
if ('device is busy' in exc.message or
'target is busy' in exc.message):
LOG.debug("The NFS share %s is still in use.", export)
else:
LOG.exception(_LE("Couldn't unmount the NFS share %s"), export)
---------------------------------------------------------------------------------------------------------------
* This code has been added in https://review.openstack.org/#/c/76558/.
A VM instance is stopped once by creating VM image.
And then detaching volume for the other VM instance on the same compute node is executed.
If there are no VMs connecting cinder volumes, umounting NFS directory succeeds.
After VM snapshot is completed, the VM instance is restarted.
But the VM instance cannot access volumes because NFS directory has been umounted.
So the error occurs and the VM instance cannot be restarted.
And this issue also occurs under the race condition with starting a VM instance
and detaching volumes for another VM instance('ACTIVE') on the same compute node.
1. _connect_volume in starting a VM instance(mount NFS directory if not mounted.)
2. _disconnect_volume in detaching volume(umount NFS directory if no processes use it.)
3. The libvirt domain starts in starting a VM instance
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1421550/+subscriptions
References