← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1421550] [NEW] Creating VM image fails under the race condition with detaching volume

 

Public bug reported:

Environment:
nova 2014.2.1
cinder 2014.2.1
Ubuntu 14.04 LTS
Cinder volumes whose backend is NFS are used.

There are two 'ACTIVE' VM instances on the same compute node.
Creating VM image(VM snapshot) fails under the race condition with detaching volume for the other VM instance.
In creating VM image, starting VM instance fails(remains 'SHUTOFF' state) and the VM image is deleted.

nova-compute's log is as follows:
---------------------------------------------------------------------------------------------------------------
2015-01-26 10:28:47,000.744 11535 ERROR nova.virt.libvirt.driver [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] Error launching a defined domain with XML: <domain type='kvm'>
(snipped...)
2015-01-26 10:28:47,000.767 11535 DEBUG nova.compute.manager [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Cleaning up image e8c3255c-b4e8-4324-addb-365c5d7b1868 decorated_function /usr/lib/python2.7/dist-packages/nova/compute/manager.py:373
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Traceback (most recent call last):
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 369, in decorated_function
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3027, in snapshot_instance
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     task_states.IMAGE_SNAPSHOT)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3058, in _snapshot_instance
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     update_task_state)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1733, in snapshot
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     new_dom = self._create_domain(domain=virt_dom)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4338, in _create_domain
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     LOG.error(err)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 82, in __exit__
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     six.reraise(self.type_, self.value, self.tb)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4329, in _create_domain
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     domain.createWithFlags(launch_flags)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     rv = execute(f, *args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     six.reraise(c, e, tb)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     rv = meth(*args, **kwargs)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] libvirtError: Failed to open file '/var/lib/nova/mnt/339b9d35866664794a8155657a049127/volume-27f59813-76f0-4b56-ba42-75922537c36c': No such file or directory
2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] 
---------------------------------------------------------------------------------------------------------------

In detaching volume, umounting NFS is performed without checking whether the other VM instance is being attached volumes or not.
So if the other VM is stopped, umounting NFS succeeds.

If there are no processes using the NFS directory, the NFS directory is umounted.
nova/virt/libvirt/volumes.py(2014.2.1):
---------------------------------------------------------------------------------------------------------------
class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver):
(snipped...)
    def disconnect_volume(self, connection_info, disk_dev):
        """Disconnect the volume."""

        export = connection_info['data']['export']
        mount_path = os.path.join(CONF.libvirt.nfs_mount_point_base,
                                  utils.get_hash_str(export))

        try:
            utils.execute('umount', mount_path, run_as_root=True)
        except processutils.ProcessExecutionError as exc:
            if ('device is busy' in exc.message or
                'target is busy' in exc.message):
                LOG.debug("The NFS share %s is still in use.", export)
            else:
                LOG.exception(_LE("Couldn't unmount the NFS share %s"), export)
---------------------------------------------------------------------------------------------------------------
* This code has been added in https://review.openstack.org/#/c/76558/.

A VM instance is stopped once by creating VM image.
And then detaching volume for the other VM instance on the same compute node is executed.
If there are no VMs connecting cinder volumes, umounting NFS directory succeeds.
After VM snapshot is completed, the VM instance is restarted.
But the VM instance cannot access volumes because NFS directory has been umounted.
So the error occurs and the VM instance cannot be restarted.

And this issue also occurs under the race condition with starting a VM instance
and detaching volumes for another VM instance('ACTIVE') on the same compute node.

1. _connect_volume in starting a VM instance(mount NFS directory if not mounted.)
2. _disconnect_volume in detaching volume(umount NFS directory if no processes use it.)
3. The libvirt domain starts in starting a VM instance

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1421550

Title:
  Creating VM image fails under the race condition with detaching volume

Status in OpenStack Compute (Nova):
  New

Bug description:
  Environment:
  nova 2014.2.1
  cinder 2014.2.1
  Ubuntu 14.04 LTS
  Cinder volumes whose backend is NFS are used.

  There are two 'ACTIVE' VM instances on the same compute node.
  Creating VM image(VM snapshot) fails under the race condition with detaching volume for the other VM instance.
  In creating VM image, starting VM instance fails(remains 'SHUTOFF' state) and the VM image is deleted.

  nova-compute's log is as follows:
  ---------------------------------------------------------------------------------------------------------------
  2015-01-26 10:28:47,000.744 11535 ERROR nova.virt.libvirt.driver [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] Error launching a defined domain with XML: <domain type='kvm'>
  (snipped...)
  2015-01-26 10:28:47,000.767 11535 DEBUG nova.compute.manager [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Cleaning up image e8c3255c-b4e8-4324-addb-365c5d7b1868 decorated_function /usr/lib/python2.7/dist-packages/nova/compute/manager.py:373
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Traceback (most recent call last):
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 369, in decorated_function
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     *args, **kwargs)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3027, in snapshot_instance
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     task_states.IMAGE_SNAPSHOT)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3058, in _snapshot_instance
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     update_task_state)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1733, in snapshot
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     new_dom = self._create_domain(domain=virt_dom)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4338, in _create_domain
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     LOG.error(err)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 82, in __exit__
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     six.reraise(self.type_, self.value, self.tb)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4329, in _create_domain
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     domain.createWithFlags(launch_flags)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     rv = execute(f, *args, **kwargs)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     six.reraise(c, e, tb)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     rv = meth(*args, **kwargs)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] libvirtError: Failed to open file '/var/lib/nova/mnt/339b9d35866664794a8155657a049127/volume-27f59813-76f0-4b56-ba42-75922537c36c': No such file or directory
  2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] 
  ---------------------------------------------------------------------------------------------------------------

  In detaching volume, umounting NFS is performed without checking whether the other VM instance is being attached volumes or not.
  So if the other VM is stopped, umounting NFS succeeds.

  If there are no processes using the NFS directory, the NFS directory is umounted.
  nova/virt/libvirt/volumes.py(2014.2.1):
  ---------------------------------------------------------------------------------------------------------------
  class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver):
  (snipped...)
      def disconnect_volume(self, connection_info, disk_dev):
          """Disconnect the volume."""

          export = connection_info['data']['export']
          mount_path = os.path.join(CONF.libvirt.nfs_mount_point_base,
                                    utils.get_hash_str(export))

          try:
              utils.execute('umount', mount_path, run_as_root=True)
          except processutils.ProcessExecutionError as exc:
              if ('device is busy' in exc.message or
                  'target is busy' in exc.message):
                  LOG.debug("The NFS share %s is still in use.", export)
              else:
                  LOG.exception(_LE("Couldn't unmount the NFS share %s"), export)
  ---------------------------------------------------------------------------------------------------------------
  * This code has been added in https://review.openstack.org/#/c/76558/.

  A VM instance is stopped once by creating VM image.
  And then detaching volume for the other VM instance on the same compute node is executed.
  If there are no VMs connecting cinder volumes, umounting NFS directory succeeds.
  After VM snapshot is completed, the VM instance is restarted.
  But the VM instance cannot access volumes because NFS directory has been umounted.
  So the error occurs and the VM instance cannot be restarted.

  And this issue also occurs under the race condition with starting a VM instance
  and detaching volumes for another VM instance('ACTIVE') on the same compute node.

  1. _connect_volume in starting a VM instance(mount NFS directory if not mounted.)
  2. _disconnect_volume in detaching volume(umount NFS directory if no processes use it.)
  3. The libvirt domain starts in starting a VM instance

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1421550/+subscriptions


Follow ups

References