[Bug 1452840] Re: libvirt: nova's detach_volume silently fails sometimes
** Also affects: libvirt-python
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1452840
Title:
libvirt: nova's detach_volume silently fails sometimes
Status in libvirt-python:
New
Status in OpenStack Compute (Nova):
Confirmed
Bug description:
This behavior has been observed on the following platforms:
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest
Nova's "detach_volume" fires the detach method into libvirt, which
claims success, but the device is still attached according to "virsh
domblklist". Nova then finishes the teardown, releasing the
resources, which then causes I/O errors in the guest, and subsequent
volume_attach requests from Nova to fail spectacularly due to it
trying to use an in-use resource.
This appears to be a race condition, in that it does occasionally work
fine.
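One way to see the race from the Nova side is to poll the live domain XML after the detach call instead of trusting its return value. The following is a rough libvirt-python sketch only, not Nova code; the wait_for_detach helper, the 30-second timeout, and the instance UUID placeholder are all made up for illustration:
# Sketch: confirm a detach actually happened by polling the live domain XML,
# rather than trusting the return value of detachDeviceFlags() alone.
# Helper name, timeout, and UUID below are illustrative placeholders.
import time
import xml.etree.ElementTree as ET

import libvirt


def wait_for_detach(dom, target_dev, timeout=30):
    """Return True once target_dev (e.g. 'vdb') is no longer in the domain."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        tree = ET.fromstring(dom.XMLDesc(0))
        targets = [t.get('dev') for t in tree.findall('./devices/disk/target')]
        if target_dev not in targets:
            return True
        time.sleep(1)
    return False


conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('REPLACE-WITH-INSTANCE-UUID')
# ... detach is issued here, e.g. dom.detachDeviceFlags(disk_xml, flags) ...
if not wait_for_detach(dom, 'vdb'):
    print('vdb is still attached; releasing the backing volume now reproduces this bug')
When the device never disappears, tearing down the Cinder side at that point produces the guest I/O errors and the "target vdb already exists" failure shown below.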
Steps to Reproduce:
This script will usually trigger the error condition:
#!/bin/bash -vx
: Setup
img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
sleep 5
: Launch
nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test
: Measure
nova show test | grep "volumes_attached.*$vol1_id"
: Poke the bear
nova volume-detach test "$vol1_id"
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
sleep 10
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
nova volume-attach test "$vol2_id"
sleep 1
: Measure again
nova show test | grep "volumes_attached.*$vol2_id"
Expected behavior:
The volumes attach/detach/attach properly
Actual behavior:
The second attachment fails, and n-cpu throws the following exception:
Failed to attach volume at mountpoint: /dev/vdb
Traceback (most recent call last):
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
    virt_dom.attachDeviceFlags(conf.to_xml(), flags)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
    if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
libvirtError: operation failed: target vdb already exists
Workaround:
"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the
guest to properly detach the device, and also seems to ward off
whatever gremlins caused the problem in the first place; i.e., the
problem gets much less likely to present itself after firing a virsh
command.
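For reference, a minimal libvirt-python sketch of the same workaround (the instance UUID, the 'vdb' target, and the choice of only the VIR_DOMAIN_AFFECT_LIVE flag are placeholders/assumptions, not values taken from this bug):
# Sketch: libvirt-python equivalent of the "virsh detach-disk" workaround.
# The UUID and the 'vdb' target are placeholders for the affected guest.
import xml.etree.ElementTree as ET

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('REPLACE-WITH-INSTANCE-UUID')

# Find the <disk> element whose <target dev="..."> matches the stuck device
# and feed that XML back to detachDeviceFlags().
tree = ET.fromstring(dom.XMLDesc(0))
for disk in tree.findall('./devices/disk'):
    target = disk.find('target')
    if target is not None and target.get('dev') == 'vdb':
        dom.detachDeviceFlags(ET.tostring(disk).decode(),
                              libvirt.VIR_DOMAIN_AFFECT_LIVE)
        break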
To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt-python/+bug/1452840/+subscriptions