[Bug 1452840] [NEW] libvirt: nova's detach_volume silently fails sometimes
Public bug reported:
This behavior has been observed on the following platforms:
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest
Nova's "detach_volume" fires the detach method into libvirt, which
claims success, but the device is still attached according to "virsh
domblklist". Nova then finishes the teardown, releasing the resources,
which then causes
This appears to be a race condition, in that it does occasionally work
fine.
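Editorial illustration, not part of the original report: since the detach call can return success while the guest still holds the device, one mitigation is to verify the detach by re-reading the domain XML before releasing the backing resources. A minimal sketch of that idea using the libvirt Python bindings follows; the domain name, target device, and timeout are assumptions, and this is not nova's actual code.

# Hedged sketch: poll the live domain XML until the target device disappears,
# or give up after a timeout. Domain name, device name, and timings are
# illustrative assumptions only.
import time
import xml.etree.ElementTree as ET

import libvirt


def wait_for_detach(dom, target_dev, timeout=30):
    """Return True once target_dev is gone from the live domain XML."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        tree = ET.fromstring(dom.XMLDesc(0))
        devs = [t.get('dev') for t in tree.findall('./devices/disk/target')]
        if target_dev not in devs:
            return True
        time.sleep(1)
    return False


conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')  # assumed instance name
# ... after the caller has issued dom.detachDeviceFlags(disk_xml, flags) ...
if not wait_for_detach(dom, 'vdb'):
    raise RuntimeError('vdb is still attached although the detach "succeeded"')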
Steps to Reproduce:
This script will usually trigger the error condition:
#!/bin/bash -vx
: Setup
img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
sleep 5
: Launch
nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test
: Measure
nova show test | grep "volumes_attached.*$vol1_id"
: Poke the bear
nova volume-detach test "$vol1_id"
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
sleep 10
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
nova volume-attach test "$vol2_id"
sleep 1
: Measure again
nova show test | grep "volumes_attached.*$vol2_id"
Expected behavior:
The volumes attach/detach/attach properly
Actual behavior:
The second attachment fails, and n-cpu throws the following exception:
Failed to attach volume at mountpoint: /dev/vdb
Traceback (most recent call last):
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
    virt_dom.attachDeviceFlags(conf.to_xml(), flags)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
    if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
libvirtError: operation failed: target vdb already exists
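For context (editorial note, not from the report): "target vdb already exists" indicates the old disk is still defined under the same target name in the domain, so the new attach collides with it. The check below is a programmatic equivalent of the "virsh domblklist" calls in the script above, using the libvirt Python bindings; the connection URI is an assumption.

# Hedged sketch: list every domain's disks and their target names, the same
# information "virsh domblklist" prints, to confirm vdb is still attached.
import xml.etree.ElementTree as ET

import libvirt

conn = libvirt.open('qemu:///system')  # assumed URI
for dom in conn.listAllDomains():
    tree = ET.fromstring(dom.XMLDesc(0))
    for disk in tree.findall('./devices/disk'):
        target = disk.find('target').get('dev')
        source = disk.find('source')
        path = '-'
        if source is not None:
            path = source.get('file') or source.get('dev') or source.get('name') or '-'
        print(dom.name(), target, path)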
Workaround:
"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the
guest to properly detach the device, and also seems to ward off whatever
gremlins caused the problem in the first place; i.e., the problem gets
much less likely to present itself after firing a virsh command.
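Editorial sketch, not from the report: the same workaround can be driven through the libvirt API instead of the virsh CLI, by locating the leftover disk element by its target name and detaching it from the live and persistent configuration. The domain name and target device below are assumptions.

# Hedged sketch of the workaround via libvirt-python rather than virsh.
import xml.etree.ElementTree as ET

import libvirt

TARGET = 'vdb'                                 # assumed leftover target
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')   # assumed domain name

tree = ET.fromstring(dom.XMLDesc(0))
for disk in tree.findall('./devices/disk'):
    if disk.find('target').get('dev') == TARGET:
        flags = 0
        if dom.isActive():
            flags |= libvirt.VIR_DOMAIN_AFFECT_LIVE
        if dom.isPersistent():
            flags |= libvirt.VIR_DOMAIN_AFFECT_CONFIG
        dom.detachDeviceFlags(ET.tostring(disk).decode(), flags)
        break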
** Affects: nova
Importance: Undecided
Status: New
** Tags: libvirt volumes
https://bugs.launchpad.net/bugs/1452840