[Bug 1452840] [NEW] libvirt: nova's detach_volume silently fails sometimes

Public bug reported:

This behavior has been observed on the following platforms:

* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest

Nova's "detach_volume" fires the detach method into libvirt, which
claims success, but the device is still attached according to "virsh
domblklist".  Nova then finishes the teardown, releasing the resources,
which then causes

This appears to be a race condition: the same sequence does
occasionally work fine.
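
Since the detach call itself returns success, the only reliable signal
is whether the target actually disappears from the domain.  A minimal
diagnostic sketch of that check (the domain UUID and target name, e.g.
vdb, are placeholders, not anything nova does itself):

    #!/bin/bash
    # Poll the hypervisor instead of trusting the detach return code.
    # $1 = domain UUID, $2 = target device name (e.g. vdb); placeholders.
    uuid=$1; dev=$2
    for i in $(seq 1 30); do
        if ! sudo virsh domblklist "$uuid" | awk '{print $1}' | grep -qx "$dev"; then
            echo "$dev detached"
            exit 0
        fi
        sleep 1
    done
    echo "$dev still attached after 30s" >&2
    exit 1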

Steps to Reproduce:

This script will usually trigger the error condition:

    #!/bin/bash -vx
    
    : Setup
    img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
    vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    sleep 5
    
    : Launch
    nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test
    
    : Measure
    nova show test | grep "volumes_attached.*$vol1_id"
    
    : Poke the bear
    nova volume-detach test "$vol1_id"
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    sleep 10
    sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
    vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
    nova volume-attach test "$vol2_id"
    sleep 1
    
    : Measure again
    nova show test | grep "volumes_attached.*$vol2_id"
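
On a failing run, both "virsh domblklist" invocations above still list
the supposedly detached volume, and the final grep prints nothing
because the second attach never lands.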

Expected behavior:

The volumes attach/detach/attach properly

Actual behavior:

The second attachment fails, and n-cpu throws the following exception:

    Failed to attach volume at mountpoint: /dev/vdb
    Traceback (most recent call last):
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
        virt_dom.attachDeviceFlags(conf.to_xml(), flags)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
        if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
    libvirtError: operation failed: target vdb already exists

Workaround:

"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the
guest to properly detach the device, and also seems to ward off whatever
gremlins caused the problem in the first place; i.e., the problem gets
much less likely to present itself after firing a virsh command.
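
For reference, a concrete form of that workaround (the guest UUID and
the vdb target are illustrative; read the real values off "virsh list"
and "virsh domblklist" first):

    uuid=$(sudo virsh list --uuid | head -n 1)   # the affected guest
    sudo virsh domblklist "$uuid"                # confirm the stale target, e.g. vdb
    sudo virsh detach-disk "$uuid" vdb           # detach it from outside nova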

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: libvirt volumes

--
https://bugs.launchpad.net/bugs/1452840