[Bug 1452840] Re: libvirt: nova's detach_volume silently fails sometimes
** Also affects: libvirt-python
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1452840
Title:
libvirt: nova's detach_volume silently fails sometimes
Status in libvirt-python:
New
Status in OpenStack Compute (Nova):
Confirmed
Bug description:
This behavior has been observed on the following platforms:
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse NFS driver, CirrOS 0.3.2 guest
* Nova Icehouse, Ubuntu 12.04, QEMU 1.5.3, libvirt 1.1.3.5, with the Cinder Icehouse RBD (Ceph) driver, CirrOS 0.3.2 guest
* Nova master, Ubuntu 14.04, QEMU 2.0.0, libvirt 1.2.2, with the Cinder master iSCSI driver, CirrOS 0.3.2 guest
Nova's "detach_volume" fires the detach method into libvirt, which
claims success, but the device is still attached according to "virsh
domblklist". Nova then finishes the teardown, releasing the
resources, which then causes I/O errors in the guest, and subsequent
volume_attach requests from Nova to fail spectacularly due to it
trying to use an in-use resource.
This appears to be a race condition, in that it does occasionally work
fine.
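One way to see the race from the Nova side is to poll the live domain XML after the detach call instead of trusting its return value. The following is a rough libvirt-python sketch only, not Nova code; the wait_for_detach helper, the 30-second timeout, and the instance UUID placeholder are all made up for illustration:
# Sketch: confirm a detach actually happened by polling the live domain XML,
# rather than trusting the return value of detachDeviceFlags() alone.
# Helper name, timeout, and UUID below are illustrative placeholders.
import time
import xml.etree.ElementTree as ET

import libvirt


def wait_for_detach(dom, target_dev, timeout=30):
    """Return True once target_dev (e.g. 'vdb') is no longer in the domain."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        tree = ET.fromstring(dom.XMLDesc(0))
        targets = [t.get('dev') for t in tree.findall('./devices/disk/target')]
        if target_dev not in targets:
            return True
        time.sleep(1)
    return False


conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('REPLACE-WITH-INSTANCE-UUID')
# ... detach is issued here, e.g. dom.detachDeviceFlags(disk_xml, flags) ...
if not wait_for_detach(dom, 'vdb'):
    print('vdb is still attached; releasing the backing volume now reproduces this bug')
When the device never disappears, tearing down the Cinder side at that point produces the guest I/O errors and the "target vdb already exists" failure shown below.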
Steps to Reproduce:
This script will usually trigger the error condition:
#!/bin/bash -vx
: Setup
img=$(glance image-list --disk-format ami | awk '/cirros-0.3.2-x86_64-uec/ {print $2}')
vol1_id=$(cinder create 1 | awk '($2=="id"){print $4}')
sleep 5
: Launch
nova boot --flavor m1.tiny --image "$img" --block-device source=volume,id="$vol1_id",dest=volume,shutdown=preserve --poll test
: Measure
nova show test | grep "volumes_attached.*$vol1_id"
: Poke the bear
nova volume-detach test "$vol1_id"
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
sleep 10
sudo virsh list --all --uuid | xargs -r -n 1 sudo virsh domblklist
vol2_id=$(cinder create 1 | awk '($2=="id"){print $4}')
nova volume-attach test "$vol2_id"
sleep 1
: Measure again
nova show test | grep "volumes_attached.*$vol2_id"
Expected behavior:
The volumes attach/detach/attach properly
Actual behavior:
The second attachment fails, and n-cpu throws the following exception:
Failed to attach volume at mountpoint: /dev/vdb
Traceback (most recent call last):
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1057, in attach_volume
    virt_dom.attachDeviceFlags(conf.to_xml(), flags)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 517, in attachDeviceFlags
    if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
libvirtError: operation failed: target vdb already exists
Workaround:
"sudo virsh detach-disk $SOME_UUID $SOME_DISK_ID" appears to cause the
guest to properly detach the device, and also seems to ward off
whatever gremlins caused the problem in the first place; i.e., the
problem gets much less likely to present itself after firing a virsh
command.
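For reference, a minimal libvirt-python sketch of the same workaround (the instance UUID, the 'vdb' target, and the choice of only the VIR_DOMAIN_AFFECT_LIVE flag are placeholders/assumptions, not values taken from this bug):
# Sketch: libvirt-python equivalent of the "virsh detach-disk" workaround.
# The UUID and the 'vdb' target are placeholders for the affected guest.
import xml.etree.ElementTree as ET

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('REPLACE-WITH-INSTANCE-UUID')

# Find the <disk> element whose <target dev="..."> matches the stuck device
# and feed that XML back to detachDeviceFlags().
tree = ET.fromstring(dom.XMLDesc(0))
for disk in tree.findall('./devices/disk'):
    target = disk.find('target')
    if target is not None and target.get('dev') == 'vdb':
        dom.detachDeviceFlags(ET.tostring(disk).decode(),
                              libvirt.VIR_DOMAIN_AFFECT_LIVE)
        break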
To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt-python/+bug/1452840/+subscriptions