
yahoo-eng-team team mailing list archive

[Bug 1836212] Re: libvirt: Failure to recover from failed detach

 

Yep. The actual error thrown was "Unable to detach from guest transient
domain.", which is now "Unable to detach the device from the live
config." in master. That RetryDecorator makes this function a whole lot
harder to read, but with your explanation it seems that the detach was
actually timing out, which is consistent with the underlying problem we
eventually discovered.

Thanks! I'll close this out.

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1836212

Title:
  libvirt: Failure to recover from failed detach

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  1020162 ERROR root [req-46fbc6c8-de2c-4afb-9f24-9d75947c9a3c
  9ccddbb72e2d42b6ab1a31ad48ea21fb 86bea4eb057b412a98402a1b7e1d9222 - - -]
  Original exception being dropped:

  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 390, in _try_detach_device
      self.detach_device(conf, persistent=persistent, live=live)
    File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 467, in detach_device
      self._domain.detachDeviceFlags(device_xml, flags=flags)
    File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
      result = proxy_call(self._autowrap, f, *args, **kwargs)
    File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
      rv = execute(f, *args, **kwargs)
    File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
      six.reraise(c, e, tb)
    File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
      rv = meth(*args, **kwargs)
    File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1194, in detachDeviceFlags
      if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
  libvirtError: invalid argument: no target device vdb

  This appears to happen because detach_device_with_retry(live=True)
  ultimately calls detachDeviceFlags(flags=VIR_DOMAIN_AFFECT_CONFIG |
  VIR_DOMAIN_AFFECT_LIVE). 'no target device' is the error libvirt
  generates when it fails to remove the device from the CONFIG
  (persistent) domain. This can happen because the first such call
  succeeds, and removes the device from the CONFIG domain, as soon as
  the removal from the LIVE domain has been queued, even though the LIVE
  removal is an asynchronous operation. Consequently, a subsequent check
  for the device may still find it, because it has not yet been (and may
  never be) removed from the LIVE domain, although it is already gone
  from the CONFIG domain. Each retry with the same flags then fails on
  the CONFIG domain, which prevents libvirt from attempting the LIVE
  removal again, and so the detach never succeeds.
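
  The sequence described above can be reproduced with a small toy
  model. This is only a sketch of the behaviour as reported here, not
  libvirt or nova code: FakeDomain, FakeLibvirtError and
  detach_attempts are hypothetical names, and the model assumes the
  guest never completes the queued LIVE removal. Only the two flag
  values are taken from libvirt's VIR_DOMAIN_AFFECT_* constants.

```python
# Toy model only: FakeDomain is hypothetical and is NOT the libvirt API.
# The flag values mirror libvirt's VIR_DOMAIN_AFFECT_* constants.
VIR_DOMAIN_AFFECT_LIVE = 1
VIR_DOMAIN_AFFECT_CONFIG = 2


class FakeLibvirtError(Exception):
    """Stand-in for libvirtError."""


class FakeDomain:
    def __init__(self, devices):
        self.config = set(devices)  # persistent (CONFIG) definition
        self.live = set(devices)    # running (LIVE) guest
        self.pending = set()        # LIVE removals queued but not completed

    def detachDeviceFlags(self, dev, flags=0):
        # A missing CONFIG device rejects the whole call before any
        # LIVE removal is attempted (the behaviour described above).
        if flags & VIR_DOMAIN_AFFECT_CONFIG and dev not in self.config:
            raise FakeLibvirtError(
                "invalid argument: no target device %s" % dev)
        if flags & VIR_DOMAIN_AFFECT_LIVE:
            # Asynchronous: the request is only queued. In this model the
            # guest never acts on it, so the device stays in self.live.
            self.pending.add(dev)
        if flags & VIR_DOMAIN_AFFECT_CONFIG:
            # The CONFIG removal takes effect immediately.
            self.config.discard(dev)


def detach_attempts(domain, dev, flags, attempts=3):
    """Call detachDeviceFlags repeatedly and record each outcome."""
    outcomes = []
    for _ in range(attempts):
        try:
            domain.detachDeviceFlags(dev, flags=flags)
            outcomes.append("queued")
        except FakeLibvirtError as exc:
            outcomes.append(str(exc))
    return outcomes


dom = FakeDomain({"vdb"})
both = VIR_DOMAIN_AFFECT_CONFIG | VIR_DOMAIN_AFFECT_LIVE

# The first call is accepted; every retry with the same flags is rejected.
outcomes = detach_attempts(dom, "vdb", both)
still_in_live = "vdb" in dom.live
still_in_config = "vdb" in dom.config

# In this model, a retry that passes only the LIVE flag is not rejected.
live_only = detach_attempts(dom, "vdb", VIR_DOMAIN_AFFECT_LIVE, attempts=1)

print(outcomes, still_in_live, still_in_config, live_only)
```

  In the model, the first attempt with both flags is accepted and
  drops the device from CONFIG, after which every retry with both
  flags raises 'no target device' while the device remains in LIVE.
  Whether a LIVE-only retry is appropriate in nova itself is outside
  what this sketch can show.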

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1836212/+subscriptions

