[Bug 1642689] Re: ceph: volume detach fails with "libvirtError: operation failed: disk vdb not found"
Reviewed: https://review.openstack.org/401375
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d079f37f7ccb8da9e74416cedce4f194b66ee1d0
Submitter: Jenkins
Branch: master
commit d079f37f7ccb8da9e74416cedce4f194b66ee1d0
Author: int32bit <krystism@xxxxxxxxx>
Date: Wed Nov 23 18:21:16 2016 +0800
Fix wait for detach code to handle 'disk not found error'
Currently, the code does an initial detach from the persistent and
transient domains in one call, then follows up with a call to the
retryable function, which first checks the domain XML before retrying
the transient domain detach. If the transient domain detach (which is
asynchronous) completes after we have checked the domain XML, trying to
detach it again raises a libvirtError exception. Our loop code in
`_do_wait_and_retry_detach` doesn't catch it, so the detach fails with
an error. We should handle the 'disk not found' error from libvirt
and consider the job complete.
Closes-Bug: #1642689
Change-Id: I131aaf28d2f5d5d964d4045e3d7d62207079cfb0
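For reference, a minimal sketch of the approach the fix describes (a
paraphrase for illustration, not the exact patch: disk_still_in_xml() is a
made-up helper, and the real code's completion signalling differs in detail):

    import libvirt
    from oslo_service import loopingcall

    def _do_wait_and_retry_detach(domain, device_xml):
        if not disk_still_in_xml(domain):
            # Disk already gone from the domain XML: the detach finished.
            raise loopingcall.LoopingCallDone()
        try:
            # Retry the (asynchronous) transient domain detach.
            domain.detachDeviceFlags(
                device_xml, flags=libvirt.VIR_DOMAIN_AFFECT_LIVE)
        except libvirt.libvirtError as ex:
            if (ex.get_error_code() == libvirt.VIR_ERR_OPERATION_FAILED
                    and 'not found' in ex.get_error_message()):
                # The detach completed between the XML check above and
                # this retry; treat the job as done instead of failing.
                raise loopingcall.LoopingCallDone()
            raise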
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1642689
Title:
ceph: volume detach fails with "libvirtError: operation failed: disk
vdb not found"
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) newton series:
In Progress
Bug description:
Seeing this failure in this job:
tempest.api.volume.test_volumes_snapshots.VolumesV1SnapshotTestJSON.test_snapshot_create_with_volume_in_use[compute,id-b467b54c-07a4-446d-a1cf-651dedcc3ff1]
While detaching the volume during cleanup, the test times out waiting
for the volume status to go from 'in-use' back to 'available':
2016-11-17 11:58:59.110301 | Captured traceback:
2016-11-17 11:58:59.110310 | ~~~~~~~~~~~~~~~~~~~
2016-11-17 11:58:59.110322 | Traceback (most recent call last):
2016-11-17 11:58:59.110342 | File "tempest/common/waiters.py", line 189, in wait_for_volume_status
2016-11-17 11:58:59.110356 | raise lib_exc.TimeoutException(message)
2016-11-17 11:58:59.110373 | tempest.lib.exceptions.TimeoutException: Request timed out
2016-11-17 11:58:59.110405 | Details: Volume db12eda4-4ce6-4f00-a4e0-9df115f230e5 failed to reach available status (current in-use) within the required time (196 s).
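(For context, tempest's wait_for_volume_status at waiters.py line 189 above
is essentially a polling loop; a simplified sketch, with the client call and
the exception class reduced to stand-ins rather than tempest's exact code:)

    import time

    class TimeoutException(Exception):
        pass

    def wait_for_volume_status(client, volume_id, status,
                               timeout=196, interval=1):
        # Poll the volume until it reaches the wanted status or we time out.
        start = time.time()
        while True:
            volume = client.show_volume(volume_id)['volume']
            if volume['status'] == status:
                return
            if time.time() - start > timeout:
                raise TimeoutException(
                    'Volume %s failed to reach %s status (current %s) '
                    'within the required time (%s s).'
                    % (volume_id, status, volume['status'], timeout))
            time.sleep(interval)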
The volume detach request is here:
2016-11-17 11:58:59.031058 | 2016-11-17 11:38:55,018 8316 INFO [tempest.lib.common.rest_client] Request (VolumesV1SnapshotTestJSON:_run_cleanups): 202 DELETE http://127.0.0.1:8774/v2.1/servers/584a65b5-07fa-4994-a2d5-1676d0e13a8c/os-volume_attachments/db12eda4-4ce6-4f00-a4e0-9df115f230e5 0.277s
2016-11-17 11:58:59.031103 | 2016-11-17 11:38:55,018 8316 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'X-Auth-Token': '<omitted>', 'Accept': 'application/json', 'Content-Type': 'application/json'}
2016-11-17 11:58:59.031113 | Body: None
2016-11-17 11:58:59.031235 | Response - Headers: {'content-location': 'http://127.0.0.1:8774/v2.1/servers/584a65b5-07fa-4994-a2d5-1676d0e13a8c/os-volume_attachments/db12eda4-4ce6-4f00-a4e0-9df115f230e5', 'content-type': 'application/json', 'x-openstack-nova-api-version': '2.1', 'date': 'Thu, 17 Nov 2016 11:38:55 GMT', 'content-length': '0', 'status': '202', 'connection': 'close', 'x-compute-request-id': 'req-9f0541d3-6eec-4793-8852-7bd01708932e', 'openstack-api-version': 'compute 2.1', 'vary': 'X-OpenStack-Nova-API-Version'}
2016-11-17 11:58:59.031248 | Body:
Following the req-9f0541d3-6eec-4793-8852-7bd01708932e request ID into
the compute logs, we see this detach failure:
http://logs.openstack.org/00/398800/1/gate/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/a387fb0/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-11-17_11_39_00_649
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [req-9f0541d3-6eec-4793-8852-7bd01708932e tempest-VolumesV1SnapshotTestJSON-1819335716 tempest-VolumesV1SnapshotTestJSON-1819335716] [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] Failed to detach volume db12eda4-4ce6-4f00-a4e0-9df115f230e5 from /dev/vdb
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] Traceback (most recent call last):
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/compute/manager.py", line 4757, in _driver_detach_volume
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] encryption=encryption)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1307, in detach_volume
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] wait_for_detach()
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 385, in func
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return evt.wait()
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return hubs.get_hub().switch()
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] return self.greenlet.switch()
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 136, in _run_loop
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = func(*self.args, **self.kw)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 356, in _func
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = f(*args, **kwargs)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 380, in _do_wait_and_retry_detach
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] self.detach_device(config, persistent=False, live=live)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 401, in detach_device
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] self._domain.detachDeviceFlags(device_xml, flags=flags)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] rv = execute(f, *args, **kwargs)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] six.reraise(c, e, tb)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] rv = meth(*args, **kwargs)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1190, in detachDeviceFlags
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] if ret == -1: raise libvirtError ('virDomainDetachDeviceFlags() failed', dom=self)
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c] libvirtError: operation failed: disk vdb not found
2016-11-17 11:39:00.649 2249 ERROR nova.compute.manager [instance: 584a65b5-07fa-4994-a2d5-1676d0e13a8c]
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20detach%20volume%5C%22%20AND%20message%3A%5C%22_do_wait_and_retry_detach%5C%22%20AND%20message%3A%5C%22libvirtError%3A%20operation%20failed%3A%20disk%5C%22%20AND%20message%3A%5C%22not%20found%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d
21 hits in the last 7 days, check and gate, all failures, 90% in ceph
jobs.
It looks like the wait-for-detach code should handle the 'not found'
error from libvirt and consider the job complete.
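As an illustration of the domain XML check mentioned in the commit message
above, a rough sketch of how code can tell whether a disk (e.g. vdb) is
still attached before retrying (illustrative only; dom is a
libvirt.virDomain). Note that even with this check, the asynchronous detach
can complete in the window between the check and the retry, which is exactly
why the 'not found' error still has to be handled:

    import xml.etree.ElementTree as etree

    def disk_is_attached(dom, target_dev):
        # Look for a <disk> element whose <target dev='...'/> matches,
        # e.g. target_dev='vdb', in the live domain XML.
        tree = etree.fromstring(dom.XMLDesc(0))
        for disk in tree.findall('./devices/disk'):
            target = disk.find('target')
            if target is not None and target.get('dev') == target_dev:
                return True
        return False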
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1642689/+subscriptions