yahoo-eng-team team mailing list archive
Message #90086
[Bug 1989232] Re: MultiAttachVolumeSwap fails or takes a long time to detach volume
Thanks Aboubacar for the well-written bug report.
I agree that we have a race between the disconnect and the swap
operation. Both take a lock, but they take different locks, so the two
operations can overlap.
Failed case:
Sep 30 02:51:47 Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.156s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 30 02:51:55 Lock "0d711b7b-4693-4a7e-9a94-ca4186b4a670" "released" by "nova.compute.manager.ComputeManager.swap_volume.<locals>._do_locked_swap_volume" :: held 153.400s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Successful case:
Sep 29 17:12:21 Lock "4aeaeb5d-295f-4149-9330-a016d9da1730" "released" by "nova.compute.manager.ComputeManager.swap_volume.<locals>._do_locked_swap_volume" :: held 632.783s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 29 17:12:25 Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.142s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
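The overlap seen in the failed case can be demonstrated with plain threading primitives. Below is a minimal sketch, not nova or os-brick code: the `synchronized` helper and the lock names are simplified stand-ins for oslo.concurrency's named locks, used only to show that two critical sections guarded by *different* lock names can be active at the same time:

```python
import threading

_locks = {}

def synchronized(name):
    """Simplified stand-in for a named-lock decorator: each distinct
    name maps to its own lock, so callers holding different names
    never serialize against each other."""
    lock = _locks.setdefault(name, threading.Lock())
    def decorator(func):
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator

in_disconnect = threading.Event()
in_swap = threading.Event()
overlapped = []

@synchronized("connect_volume")      # lock name seen in the os-brick log line
def disconnect_volume():
    in_disconnect.set()
    # True iff swap_volume entered its critical section while we hold ours
    overlapped.append(in_swap.wait(timeout=2))

@synchronized("volume-uuid")         # stand-in for nova's per-volume lock
def swap_volume():
    in_disconnect.wait(timeout=2)    # wait until disconnect is inside its section
    in_swap.set()                    # at this point both sections are active

t1 = threading.Thread(target=disconnect_volume)
t2 = threading.Thread(target=swap_volume)
t1.start(); t2.start()
t1.join(); t2.join()

print("critical sections overlapped:", overlapped == [True])
```

If both code paths took the same lock name, swap_volume would block until disconnect_volume released the lock, the wait inside disconnect_volume would time out, and the flag would be False. With different names it is True, which is exactly the window the log excerpts above show.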
** Also affects: os-brick
Importance: Undecided
Status: New
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Medium
** Tags added: volumes
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1989232
Title:
MultiAttachVolumeSwap fails or takes a long time to detach volume
Status in OpenStack Compute (nova):
Triaged
Status in os-brick:
New
Status in tempest:
New
Bug description:
ERROR:
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
fails during tempest iSCSI tests because the volume takes a long time
to detach, or fails to detach, from the instance. The logs herein show
an example of a failure to detach.
EXPECTED BEHAVIOR: Volume successfully detaches and the test passes.
HOW TO DUPLICATE:
Run: tox -e all -- tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach | tee -a console.log.out
CONFIG:
- DevStack Zed Release
- Single node using iSCSI
- Host OS: Ubuntu 20.04
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
From tempest console.log:
tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach[id-e8f8f9d1-d7b7-4cd2-8213-ab85ef697b6e,slow,volume]
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/decorators.py", line 81, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 245, in test_volume_swap_with_multiattach
    waiters.wait_for_volume_resource_status(self.volumes_client,
  File "/opt/stack/tempest/tempest/common/waiters.py", line 301, in wait_for_volume_resource_status
    time.sleep(client.build_interval)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
    raise TimeoutException()
fixtures._fixtures.timeout.TimeoutException
Captured traceback-1:
~~~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/waiters.py", line 385, in wait_for_volume_attachment_remove_from_server
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Volume a54c67b7-786e-4ba7-94ea-d1e0a722424a failed to detach from server 986b2dd5-542a-4344-a929-9ac7bbf35d7c within the required time (3600 s) from the compute API perspective
In waiters.py:
373     while any(volume for volume in volumes if volume['volumeId'] == volume_id):
374         time.sleep(client.build_interval)
375
376         timed_out = int(time.time()) - start >= client.build_timeout
377         if timed_out:
378             console_output = client.get_console_output(server_id)['output']
379             LOG.debug('Console output for %s\nbody=\n%s',
380                       server_id, console_output)
381             message = ('Volume %s failed to detach from server %s within '
382                        'the required time (%s s) from the compute API '
383                        'perspective' %
384                        (volume_id, server_id, client.build_timeout))
385             raise lib_exc.TimeoutException(message)
386         try:
387             volumes = client.list_volume_attachments(
388                 server_id)['volumeAttachments']
389         except lib_exc.NotFound:
390             # Ignore 404s on detach in case the server is deleted or the volume
391             # is already detached.
392             return
393     return
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1989232/+subscriptions