
yahoo-eng-team team mailing list archive

[Bug 1989232] Re: MultiAttachVolumeSwap fails or takes a long time to detach volume

 

Thanks, Aboubacar, for the well-written bug report.

I agree that we have a race between the disconnect and the swap
operation. Both take a lock, but they take different locks, so they can
overlap.
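
To make the overlap concrete, here is a minimal sketch using plain
threading locks. It is not nova or os-brick code: the two lock objects
and the function bodies are hypothetical stand-ins for the per-instance
swap lock and the "connect_volume" lock, and the events only force the
interleaving so that the result is repeatable.

```python
import threading

# Hypothetical stand-ins: one lock per operation, as in the race described
# above. Neither object is the real nova or os-brick lock.
instance_lock = threading.Lock()        # models the per-instance swap lock
connect_volume_lock = threading.Lock()  # models the "connect_volume" lock

events = []
events_guard = threading.Lock()
swap_started = threading.Event()
disconnect_done = threading.Event()


def record(event):
    """Append an event name in a thread-safe way."""
    with events_guard:
        events.append(event)


def swap_volume():
    with instance_lock:
        record("swap:start")
        swap_started.set()
        # Wait for the disconnect to finish while still holding our lock.
        disconnect_done.wait(timeout=5)
        record("swap:end")


def disconnect_volume():
    swap_started.wait(timeout=5)
    # A different lock, so the swap holding instance_lock does not block us.
    with connect_volume_lock:
        record("disconnect:start")
        record("disconnect:end")
    disconnect_done.set()


def run():
    """Run both operations concurrently and return the event order."""
    events.clear()
    swap_started.clear()
    disconnect_done.clear()
    threads = [threading.Thread(target=swap_volume),
               threading.Thread(target=disconnect_volume)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)


if __name__ == "__main__":
    print(run())
```

The disconnect runs to completion while the swap still holds its own
lock. If both functions took the same lock, the disconnect would wait
until the swap had finished.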

Failed case:

Sep 30 02:51:47  Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.156s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 30 02:51:55  Lock "0d711b7b-4693-4a7e-9a94-ca4186b4a670" "released" by "nova.compute.manager.ComputeManager.swap_volume.<locals>._do_locked_swap_volume" :: held 153.400s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}

Successful case:

Sep 29 17:12:21 Lock "4aeaeb5d-295f-4149-9330-a016d9da1730" "released" by "nova.compute.manager.ComputeManager.swap_volume.<locals>._do_locked_swap_volume" :: held 632.783s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
Sep 29 17:12:25 Lock "connect_volume" "released" by "os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume" :: held 0.142s {{(pid=2571444) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:400}}
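
The ordering in the two cases can be checked mechanically. The sketch
below parses lockutils "released" lines and compares the intervals
during which each lock was held. The regular expression, the helper
names and the placeholder year are my own assumptions (the log lines
carry no year), and the sample lines are the ones above with the
trailing {{...}} part cut off.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern for the lockutils "released" message shown in the logs.
LOCK_RELEASED = re.compile(
    r'Lock "(?P<name>[^"]+)" "released" by "(?P<holder>[^"]+)" '
    r':: held (?P<held>[\d.]+)s'
)


def parse_release(line, year=2000):
    """Return (acquired, released) datetimes for one "released" log line.

    The year is an arbitrary placeholder; it only needs to be the same
    for the lines being compared.
    """
    match = LOCK_RELEASED.search(line)
    if match is None:
        return None
    stamp = line[:match.start()].strip()
    released = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    acquired = released - timedelta(seconds=float(match.group("held")))
    return acquired, released


def held_concurrently(line_a, line_b):
    """True if the two locks were held during overlapping intervals."""
    a_acquired, a_released = parse_release(line_a)
    b_acquired, b_released = parse_release(line_b)
    return a_acquired < b_released and b_acquired < a_released


failed = [
    'Sep 30 02:51:47  Lock "connect_volume" "released" by '
    '"os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume"'
    ' :: held 0.156s',
    'Sep 30 02:51:55  Lock "0d711b7b-4693-4a7e-9a94-ca4186b4a670" "released"'
    ' by "nova.compute.manager.ComputeManager.swap_volume.<locals>.'
    '_do_locked_swap_volume" :: held 153.400s',
]
successful = [
    'Sep 29 17:12:21 Lock "4aeaeb5d-295f-4149-9330-a016d9da1730" "released"'
    ' by "nova.compute.manager.ComputeManager.swap_volume.<locals>.'
    '_do_locked_swap_volume" :: held 632.783s',
    'Sep 29 17:12:25 Lock "connect_volume" "released" by '
    '"os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume"'
    ' :: held 0.142s',
]

if __name__ == "__main__":
    print("failed case overlaps:", held_concurrently(*failed))
    print("successful case overlaps:", held_concurrently(*successful))
```

With these lines, the failed case reports an overlap (the disconnect
ran while the swap lock was still held) and the successful case does
not. The timestamps only have one-second resolution, so this is a rough
check rather than proof of the exact interleaving.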


** Also affects: os-brick
   Importance: Undecided
       Status: New

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Tags added: volumes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1989232

Title:
  MultiAttachVolumeSwap fails or takes a long time to detach volume

Status in OpenStack Compute (nova):
  Triaged
Status in os-brick:
  New
Status in tempest:
  New

Bug description:
  ERROR:
  tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach
  fails during tempest iSCSI tests because the volume either takes a
  long time to detach or fails to detach from the instance.  The logs
  herein show an example of a failure to detach.

  EXPECTED BEHAVIOR: Volume successfully detaches and test passes.

  HOW TO DUPLICATE:
  Run: tox -e all -- tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach | tee -a console.log.out

  CONFIG:
  - DevStack Zed Release
  - Single node using iSCSI
  - Host OS: Ubuntu 20.04
  Distributor ID: Ubuntu
  Description:    Ubuntu 20.04.3 LTS
  Release:        20.04
  Codename:       focal

  From tempest console.log:

  tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach[id-e8f
  8f9d1-d7b7-4cd2-8213-ab85ef697b6e,slow,volume]
  -------------------------------------------------------------------------------------------------------------
  ----------------------------------------------

  Captured traceback:
  ~~~~~~~~~~~~~~~~~~~
      Traceback (most recent call last):

        File "/opt/stack/tempest/tempest/lib/decorators.py", line 81, in wrapper
      return f(*func_args, **func_kwargs)

        File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
      return f(*func_args, **func_kwargs)

        File "/opt/stack/tempest/tempest/api/compute/admin/test_volume_swap.py", line 245, in test_volume_swap_with_multiattach
      waiters.wait_for_volume_resource_status(self.volumes_client,

        File "/opt/stack/tempest/tempest/common/waiters.py", line 301, in wait_for_volume_resource_status
      time.sleep(client.build_interval)

        File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
      raise TimeoutException()

      fixtures._fixtures.timeout.TimeoutException

  Captured traceback-1:
  ~~~~~~~~~~~~~~~~~~~~~
      Traceback (most recent call last):

        File "/opt/stack/tempest/tempest/common/waiters.py", line 385, in wait_for_volume_attachment_remove_from_server
      raise lib_exc.TimeoutException(message)

      tempest.lib.exceptions.TimeoutException: Request timed out
  Details: Volume a54c67b7-786e-4ba7-94ea-d1e0a722424a failed to detach from server 986b2dd5-542a-4344-a929-9ac7bbf35d7c within the required time (3600 s) from the compute API perspective

  In waiters.py:

      373     while any(volume for volume in volumes if volume['volumeId'] == volume_id):
      374         time.sleep(client.build_interval)
      375
      376         timed_out = int(time.time()) - start >= client.build_timeout
      377         if timed_out:
      378             console_output = client.get_console_output(server_id)['output']
      379             LOG.debug('Console output for %s\nbody=\n%s',
      380                       server_id, console_output)
      381             message = ('Volume %s failed to detach from server %s within '
      382                        'the required time (%s s) from the compute API '
      383                        'perspective' %
      384                        (volume_id, server_id, client.build_timeout))
      385             raise lib_exc.TimeoutException(message)
      386         try:
      387             volumes = client.list_volume_attachments(
      388                 server_id)['volumeAttachments']
      389         except lib_exc.NotFound:
      390             # Ignore 404s on detach in case the server is deleted or the volume
      391             # is already detached.
      392             return
      393     return

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1989232/+subscriptions