yahoo-eng-team team mailing list archive
[Bug 2035911] [NEW] Race conditions attaching/detaching volumes
Public bug reported:
For cinder volume attach and detach operations Nova is properly using
os-brick's `guard_connection` as a context manager to protect against
race conditions between one volume's local disconnection and the call
to cinder to unmap/unexport it, and other volumes' local connection
and map/export.
This is the code:
    def detach(self, context, instance, volume_api, virt_driver,
               attachment_id=None, destroy_bdm=False):
        volume = self._get_volume(context, volume_api, self.volume_id)
        # Let OS-Brick handle high level locking that covers the local os-brick
        # detach and the Cinder call to unmap the volume. Not all volume
        # backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_detach(context, instance, volume_api, virt_driver,
                            attachment_id, destroy_bdm)

    @update_db
    def attach(self, context, instance, volume_api, virt_driver,
               do_driver_attach=False, **kwargs):
        volume = self._get_volume(context, volume_api, self.volume_id)
        volume_api.check_availability_zone(context, volume,
                                           instance=instance)
        # Let OS-Brick handle high level locking that covers the call to
        # Cinder that exports & maps the volume, and for the local os-brick
        # attach. Not all volume backends or hosts require locking.
        with brick_utils.guard_connection(volume):
            self._do_attach(context, instance, volume, volume_api,
                            virt_driver, do_driver_attach)
But there are many other places where Nova attaches or detaches volumes
without going through those two methods (`detach` and `attach`).
One example is when we delete an instance that has cinder volumes
attached; another is when finishing an instance live migration.
Nova needs to always use the `guard_connection` context manager.
There are places where it will be easy to fix, such as the
`_remove_volume_connection` method in nova/compute/manager.py.
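For illustration, here is a minimal sketch of what that fix could look
like, assuming the method keeps doing the same local detach and Cinder
attachment cleanup it does today (`_local_detach_volume` is a
hypothetical name standing in for the existing detach logic, not an
actual nova helper):

    def _remove_volume_connection(self, context, bdm, instance):
        volume = self.volume_api.get(context, bdm.volume_id)
        # Reuse the same os-brick guard that attach()/detach() already use,
        # so the local disconnect and the Cinder unmap/unexport cannot
        # interleave with another volume's local connect and map/export.
        with brick_utils.guard_connection(volume):
            self._local_detach_volume(context, bdm, instance)  # hypothetical
            self.volume_api.attachment_delete(context, bdm.attachment_id)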
But there are others where it looks like it will be harder, such as
`_shutdown_instance` in the same file, because the volumes are locally
detached through the `self.driver.destroy` call, and only after all the
volumes (potentially from different backends) have been locally removed
does it call `self.volume_api.attachment_delete` for each of them.
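To make the ordering problem concrete, a simplified sketch of the
current flow (structure only, not the literal nova code):

    def _shutdown_instance(self, context, instance, bdms):
        block_device_info = self._get_instance_block_device_info(
            context, instance, bdms=bdms)
        # The virt driver tears down the guest and locally disconnects every
        # volume inside this single call, outside any guard_connection scope.
        self.driver.destroy(context, instance, network_info=None,
                            block_device_info=block_device_info)
        # Only after all volumes (possibly from different backends) are gone
        # locally is Cinder told to unmap/unexport them, so a single
        # per-volume guard cannot cover both halves of the operation.
        for bdm in bdms:
            if bdm.is_volume and bdm.attachment_id:
                self.volume_api.attachment_delete(context, bdm.attachment_id)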
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2035911