yahoo-eng-team team mailing list archive

[Bug 1968944] [NEW] Concurrent migration of VMs with the same multiattach volume fails

Public bug reported:

Steps to reproduce:
1. Create multiple VMs
2. Create a multiattach volume
3. Attach the volume to all of the VMs
4. Shut down all of the VMs, then migrate them all at the same time
5. Observe that one of the migrations may fail (a script to drive steps 4
   and 5 is sketched below)
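
A hypothetical driver for steps 4 and 5, sketched with python-novaclient
through a keystoneauth session; the credentials, endpoint, and server
names are placeholders:

from concurrent.futures import ThreadPoolExecutor

from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                username='admin', password='secret',
                                project_name='admin',
                                user_domain_name='Default',
                                project_domain_name='Default')
nova = client.Client('2.60', session=session.Session(auth=auth))

servers = [nova.servers.find(name=name) for name in ('vm1', 'vm2', 'vm3')]

for server in servers:
    server.stop()  # step 4: shut everything down first
# (a real script would poll here until every server reaches SHUTOFF)

with ThreadPoolExecutor() as pool:
    # Kick off all cold migrations at once; with a shared multiattach
    # volume, one of them intermittently fails with the error below.
    list(pool.map(lambda s: s.migrate(), servers))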

The nova-compute log is as follows:
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [req-95d6268a-95eb-4ea2-98e0-a9e973b8f19c cb6c975e503c4b1ca741f64a42d09d50 68dd5eeecb434da0aa5ebcdda19a8db6 - default default] [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] Setting instance vm_state to ERROR: nova.exception.InvalidInput: Invalid input received: Invalid volume: Volume e269257b-831e-4be0-a1e6-fbb2aac922a6 status must be available or in-use or downloading to reserve, but the current status is attaching. (HTTP 400) (Request-ID: req-3515d919-aee2-40f4-887e-d5abb34a9d2e)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] Traceback (most recent call last):
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 396, in wrapper
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] res = method(self, ctx, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 432, in wrapper
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] res = method(self, ctx, volume_id, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 807, in attachment_create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] instance_uuid=instance_id)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] self.force_reraise()
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] raise self.value
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 795, in attachment_create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] volume_id, _connector, instance_id)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/api_versions.py", line 423, in substitution
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return method.func(obj, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/v3/attachments.py", line 39, in create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] retval = self._create('/attachments', body, 'attachment')
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/base.py", line 300, in _create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] resp, body = self.api.client.post(url, body=body)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 217, in post
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return self._cs_request(url, 'POST', **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 205, in _cs_request
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return self.request(url, method, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 191, in request
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] raise exceptions.from_response(resp, body)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] cinderclient.exceptions.BadRequest: Invalidvolume: Volume e269257b-831e-4be0-a1e6-fbb2aac922a6 status must be available or in-use or downloading to reserve, but the current status is attaching. (HTTP 400) (Request-ID: req-3515d919-aee2-40f4-887e-d5abb34a9d2e)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d]
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] During handling of the above exception, another exception occurred:
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d]

VM migration process (sketched in code below):
1. Nova calls Cinder's attachment_create on the source node; Cinder sets the
   multiattach volume's status to 'reserved'.
2. The VM is migrated to the destination node.
3. Nova calls Cinder's attachment_update on the destination node; Cinder
   stores the new connection_info in the attachment record and sets the
   volume's status to 'attaching'.
4. Nova calls Cinder's attachment_complete on the destination node; Cinder
   sets the volume's status to 'in-use'.
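
A minimal sketch of this flow driven directly with python-cinderclient
(assuming API microversion 3.44 or later for attachment complete; sess,
volume_id, instance_uuid and dest_connector are placeholders):

from cinderclient import client as cinder_client

cinder = cinder_client.Client('3.44', session=sess)

attachment = cinder.attachments.create(volume_id, None, instance_uuid)
# step 1: no connector yet, so Cinder marks the volume 'reserved'

# ... step 2: the instance itself is moved to the destination host ...

cinder.attachments.update(attachment.id, dest_connector)
# step 3: the destination host's connector is recorded; volume -> 'attaching'

cinder.attachments.complete(attachment.id)
# step 4: volume -> 'in-use'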

The migration fails because the multiattach volume's status changes to
'attaching' at step 3 of this process. If another VM's migration reaches
step 1 while the volume is in that state, attachment_create is rejected,
since 'attaching' is not an acceptable status for reserving the volume.
Note that the override path in _attachment_reserve (the LP bug 1694530
case in the code below) never fires here: it is only consulted when the
volume status is 'in-use' or 'reserved', never 'attaching'. A toy model
of the race is sketched next, followed by the relevant Cinder code.
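
A toy model of the race, runnable without OpenStack (the lock stands in
for the database's atomic conditional update, and the sleeps merely widen
the window so the interleaving reproduces reliably):

import threading
import time

lock = threading.Lock()
# The shared multiattach volume, already attached to every VM (step 3 of
# the reproduction), modeled as a plain dict.
volume = {'status': 'in-use', 'attachments': {'vm0', 'vm1', 'vm2'}}

def reserve(name):
    # Mirrors _attachment_reserve's status check.
    with lock:
        if volume['status'] in ('available', 'in-use', 'downloading'):
            volume['status'] = 'reserved'
            return
        # Override path (the LP bug 1694530 case): re-reserving for an
        # instance that already holds an attachment is allowed, but only
        # from 'in-use' or 'reserved' -- never from 'attaching'.
        if (volume['status'] in ('in-use', 'reserved')
                and name in volume['attachments']):
            volume['status'] = 'reserved'
            return
        raise RuntimeError('status must be available or in-use or '
                           'downloading to reserve, but the current '
                           'status is ' + volume['status'])

def migrate(name, delay):
    time.sleep(delay)                    # stagger the migrations slightly
    try:
        reserve(name)                    # step 1 (source node)
        time.sleep(0.1)
        volume['status'] = 'attaching'   # step 3 (attachment_update)
        time.sleep(0.1)                  # a reserve() landing here fails
        volume['status'] = 'in-use'      # step 4 (attachment_complete)
        print(name, 'migrated')
    except RuntimeError as exc:
        print(name, 'failed:', exc)

threads = [threading.Thread(target=migrate, args=('vm%d' % i, 0.15 * i))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()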

def _attachment_reserve(self, ctxt, vref, instance_uuid=None):
        # NOTE(jdg): Reserved is a special case, we're avoiding allowing
        # creation of other new reserves/attachments while in this state
        # so we avoid contention issues with shared connections

        # Multiattach of bootable volumes is a special case with its own
        # policy, check that here right off the bat
        if (vref.get('multiattach', False) and
                vref.status == 'in-use' and
                vref.bootable):
            ctxt.authorize(
                attachment_policy.MULTIATTACH_BOOTABLE_VOLUME_POLICY,
                target_obj=vref)

        # FIXME(JDG): We want to be able to do things here like reserve a
        # volume for Nova to do BFV WHILE the volume may be in the process of
        # downloading image, we add downloading here; that's easy enough but
        # we've got a race with the attaching/detaching that we do
        # locally on the Cinder node. Just come up with an easy way to
        # determine if we're attaching to the Cinder host for some work or if
        # we're being used by the outside world.
        expected = {'multiattach': vref.multiattach,
                    'status': (('available', 'in-use', 'downloading')
                               if vref.multiattach
                               else ('available', 'downloading'))}

        result = vref.conditional_update({'status': 'reserved'},
                                         expected)

        if not result:
            override = False
            if instance_uuid and vref.status in ('in-use', 'reserved'):
                # Refresh the volume reference in case multiple instances were
                # being concurrently attached to the same non-multiattach
                # volume.
                vref = objects.Volume.get_by_id(ctxt, vref.id)
                for attachment in vref.volume_attachment:
                    # If we're attaching the same volume to the same instance,
                    # we could be migrating the instance to another host in
                    # which case we want to allow the reservation.
                    # (LP BUG: 1694530)
                    if attachment.instance_uuid == instance_uuid:
                        override = True
                        break

            if not override:
                msg = (_('Volume %(vol_id)s status must be %(statuses)s to '
                         'reserve, but the current status is %(current)s.') %
                       {'vol_id': vref.id,
                        'statuses': utils.build_or_str(expected['status']),
                        'current': vref.status})
                raise exception.InvalidVolume(reason=msg)

        values = {'volume_id': vref.id,
                  'volume_host': vref.host,
                  'attach_status': 'reserved',
                  'instance_uuid': instance_uuid}
        db_ref = self.db.volume_attach(ctxt.elevated(), values)
        return objects.VolumeAttachment.get_by_id(ctxt, db_ref['id'])
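
For context, conditional_update above is effectively an atomic
compare-and-swap executed as a single SQL UPDATE. A rough standalone
analogue (a simplification, not Cinder's actual persistence code):

import sqlalchemy as sa

def try_reserve(engine, volume_id, expected_statuses):
    # A single UPDATE whose WHERE clause re-checks the expected statuses,
    # so the check and the write happen atomically in the database.
    stmt = sa.text(
        "UPDATE volumes SET status = 'reserved' "
        "WHERE id = :id AND status IN :statuses"
    ).bindparams(sa.bindparam('statuses', expanding=True))
    with engine.begin() as conn:
        result = conn.execute(stmt, {'id': volume_id,
                                     'statuses': list(expected_statuses)})
        # rowcount == 0 means some other request changed the status between
        # our read and this write -- the window the concurrent migrations hit.
        return result.rowcount == 1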

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: migration multiattach

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1968944

Title:
  Concurrent migration of VMs with the same multiattach volume fails

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1968944/+subscriptions