[Bug 1968944] [NEW] Concurrent migration of vms with the same multiattach volume fails
Public bug reported:
Steps to reproduce:
1. Create multiple VMs
2. Create a multiattach volume
3. Attach the volume to all of the VMs
4. Shut down all of the VMs and migrate them at the same time (a sketch of this step follows the list)
5. Occasionally one of the VM migrations fails
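For step 4, a minimal sketch of one way to trigger the concurrent migrations is below. It assumes the VMs already exist with the multiattach volume attached, uses placeholder server UUIDs, and simply drives the standard "openstack server stop" / "openstack server migrate" CLI commands from Python so that the cold migrations start in parallel:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Placeholder UUIDs of the VMs sharing the multiattach volume.
SERVER_IDS = ["<vm-uuid-1>", "<vm-uuid-2>", "<vm-uuid-3>"]

# Shut the VMs down first (in practice, wait for each to reach SHUTOFF).
for server_id in SERVER_IDS:
    subprocess.run(["openstack", "server", "stop", server_id], check=True)

def migrate(server_id):
    # Cold-migrate one VM; running these in parallel makes the per-VM
    # volume attachment flows in Cinder overlap.
    subprocess.run(["openstack", "server", "migrate", server_id], check=True)

with ThreadPoolExecutor(max_workers=len(SERVER_IDS)) as pool:
    list(pool.map(migrate, SERVER_IDS))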
The nova-compute log is as follows:
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [req-95d6268a-95eb-4ea2-98e0-a9e973b8f19c cb6c975e503c4b1ca741f64a42d09d50 68dd5eeecb434da0aa5ebcdda19a8db6 - default default] [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] Setting instance vm_state to ERROR: nova.exception.InvalidInput: Invalid input received: Invalid volume: Volume e269257b-831e-4be0-a1e6-fbb2aac922a6 status must be available or in-use or downloading to reserve, but the current status is attaching. (HTTP 400) (Request-ID: req-3515d919-aee2-40f4-887e-d5abb34a9d2e)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] Traceback (most recent call last):
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 396, in wrapper
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] res = method(self, ctx, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 432, in wrapper
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] res = method(self, ctx, volume_id, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 807, in attachment_create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] instance_uuid=instance_id)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] self.force_reraise()
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] raise self.value
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/nova/volume/cinder.py", line 795, in attachment_create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] volume_id, _connector, instance_id)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/api_versions.py", line 423, in substitution
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return method.func(obj, *args, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/v3/attachments.py", line 39, in create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] retval = self._create('/attachments', body, 'attachment')
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/base.py", line 300, in _create
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] resp, body = self.api.client.post(url, body=body)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 217, in post
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return self._cs_request(url, 'POST', **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 205, in _cs_request
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] return self.request(url, method, **kwargs)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] File "/usr/local/lib/python3.6/site-packages/cinderclient/client.py", line 191, in request
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] raise exceptions.from_response(resp, body)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] cinderclient.exceptions.BadRequest: Invalidvolume: Volume e269257b-831e-4be0-a1e6-fbb2aac922a6 status must be available or in-use or downloading to reserve, but the current status is attaching. (HTTP 400) (Request-ID: req-3515d919-aee2-40f4-887e-d5abb34a9d2e)
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d]
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d] During handling of the above exception, another exception occurred:
2022-04-11 16:49:46.685 23871 ERROR nova.compute.manager [instance: 17fc694e-284a-43f0-b6c6-c640a02db23d]
VM migration process:
1. Call Cinder's attachment_create API from the source node. This sets the multiattach volume's status to "reserved".
2. The VM performs the migration operation.
3. Call Cinder's attachment_update API from the destination node. This stores the new connection_info in the attachment record and sets the volume's status to "attaching".
4. Call Cinder's attachment_complete API from the destination node. This sets the volume's status back to "in-use".
The migration fails because the multiattach volume's status changes to
"attaching" after step 3 of this process. If another VM's migration
reaches step 1 at that moment, Cinder sees the volume in the "attaching"
state and the attachment_create call fails. Note that the override added
for LP#1694530 in _attachment_reserve only applies when the volume status
is "in-use" or "reserved", so an "attaching" volume falls straight
through to the InvalidVolume error shown in the log above.
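For illustration, here is a rough sketch of the attachment flow that a single cold migration drives against Cinder, written against python-cinderclient's v3 attachments manager. The exact call signatures, the assumption that microversion >= 3.44 is used for attachment-complete, and the connector contents are my own assumptions, not taken from this report; the sketch only shows where the volume status changes:

def migrate_attachments(cinder, volume_id, instance_uuid, dest_connector):
    # `cinder` is assumed to be a cinderclient v3 Client created with a
    # microversion that supports attachment complete (>= 3.44).

    # Step 1 (source node): reserve a new attachment for the destination.
    # Cinder's _attachment_reserve() moves the volume to "reserved" here.
    attachment = cinder.attachments.create(volume_id, None, instance_uuid)

    # Step 2: the hypervisor actually moves the instance (omitted here).

    # Step 3 (destination node): hand Cinder the destination connector.
    # The volume status becomes "attaching" at this point -- this is the
    # window in which a concurrent migration's step 1 fails.
    cinder.attachments.update(attachment.id, dest_connector)

    # Step 4 (destination node): finish the attachment; the volume status
    # returns to "in-use".
    cinder.attachments.complete(attachment.id)

Two of these flows running concurrently against the same volume can interleave so that one VM's step 3 overlaps another VM's step 1, which is exactly the race described above. The Cinder-side gate that raises the error is _attachment_reserve() in cinder/volume/api.py: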
    def _attachment_reserve(self, ctxt, vref, instance_uuid=None):
        # NOTE(jdg): Reserved is a special case, we're avoiding allowing
        # creation of other new reserves/attachments while in this state
        # so we avoid contention issues with shared connections

        # Multiattach of bootable volumes is a special case with it's own
        # policy, check that here right off the bat
        if (vref.get('multiattach', False) and
                vref.status == 'in-use' and
                vref.bootable):
            ctxt.authorize(
                attachment_policy.MULTIATTACH_BOOTABLE_VOLUME_POLICY,
                target_obj=vref)

        # FIXME(JDG): We want to be able to do things here like reserve a
        # volume for Nova to do BFV WHILE the volume may be in the process of
        # downloading image, we add downloading here; that's easy enough but
        # we've got a race between with the attaching/detaching that we do
        # locally on the Cinder node. Just come up with an easy way to
        # determine if we're attaching to the Cinder host for some work or if
        # we're being used by the outside world.
        expected = {'multiattach': vref.multiattach,
                    'status': (('available', 'in-use', 'downloading')
                               if vref.multiattach
                               else ('available', 'downloading'))}

        result = vref.conditional_update({'status': 'reserved'},
                                         expected)

        if not result:
            override = False
            if instance_uuid and vref.status in ('in-use', 'reserved'):
                # Refresh the volume reference in case multiple instances were
                # being concurrently attached to the same non-multiattach
                # volume.
                vref = objects.Volume.get_by_id(ctxt, vref.id)
                for attachment in vref.volume_attachment:
                    # If we're attaching the same volume to the same instance,
                    # we could be migrating the instance to another host in
                    # which case we want to allow the reservation.
                    # (LP BUG: 1694530)
                    if attachment.instance_uuid == instance_uuid:
                        override = True
                        break

            if not override:
                msg = (_('Volume %(vol_id)s status must be %(statuses)s to '
                         'reserve, but the current status is %(current)s.') %
                       {'vol_id': vref.id,
                        'statuses': utils.build_or_str(expected['status']),
                        'current': vref.status})
                raise exception.InvalidVolume(reason=msg)

        values = {'volume_id': vref.id,
                  'volume_host': vref.host,
                  'attach_status': 'reserved',
                  'instance_uuid': instance_uuid}
        db_ref = self.db.volume_attach(ctxt.elevated(), values)

        return objects.VolumeAttachment.get_by_id(ctxt, db_ref['id'])
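To make the failure mode concrete, here is a stripped-down, self-contained simulation of the reservation gate above. This is not Cinder code; the function name and the simplifications (for example, collapsing the per-attachment override loop into a single boolean) are mine:

# Toy model of the gate in _attachment_reserve(): a reservation succeeds
# only if the volume status is in the "expected" tuple, and the LP#1694530
# override is only consulted for 'in-use'/'reserved' volumes.
def can_reserve(status, multiattach=True, same_instance_migrating=True):
    expected = (('available', 'in-use', 'downloading')
                if multiattach else ('available', 'downloading'))
    if status in expected:
        return True   # conditional_update({'status': 'reserved'}) succeeds
    if same_instance_migrating and status in ('in-use', 'reserved'):
        return True   # the migration override applies
    return False      # InvalidVolume is raised

# Rough timeline of two concurrent cold migrations sharing one volume:
print(can_reserve('in-use'))     # True  - VM A, step 1: volume -> reserved
print(can_reserve('reserved'))   # True  - VM B, step 1 while A is still reserved
print(can_reserve('attaching'))  # False - VM B, step 1 while A is in step 3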
** Affects: nova
Importance: Undecided
Status: New
** Tags: migration multiattach
** Tags added: migration multiattach
--
https://bugs.launchpad.net/bugs/1968944