[Bug 2110697] [NEW] Failed cold migration can result in an instance where the root disk cinder volume is unattached
Public bug reported:
Release: 2024.1
Setup:
Server booted from volume with an ephemeral secondary drive
Issue:
Cold migration failed with:
```
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinderclient/client.py", line 197, in request
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server raise exceptions.from_response(resp, body)
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server cinderclient.exceptions.ClientException: Unable to update attachment.(Invalid volume: duplicate connectors detected on volume 678eebb1-b5e7-41cc-b327-132d04afa96a). (HTTP 500) (Request-ID: req-c3846bd1-8cc5-4278-8be3-83b93f7e8185)
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server
```
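The "duplicate connectors" error suggests Cinder found more than one attachment record for the volume pointing at the same host. A minimal sketch of how the volume's attachment records could be inspected at that point, assuming openstacksdk is installed and a clouds.yaml entry named "mycloud" exists (the cloud name is a placeholder; the volume UUID is the one from the error above):
```python
# Sketch only: inspect a volume's attachment records to spot duplicates.
# Assumes openstacksdk and a "mycloud" entry in clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

VOLUME_ID = "678eebb1-b5e7-41cc-b327-132d04afa96a"  # root volume from the report

volume = conn.block_storage.get_volume(VOLUME_ID)
print(f"volume status: {volume.status}")

# Each entry should normally reference a single server/attachment pair;
# duplicates here would line up with the "duplicate connectors" error from Cinder.
for att in volume.attachments or []:
    print(att.get("server_id"), att.get("attachment_id"), att.get("host_name"))
```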
This left the server in the error state. The state was then reset to
active in an attempt to recover the instance. I believe the server was
then stopped and started again to sync the state, and at that point it
started successfully. At some later point in time the server was
rebooted, but it then failed to start with:
```
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/exception_wrapper.py", line 71, in wrapped
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server _emit_versioned_exception_notification(
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 186, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server LOG.warning("Failed to revert task state for instance. "
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 157, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/utils.py", line 1453, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 214, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server compute_utils.add_instance_fault_from_exc(context,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 203, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4265, in reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server do_reboot_instance(context, instance, block_device_info, reboot_type)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py", line 412, in inner
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4263, in do_reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self._reboot_instance(context, instance, block_device_info,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4360, in _reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self._set_instance_obj_error_state(instance)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4330, in _reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server self.driver.reboot(context, instance,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 3995, in reboot
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return self._hard_reboot(context, instance, network_info,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4096, in _hard_reboot
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server backing_disk_info = self._get_instance_disk_info_from_config(
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11712, in _get_instance_disk_info_from_config
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server qemu_img_info = disk_api.get_disk_info(path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/disk/api.py", line 97, in get_disk_info
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server return images.qemu_img_info(path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/images.py", line 46, in qemu_img_info
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server raise exception.DiskNotFound(location=path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server nova.exception.DiskNotFound: No disk at /var/lib/nova/instances/c8635184-5c6a-4a07-8f7b-05d6dc248296/disk
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server
```
NOTE: it was trying to find a local disk since all volume attachments
had been removed.
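To confirm this, a hedged check (again assuming openstacksdk and the "mycloud" placeholder; the server UUID is the one from the traceback) that lists the instance's volume attachments as nova sees them; in this state it comes back empty:
```python
# Sketch only: list nova's volume attachments for the affected instance.
# An empty result is consistent with the DiskNotFound error above, since a
# boot-from-volume server with no attachments has no root disk left.
import openstack

conn = openstack.connect(cloud="mycloud")  # assumed clouds.yaml entry

SERVER_ID = "c8635184-5c6a-4a07-8f7b-05d6dc248296"

server = conn.compute.get_server(SERVER_ID)
for attachment in conn.compute.volume_attachments(server):
    print(attachment.volume_id, attachment.device)
```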
In the logs we found:
```
[instance: c8635184-5c6a-4a07-8f7b-05d6dc248296] Removing stale volume attachment '786c83b8-86bb-4557-9b6a-a2a5d9ebdd68' from instance for volume '678eebb1-b5e7-41cc-b327-132d04afa96a'.
```
So it appears the hard reboot triggered the removal of the volume
attachment.
There didn't seem to be an easy way to reattach the root volume, so we ended up recreating the server from the old volume and manually copying the ephemeral data across from the old hypervisor. Is there an easier way to recover from this state?
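For reference, the recovery we performed looked roughly like the sketch below, which boots a replacement server from the surviving root volume. This is a sketch only: the flavor ID, network ID, and server name are placeholders, and it assumes openstacksdk maps `block_device_mapping` onto the block_device_mapping_v2 field of the server-create call.
```python
# Sketch only, not a verified procedure: boot a replacement server from the
# surviving root volume. Flavor/network IDs and the server name are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")  # assumed clouds.yaml entry

VOLUME_ID = "678eebb1-b5e7-41cc-b327-132d04afa96a"

server = conn.compute.create_server(
    name="recovered-server",
    flavor_id="FLAVOR_ID",
    networks=[{"uuid": "NETWORK_ID"}],
    # Assumption: the SDK sends this as block_device_mapping_v2.
    block_device_mapping=[{
        "boot_index": 0,
        "uuid": VOLUME_ID,
        "source_type": "volume",
        "destination_type": "volume",
        "delete_on_termination": False,
    }],
)
conn.compute.wait_for_server(server)
```
The ephemeral data still has to be copied across from the old hypervisor by hand, as described above.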
Can anything be done to stop nova from cleaning up the volume
attachments of instances that have undergone a state reset?
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2110697
Title:
Failed cold migration can result in an instance where the root disk
cinder volume is unattached
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2110697/+subscriptions