yahoo-eng-team team mailing list archive

[Bug 2110697] [NEW] Failed cold migration can result in an instance where the root disk cinder volume is unattached

 

Public bug reported:

Release: 2024.1

Setup:

Server booted from volume with an ephemeral secondary drive
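
For context, a rough sketch of this kind of setup using the openstacksdk (cloud name, flavor, network and image names are placeholders, not the values actually used; the ephemeral secondary drive is assumed to come from a flavor that defines an ephemeral disk):

```
# Illustrative sketch only: boot-from-volume server whose secondary
# ephemeral disk is provided by the flavor. All names are placeholders.
import openstack

conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

server = conn.compute.create_server(
    name='bfv-test',
    # A flavor that defines an ephemeral disk gives the instance a local
    # secondary drive on the hypervisor.
    flavor_id=conn.compute.find_flavor('m1.ephemeral').id,
    networks=[{'uuid': conn.network.find_network('private').id}],
    block_device_mapping=[{
        # Root disk: create a volume from an image and boot from it.
        'uuid': conn.image.find_image('ubuntu-22.04').id,
        'source_type': 'image',
        'destination_type': 'volume',
        'volume_size': 20,
        'boot_index': 0,
        'delete_on_termination': False,
    }],
)
conn.compute.wait_for_server(server)
```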

Issue:

Cold migration failed with:

```
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinderclient/client.py", line 197, in request
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server     raise exceptions.from_response(resp, body)
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server cinderclient.exceptions.ClientException: Unable to update attachment.(Invalid volume: duplicate connectors detected on volume 678eebb1-b5e7-41cc-b327-132d04afa96a). (HTTP 500) (Request-ID: req-c3846bd1-8cc5-4278-8be3-83b93f7e8185)
2024-12-04 17:03:07.499 7 ERROR oslo_messaging.rpc.server

```
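
For anyone diagnosing the same failure, a minimal sketch (assuming the openstacksdk; 'mycloud' is a placeholder) of how to look at Cinder's attachment records for the volume to see the duplicate connectors it is complaining about:

```
# Diagnostic sketch only: list the attachment records Cinder holds for the
# volume from the traceback. Two records pointing at the same connector /
# host would explain the "duplicate connectors" error.
import openstack

conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

volume = conn.block_storage.get_volume('678eebb1-b5e7-41cc-b327-132d04afa96a')
print(volume.status)
for att in volume.attachments:
    # Each entry is a dict; the exact keys can vary by microversion.
    print(att.get('attachment_id'), att.get('server_id'), att.get('host_name'))
```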

This left the server in the error state. The state was then reset to
active in an attempt to recover the instance. I believe the server was
then stopped and started again to sync its state, and at that point it
started successfully. At some later point in time the server was
rebooted; however, it failed to start with:

```
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/exception_wrapper.py", line 71, in wrapped
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     _emit_versioned_exception_notification(
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 186, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     LOG.warning("Failed to revert task state for instance. "
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 157, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/utils.py", line 1453, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 214, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     compute_utils.add_instance_fault_from_exc(context,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 203, in decorated_function
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return function(self, context, *args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4265, in reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     do_reboot_instance(context, instance, block_device_info, reboot_type)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_concurrency/lockutils.py", line 412, in inner
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4263, in do_reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self._reboot_instance(context, instance, block_device_info,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4360, in _reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self._set_instance_obj_error_state(instance)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     raise self.value
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/compute/manager.py", line 4330, in _reboot_instance
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     self.driver.reboot(context, instance,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 3995, in reboot
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return self._hard_reboot(context, instance, network_info,
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4096, in _hard_reboot
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     backing_disk_info = self._get_instance_disk_info_from_config(
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/libvirt/driver.py", line 11712, in _get_instance_disk_info_from_config
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     qemu_img_info = disk_api.get_disk_info(path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/disk/api.py", line 97, in get_disk_info
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     return images.qemu_img_info(path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/virt/images.py", line 46, in qemu_img_info
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server     raise exception.DiskNotFound(location=path)
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server nova.exception.DiskNotFound: No disk at /var/lib/nova/instances/c8635184-5c6a-4a07-8f7b-05d6dc248296/disk
2025-04-03 10:58:51.433 7 ERROR oslo_messaging.rpc.server

```

NOTE: Nova was trying to find a local root disk because all of the
instance's volume attachments had been removed.
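
(For anyone checking for the same symptom, a hedged openstacksdk sketch of how to confirm that Nova no longer has any volume attachments for the instance while the root volume still exists in Cinder; the cloud name is a placeholder.)

```
# Sketch only: compare Nova's and Cinder's views of the root volume.
import openstack

conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

server = conn.compute.find_server('c8635184-5c6a-4a07-8f7b-05d6dc248296')

# Nova's view: in this state the list comes back empty, which is why the
# hard reboot falls back to looking for a local disk.
for att in conn.compute.volume_attachments(server):
    print('nova:', att.volume_id, att.device)

# Cinder's view of the same root volume.
volume = conn.block_storage.get_volume('678eebb1-b5e7-41cc-b327-132d04afa96a')
print('cinder status:', volume.status)
for att in volume.attachments:
    print('cinder:', att.get('attachment_id'), att.get('server_id'))
```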

In the logs we found:

```
[instance: c8635184-5c6a-4a07-8f7b-05d6dc248296] Removing stale volume attachment '786c83b8-86bb-4557-9b6a-a2a5d9ebdd68' from instance for volume '678eebb1-b5e7-41cc-b327-132d04afa96a'.

```

So it appears that the hard reboot triggered the removal of the volume
attachment.


There didn't seem to be an easy way to reattach the root volume, so we ended up recreating the server from the old volume and manually copying the ephemeral data across from the old hypervisor. Is there an easier way to recover from this state?
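
For reference, the recreate step was roughly along these lines (a hedged openstacksdk sketch with placeholder flavor/network names; the point is reusing the existing root volume via a block device mapping with source_type 'volume'):

```
# Sketch of the workaround: build a new server on the surviving root
# volume. Flavor and network names are placeholders.
import openstack

conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

new_server = conn.compute.create_server(
    name='recovered-server',
    flavor_id=conn.compute.find_flavor('m1.ephemeral').id,
    networks=[{'uuid': conn.network.find_network('private').id}],
    block_device_mapping=[{
        # Reuse the existing root volume instead of creating a new one.
        'uuid': '678eebb1-b5e7-41cc-b327-132d04afa96a',
        'source_type': 'volume',
        'destination_type': 'volume',
        'boot_index': 0,
        'delete_on_termination': False,
    }],
)
conn.compute.wait_for_server(new_server)
# The ephemeral data still has to be copied across from the old
# hypervisor by hand, as described above.
```

Depending on the volume's state, any stale attachment records left on it may need to be cleaned up in Cinder before it can be reused.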

Can anything be done to stop Nova from cleaning up the volume attachments
of instances that have undergone a state reset?

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2110697

Title:
  Failed cold migration can result in an instance where the root disk
  cinder volume is unattached

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2110697/+subscriptions