[Bug 2110738] [NEW] Stable rescue fails when necessary image properties not set

 

Public bug reported:

From https://issues.redhat.com/browse/OSPRH-13142:

Description of problem:

For boot-from-volume instances, 'openstack server rescue <vm> --image
<image>' fails with the following issues:

1.  It attempts to attach two disks: <instance_uuid>_disk and
<instance_uuid>_disk.rescue.  Only <instance_uuid>_disk.rescue is
actually created, so the boot fails with the following error:

2024-01-23 16:32:14.338 2 ERROR oslo_messaging.rpc.server
nova.exception.InstanceNotRescuable: Instance
dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2 cannot be rescued: Driver Error:
internal error: process exited while connecting to monitor:
2024-01-23T16:32:13.017966Z qemu-kvm: -blockdev
{"driver":"rbd","pool":"vms","image":"dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk",
"server":[{"host":"172.16.1.100","port":"6789"}],"user":"openstack",
"auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-auth-secret0",
"node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},
"auto-read-only":true,"discard":"unmap"}: error reading header from
dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk: No such file or directory

If you look in Ceph, only the .rescue image exists:

# rbd --id openstack -p vms ls -l
NAME                                              SIZE    PARENT  FMT  PROT  LOCK
dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue  10 GiB            2        excl

However, the instance is configured with both disks:


# virsh domblklist instance-00000003
 Target   Source
----------------------------------------------------------------
 vda      vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk.rescue
 vdb      vms/dc0812ba-b4ca-4ffa-a7e5-2157e52f35d2_disk

If I manually copy UUID_disk.rescue to UUID_disk, the instance will
boot into RESCUE mode.  It seems the UUID_disk volume is not needed and
should not be configured in this rescue situation.
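
For reference, the manual copy described above can be done with rbd; the
same command appears later in the reproducer (the pool and client names
are those used in this environment):

# rbd --id openstack cp vms/<instance_uuid>_disk.rescue vms/<instance_uuid>_disk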

2.  The RESCUED instance doesn't attach the cinder root volume.  The
cinder root volume also doesn't re-attach after unrescuing the instance.
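
A quick way to see this mismatch is to compare what the API reports as
attached with what the libvirt domain actually presents; both commands
are used throughout the reproducer below:

$ openstack server show <vm> -c status -c volumes_attached
# virsh domblklist <libvirt instance>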

Reproducer:

$ openstack volume create --size 10 --image rhel8 rootvol1

$ openstack volume list
+--------------------------------------+----------+-----------+------+-------------+
| ID                                   | Name     | Status    | Size | Attached to |
+--------------------------------------+----------+-----------+------+-------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | available |   10 |             |
+--------------------------------------+----------+-----------+------+-------------+


$ openstack server create --key-name default --flavor rhel --volume rootvol1 --network external test1

$ openstack server show test1 -c status -c image -c volumes_attached
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | ACTIVE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+


$ openstack server rescue test1 --image rhel8


$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                                                                     |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| fault            | {'code': 400, 'created': '2024-01-23T20:12:17Z', 'message': 'Instance ac3d46c0-c8d5-45df-bd17-d467baaa5a98 cannot be rescued: Driver      |
|                  | Error: internal error: process exited while connecting to monitor: 2024-01-23T20:12:17.612453Z qemu-kvm: -blockdev                        |
|                  | {"driver":"rbd","pool":"vms","image":"ac3d46c0-c8d5-45df-bd17-d467ba'}                                                                    |
| image            | N/A (booted from volume)                                                                                                                  |
| status           | ERROR                                                                                                                                     |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596'                                                                  |
+------------------+-------------------------------------------------------------------------------------------------------------------------------------------+


# virsh domblklist instance-00000004
 Target   Source
----------------------------------------------------------------
 vda      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
 vdb      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk


# rbd --id openstack -p vms ls -l
NAME                                              SIZE    PARENT  FMT  PROT  LOCK
ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue  10 GiB            2     

NOTE: here, if the _disk volume is created manually, the instance will
boot into rescue mode; however, the cinder root volume is not attached.

# rbd --id openstack cp vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk
Image copy: 100% complete...done.

RESCUE now completes and the instance is accessible (without the cinder
root volume attached).

$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | RESCUE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+

The volume still shows as in-use:

$ openstack volume list
+--------------------------------------+----------+--------+------+--------------------------------+
| ID                                   | Name     | Status | Size | Attached to                    |
+--------------------------------------+----------+--------+------+--------------------------------+
| f855dfe6-ad5a-4497-87ff-16ac5856f596 | rootvol1 | in-use |   10 | Attached to test1 on /dev/vda  |
+--------------------------------------+----------+--------+------+--------------------------------+

But it is not actually attached to the domain:

# virsh domblklist instance-00000004
 Target   Source
----------------------------------------------------------------
 vda      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
 vdb      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk


The other problem is that unrescue does not revert the instance back to its original disk configuration.

$ openstack server unrescue test1
$ openstack server show test1 -c status -c image -c volumes_attached -c fault --fit
+------------------+--------------------------------------------------------------------------+
| Field            | Value                                                                    |
+------------------+--------------------------------------------------------------------------+
| image            | N/A (booted from volume)                                                 |
| status           | ACTIVE                                                                   |
| volumes_attached | delete_on_termination='False', id='f855dfe6-ad5a-4497-87ff-16ac5856f596' |
+------------------+--------------------------------------------------------------------------+

The above looks good, but the instance is still booted from the rescue disks.

# virsh domblklist instance-00000004
 Target   Source
----------------------------------------------------------------
 vda      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk.rescue
 vdb      vms/ac3d46c0-c8d5-45df-bd17-d467baaa5a98_disk

A hard reboot will fix it:

$ openstack server reboot --hard test1

Now the instance is back to booting from the volume:

# virsh domblklist instance-00000004
 Target   Source
---------------------------------------------------------------
 vda      volumes/volume-f855dfe6-ad5a-4497-87ff-16ac5856f596


Version-Release number of selected component (if applicable):
Wallaby

How reproducible:
100%

Steps to Reproduce:
1. See above
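
Additional context suggested by the bug title: stable device rescue in
nova depends on the rescue image carrying the hw_rescue_device /
hw_rescue_bus image properties. A sketch of setting them on the rhel8
image used in the reproducer above (the specific values are an
assumption, not part of the original report):

$ openstack image set --property hw_rescue_device=disk --property hw_rescue_bus=virtio rhel8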

** Affects: nova
     Importance: Undecided
         Status: New
