
yahoo-eng-team team mailing list archive

[Bug 2112187] Re: Live migration fails after migrating the volume the instance has attached

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/957757
Committed: https://opendev.org/openstack/nova/commit/93c0f9bc749cea39f9cd1bd9d3e5a5585f1f6cac
Submitter: "Zuul (22348)"
Branch:    master

commit 93c0f9bc749cea39f9cd1bd9d3e5a5585f1f6cac
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date:   Fri Jun 13 18:15:46 2025 +0100

    restrict swap volume to cinder
    
    This change tightens the validation around the attachment
    update API to ensure that it can only be called if the source
    volume has a non-empty migration status.
    
    That means it will only accept a request to swap the volume if
    it is the result of a cinder volume migration.
    
    This change is being made to prevent the instance domain
    XML from getting out of sync with the nova BDM records
    and cinder connection info. In the future support for direct
    swap volume actions can be re-added if and only if the
    nova libvirt driver is updated to correctly modify the domain.
    The libvirt driver is the only driver that supported this API
    outside of a cinder orchestrated swap volume.
    
    If the domain XML and BDMs are allowed to get out of sync and an
    admin later live-migrates the VM, the host path will not be updated
    for the destination host. Normally this results in a live migration
    failure, which often prompts the admin to cold migrate instead.
    However, if the source device path exists on the destination, the
    migration will proceed. This can lead to 2 VMs using the same host
    block device. At best this will cause a crash or data corruption.
    At worst it will allow one guest to access the data of another.
    
    Prior to this change there was an explicit warning in the nova API ref
    stating that humans should never call this API because it can lead
    to this situation. Now it is considered a hard error due to the
    security implications.
    
    Closes-Bug: #2112187
    Depends-on: https://review.opendev.org/c/openstack/tempest/+/957753
    Change-Id: I439338bd2f27ccd65a436d18c8cbc9c3127ee612
    Signed-off-by: Sean Mooney <work@xxxxxxxxxxxxxxx>
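
As a rough illustration of the rule described in the commit message above (a swap is accepted only when the source volume has a non-empty migration status), the check can be sketched as a small shell function. This is not nova's actual code: `swap_volume_decision` is a hypothetical helper, and treating the literal string `None` as "empty" is an assumption about how a client might render a null status.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT nova's implementation: mirrors the rule
# "only accept a swap if the source volume has a non-empty migration status".
swap_volume_decision() {
    # $1: migration status of the source volume (may be empty or unset)
    local migration_status="${1:-}"
    case "$migration_status" in
        # Assumption: an unset status may show up as an empty string or "None".
        ""|None|none) echo "reject" ;;
        # Any other value means cinder has a migration recorded for the volume.
        *) echo "accept" ;;
    esac
}
```

The status string might come from something like `openstack volume show <volume> -f value -c migration_status`, assuming the caller is allowed to see that field.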


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2112187

Title:
  Live migration fails after migrating the volume the instance has
  attached

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Security Advisory:
  Won't Fix
Status in OpenStack Security Notes:
  In Progress
Status in tempest:
  In Progress
Status in watcher:
  Fix Released

Bug description:
  Using Watcher, live migrating an instance that has a volume attached
  fails after migrating the volume. The migration fails with
  libvirt.libvirtError: Cannot access storage file '/dev/sdd': No such
  file or directory

  Steps to reproduce the nova live migrate issue after volume migration:
  1. Create an audit template:

  openstack optimize audittemplate create zm_test hardware_maintenance --strategy zone_migration

  2. Create vms with volumes:

  set -ex

  # Create Image
  IMG=cirros-0.5.2-x86_64-disk.img
  URL=http://download.cirros-cloud.net/0.5.2/$IMG
  DISK_FORMAT=qcow2
  RAW=$IMG
  NUMBER_OF_INSTANCES=${1:-8}

  # Create flavor
  openstack flavor show m1.tiny || \
      openstack flavor create --ram 512 --vcpus 1 --disk 1 --ephemeral 1 m1.tiny

  curl -L -# $URL > /tmp/$IMG
  if type qemu-img >/dev/null 2>&1; then
      RAW=$(echo $IMG | sed s/img/raw/g)
      qemu-img convert -f qcow2 -O raw /tmp/$IMG /tmp/$RAW
      DISK_FORMAT=raw
  fi

  openstack image show cirros || \
      openstack image create --container-format bare --disk-format $DISK_FORMAT cirros < /tmp/$RAW

  # Create security group and icmp/ssh rules
  openstack security group show basic || {
      openstack security group create basic
      openstack security group rule create basic --protocol icmp --ingress --icmp-type -1
      openstack security group rule create basic --protocol tcp --ingress --dst-port 22
  }

  # Create an instance
  for (( i=1; i<${NUMBER_OF_INSTANCES}; i++ )); do
      NAME=test_${i}
      VOL_NAME=tes_vol_${i}

      # create a volume
      openstack volume create --size ${i} --image cirros ${VOL_NAME}
      openstack server show ${NAME} || {
          openstack server create --flavor m1.tiny --image cirros --nic net-id=private ${NAME} --security-group basic --wait
          fip=$(openstack floating ip create public -f value -c floating_ip_address)
          openstack server add floating ip ${NAME} $fip
      }
      openstack server add volume ${NAME} ${VOL_NAME}
      openstack server list --long

  done

  3. Create the audit:

  openstack optimize audit create -a zm_test \
    -p storage_pools='[{"src_pool": "jgilaber-watcher-3@lvmdriver-1#lvmdriver-1", "dst_pool": "jgilaber-watcher-2@lvmdriver-1#lvmdriver-1", "src_type": "lvmdriver-1", "dst_type": "lvmdriver-1"}]' \
    -p compute_nodes='[{"src_node": "jgilaber-watcher-3", "dst_node": "jgilaber-watcher-1"}]' \
    -p with_attached_volume=true

  Note the audit id that is generated

  4. Check the action plan generated

  openstack optimize actionplan list --audit <audit_id>

  Note the action plan uuid

  5. Start the proposed action plan

  openstack optimize actionplan start <action_plan_id>

  As a result, the audit triggers the migration of volumes and instances 5 and 2. test_2 was migrated successfully, but test_5 failed.
  List of events:

  (venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event list test_5
  +------------------------------------------+--------------------------------------+----------------+----------------------------+
  | Request ID                               | Server ID                            | Action         | Start Time                 |
  +------------------------------------------+--------------------------------------+----------------+----------------------------+
  | req-c805b970-97cd-424d-8415-7c30f4101399 | db250a8a-f788-48d4-bd56-76fd6890c11b | live-migration | 2025-05-30T08:42:08.000000 |
  | req-6850d126-7e0c-4d80-85d4-16216476a842 | db250a8a-f788-48d4-bd56-76fd6890c11b | swap_volume    | 2025-05-30T08:41:16.000000 |
  | req-24a8e19a-a43c-4249-a700-2efa0edaf92f | db250a8a-f788-48d4-bd56-76fd6890c11b | attach_volume  | 2025-05-30T08:35:51.000000 |
  | req-05a16a88-4266-45b6-8c35-bc2c0d83a434 | db250a8a-f788-48d4-bd56-76fd6890c11b | create         | 2025-05-30T08:35:29.000000 |
  +------------------------------------------+--------------------------------------+----------------+----------------------------+
  (venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event show test_5 req-6850d126-7e0c-4d80-85d4-16216476a842
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field      | Value                                                                                                                                                                         |
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action     | swap_volume                                                                                                                                                                   |
  | events     | details=, event='compute_swap_volume', finish_time='2025-05-30T08:41:41.000000', host='jgilaber-watcher-3',                                                                   |
  |            | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:41:16.000000', traceback=                                     |
  | id         | req-6850d126-7e0c-4d80-85d4-16216476a842                                                                                                                                      |
  | message    | None                                                                                                                                                                          |
  | project_id | 14f6d4c19ca04da0b68cd09b986428a8                                                                                                                                              |
  | request_id | req-6850d126-7e0c-4d80-85d4-16216476a842                                                                                                                                      |
  | start_time | 2025-05-30T08:41:16.000000                                                                                                                                                    |
  | user_id    | af0c6c70cc2a46e1abe39d3ad31de18c                                                                                                                                              |
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  (venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event show test_5 req-c805b970-97cd-424d-8415-7c30f4101399
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field      | Value                                                                                                                                                                         |
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | action     | live-migration                                                                                                                                                                |
  | events     | details=, event='compute_rollback_live_migration_at_destination', finish_time='2025-05-30T08:42:41.000000', host='jgilaber-watcher-1',                                        |
  |            | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:38.000000', traceback=                                     |
  |            | details=, event='compute_pre_live_migration', finish_time='2025-05-30T08:42:29.000000', host='jgilaber-watcher-1',                                                            |
  |            | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:17.000000', traceback=                                     |
  |            | details=, event='compute_live_migration', finish_time='2025-05-30T08:42:15.000000', host='jgilaber-watcher-3',                                                                |
  |            | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:42:14.000000', traceback=                                     |
  |            | details=, event='compute_check_can_live_migrate_source', finish_time='2025-05-30T08:42:12.000000', host='jgilaber-watcher-3',                                                 |
  |            | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:42:11.000000', traceback=                                     |
  |            | details=, event='compute_check_can_live_migrate_destination', finish_time='2025-05-30T08:42:13.000000', host='jgilaber-watcher-1',                                            |
  |            | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:09.000000', traceback=                                     |
  |            | details=, event='conductor_live_migrate_instance', finish_time='2025-05-30T08:42:14.000000', host='jgilaber-watcher-1',                                                       |
  |            | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:08.000000', traceback=                                     |
  | id         | req-c805b970-97cd-424d-8415-7c30f4101399                                                                                                                                      |
  | message    | None                                                                                                                                                                          |
  | project_id | 14f6d4c19ca04da0b68cd09b986428a8                                                                                                                                              |
  | request_id | req-c805b970-97cd-424d-8415-7c30f4101399                                                                                                                                      |
  | start_time | 2025-05-30T08:42:08.000000                                                                                                                                                    |
  | user_id    | af0c6c70cc2a46e1abe39d3ad31de18c                                                                                                                                              |
  +------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

  nova-compute logs from the source host, with the error and guest XML:
  https://paste.openstack.org/show/b9u14yO4KYJS31babgqt/
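
The libvirt error in this report (`Cannot access storage file '/dev/sdd'`) means the guest definition refers to a host path that does not exist where the guest is being started. A minimal diagnostic sketch, assuming the two-line header layout printed by `virsh domblklist`, is to list the disk sources of a domain and report the ones that are missing on the current host. `missing_block_sources` is a hypothetical helper; it assumes sources are local paths without spaces, so network-backed disks would be reported as missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `virsh domblklist`-style output on stdin and
# prints every disk source path that does not exist on this host.
missing_block_sources() {
    # Skip the two header lines, ignore empty sources shown as "-".
    awk 'NR > 2 && NF >= 2 && $2 != "-" { print $2 }' |
    while IFS= read -r src; do
        # Report the path only if nothing exists at that location.
        [ -e "$src" ] || printf '%s\n' "$src"
    done
}
```

It could be fed with `virsh domblklist <domain> | missing_block_sources` on the host being inspected; an empty result only shows that the paths exist, not that they point at the right volume.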

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2112187/+subscriptions


