yahoo-eng-team team mailing list archive
Message #96317
[Bug 2112187] Re: Live migration fails after migrating the volume the instance has attached
Reviewed: https://review.opendev.org/c/openstack/nova/+/957757
Committed: https://opendev.org/openstack/nova/commit/93c0f9bc749cea39f9cd1bd9d3e5a5585f1f6cac
Submitter: "Zuul (22348)"
Branch: master
commit 93c0f9bc749cea39f9cd1bd9d3e5a5585f1f6cac
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date: Fri Jun 13 18:15:46 2025 +0100
restrict swap volume to cinder
This change tightens the validation around the attachment
update API to ensure that it can only be called if the source
volume has a non-empty migration status.
That means it will only accept a request to swap the volume if
it is the result of a Cinder volume migration.
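The rule above comes down to one emptiness check on the source volume's migration status. The bash function below is a minimal sketch of that rule only. nova's real check is Python code, and the function name `swap_volume_allowed` and the example status value `migrating` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not nova code: mirrors the rule that a swap volume
# (attachment update) request is accepted only when the source volume has a
# non-empty migration status, i.e. when Cinder is migrating the volume.
swap_volume_allowed() {
    local migration_status="$1"   # source volume's migration status (may be empty)
    if [ -z "$migration_status" ]; then
        # No Cinder-driven migration in progress: treat the request as a
        # direct, user-initiated swap and reject it.
        echo "rejected: swap volume is only allowed during a cinder volume migration" >&2
        return 1
    fi
    return 0
}
```

With this sketch, `swap_volume_allowed migrating` returns 0, while `swap_volume_allowed ""` prints the rejection to stderr and returns 1.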
This change is being made to prevent the instance domain
XML from getting out of sync with the nova BDM records
and Cinder connection info. In the future, support for direct
swap volume actions can be re-added if, and only if, the
nova libvirt driver is updated to correctly modify the domain.
The libvirt driver is the only driver that supported this API
outside of a Cinder-orchestrated swap volume.
If the domain XML and BDMs are allowed to get out of sync and
an admin later live-migrates the VM, the host path will not be
updated for the destination host. Normally this results in a live
migration failure, which often prompts the admin to cold migrate instead.
However, if the source device path exists on the destination, the migration
will proceed. This can lead to two VMs using the same host block device.
At best, this will cause a crash or data corruption.
At worst, it will allow one guest to access the data of another.
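Which of the two outcomes above occurs depends only on whether the stale source path happens to exist on the destination host. The function below is a deliberately simplified model of that branch, not libvirt or nova code. The function name `live_migration_outcome` is hypothetical, and the failure text mirrors the libvirt error quoted in the bug description.

```shell
#!/usr/bin/env bash
# Simplified model (not libvirt or nova code) of a live migration where the
# domain XML still references the source host's device path after a swap.
live_migration_outcome() {
    local stale_path="$1"   # device path left behind in the domain XML
    if [ -e "$stale_path" ]; then
        # Some unrelated device exists at that path on the destination,
        # so the migration "succeeds" against the wrong disk.
        echo "proceeds: guest attached to whatever is at $stale_path"
        return 0
    fi
    # Mirrors the failure reported in this bug.
    echo "fails: Cannot access storage file '$stale_path': No such file or directory"
    return 1
}
```

In this model, a missing path reproduces the reported failure (return 1), while an existing path returns 0, which is the silent cross-guest access case the fix is meant to prevent.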
Prior to this change, there was an explicit warning in the nova API
reference stating that humans should never call this API because it can
lead to this situation. It is now considered a hard error due to the
security implications.
Closes-Bug: #2112187
Depends-on: https://review.opendev.org/c/openstack/tempest/+/957753
Change-Id: I439338bd2f27ccd65a436d18c8cbc9c3127ee612
Signed-off-by: Sean Mooney <work@xxxxxxxxxxxxxxx>
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2112187
Title:
Live migration fails after migrating the volume the instance has
attached
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Security Advisory:
Won't Fix
Status in OpenStack Security Notes:
In Progress
Status in tempest:
In Progress
Status in watcher:
Fix Released
Bug description:
Using Watcher, live-migrating an instance that has a volume attached
fails after that volume has been migrated. The migration fails with:
libvirt.libvirtError: Cannot access storage file '/dev/sdd': No such file or directory
Steps to reproduce the nova live-migration failure after a volume migration:
1. Create an audit template:
openstack optimize audittemplate create zm_test hardware_maintenance --strategy zone_migration
2. Create vms with volumes:
set -ex
# Create Image
IMG=cirros-0.5.2-x86_64-disk.img
URL=http://download.cirros-cloud.net/0.5.2/$IMG
DISK_FORMAT=qcow2
RAW=$IMG
NUMBER_OF_INSTANCES=${1:-8}
# Create flavor
openstack flavor show m1.tiny || \
    openstack flavor create --ram 512 --vcpus 1 --disk 1 --ephemeral 1 m1.tiny
curl -L -# $URL > /tmp/$IMG
if type qemu-img >/dev/null 2>&1; then
    RAW=$(echo $IMG | sed s/img/raw/g)
    qemu-img convert -f qcow2 -O raw /tmp/$IMG /tmp/$RAW
    DISK_FORMAT=raw
fi
openstack image show cirros || \
    openstack image create --container-format bare --disk-format $DISK_FORMAT cirros < /tmp/$RAW
# Create security group and icmp/ssh rules
openstack security group show basic || {
    openstack security group create basic
    openstack security group rule create basic --protocol icmp --ingress --icmp-type -1
    openstack security group rule create basic --protocol tcp --ingress --dst-port 22
}
# Create the instances (NUMBER_OF_INSTANCES in total)
for (( i=1; i<=${NUMBER_OF_INSTANCES}; i++ )); do
    NAME=test_${i}
    VOL_NAME=tes_vol_${i}
    # Create a volume
    openstack volume create --size ${i} --image cirros ${VOL_NAME}
    openstack server show ${NAME} || {
        openstack server create --flavor m1.tiny --image cirros --nic net-id=private ${NAME} --security-group basic --wait
        fip=$(openstack floating ip create public -f value -c floating_ip_address)
        openstack server add floating ip ${NAME} $fip
    }
    openstack server add volume ${NAME} ${VOL_NAME}
    openstack server list --long
done
3. Create the audit:
openstack optimize audit create -a zm_test \
    -p storage_pools='[{"src_pool": "jgilaber-watcher-3@lvmdriver-1#lvmdriver-1", "dst_pool": "jgilaber-watcher-2@lvmdriver-1#lvmdriver-1", "src_type": "lvmdriver-1", "dst_type": "lvmdriver-1"}]' \
    -p compute_nodes='[{"src_node": "jgilaber-watcher-3", "dst_node": "jgilaber-watcher-1"}]' \
    -p with_attached_volume=true
Note the generated audit ID.
4. Check the action plan generated
openstack optimize actionplan list --audit <audit_id>
Note the action plan UUID.
5. Start the proposed action plan
openstack optimize actionplan start <action_plan_id>
As a result, the audit triggers the migration of the volumes and instances test_5 and test_2. test_2 was migrated successfully, but test_5 failed.
List of events:
(venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event list test_5
+------------------------------------------+--------------------------------------+----------------+----------------------------+
| Request ID | Server ID | Action | Start Time |
+------------------------------------------+--------------------------------------+----------------+----------------------------+
| req-c805b970-97cd-424d-8415-7c30f4101399 | db250a8a-f788-48d4-bd56-76fd6890c11b | live-migration | 2025-05-30T08:42:08.000000 |
| req-6850d126-7e0c-4d80-85d4-16216476a842 | db250a8a-f788-48d4-bd56-76fd6890c11b | swap_volume | 2025-05-30T08:41:16.000000 |
| req-24a8e19a-a43c-4249-a700-2efa0edaf92f | db250a8a-f788-48d4-bd56-76fd6890c11b | attach_volume | 2025-05-30T08:35:51.000000 |
| req-05a16a88-4266-45b6-8c35-bc2c0d83a434 | db250a8a-f788-48d4-bd56-76fd6890c11b | create | 2025-05-30T08:35:29.000000 |
+------------------------------------------+--------------------------------------+----------------+----------------------------+
(venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event show test_5 req-6850d126-7e0c-4d80-85d4-16216476a842
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | swap_volume |
| events | details=, event='compute_swap_volume', finish_time='2025-05-30T08:41:41.000000', host='jgilaber-watcher-3', |
| | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:41:16.000000', traceback= |
| id | req-6850d126-7e0c-4d80-85d4-16216476a842 |
| message | None |
| project_id | 14f6d4c19ca04da0b68cd09b986428a8 |
| request_id | req-6850d126-7e0c-4d80-85d4-16216476a842 |
| start_time | 2025-05-30T08:41:16.000000 |
| user_id | af0c6c70cc2a46e1abe39d3ad31de18c |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(venv) ubuntu@jgilaber-watcher-1:/opt/stack/watcher$ openstack server event show test_5 req-c805b970-97cd-424d-8415-7c30f4101399
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| action | live-migration |
| events | details=, event='compute_rollback_live_migration_at_destination', finish_time='2025-05-30T08:42:41.000000', host='jgilaber-watcher-1', |
| | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:38.000000', traceback= |
| | details=, event='compute_pre_live_migration', finish_time='2025-05-30T08:42:29.000000', host='jgilaber-watcher-1', |
| | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:17.000000', traceback= |
| | details=, event='compute_live_migration', finish_time='2025-05-30T08:42:15.000000', host='jgilaber-watcher-3', |
| | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:42:14.000000', traceback= |
| | details=, event='compute_check_can_live_migrate_source', finish_time='2025-05-30T08:42:12.000000', host='jgilaber-watcher-3', |
| | host_id='381be10e2d2b6140b67be690bf8877fa69223584c6a2c4a513155085', result='Success', start_time='2025-05-30T08:42:11.000000', traceback= |
| | details=, event='compute_check_can_live_migrate_destination', finish_time='2025-05-30T08:42:13.000000', host='jgilaber-watcher-1', |
| | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:09.000000', traceback= |
| | details=, event='conductor_live_migrate_instance', finish_time='2025-05-30T08:42:14.000000', host='jgilaber-watcher-1', |
| | host_id='636609f8b2efe009a9bf62531dbfd40694ee83a1bb9d81403df7c404', result='Success', start_time='2025-05-30T08:42:08.000000', traceback= |
| id | req-c805b970-97cd-424d-8415-7c30f4101399 |
| message | None |
| project_id | 14f6d4c19ca04da0b68cd09b986428a8 |
| request_id | req-c805b970-97cd-424d-8415-7c30f4101399 |
| start_time | 2025-05-30T08:42:08.000000 |
| user_id | af0c6c70cc2a46e1abe39d3ad31de18c |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
nova-compute logs from the source host, including the error and the guest XML:
https://paste.openstack.org/show/b9u14yO4KYJS31babgqt/
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2112187/+subscriptions