yahoo-eng-team team mailing list archive
Message #72834
[Bug 1769131] Re: After cold-migration of a volume-backed instance, disk.info file leftover on source host
Reviewed: https://review.openstack.org/566367
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e3385707cb1ced55cd12b1314d8c0b68d354c38
Submitter: Zuul
Branch: master
commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Fri May 4 12:58:07 2018 -0400
libvirt: check image type before removing snapshots in _cleanup_resize
Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
cleanup logic to _cleanup_resize because some image backends (Qcow2,
Flat and Ploop) will re-create the instance directory and disk.info
file when initializing the image backend object.
However, that change did not take into account volume-backed instances
being resized will not have a root disk *and* if the local disk is
shared storage, removing the instance directory effectively deletes
the instance files, like the console.log, on the destination host
as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made
to resolve that issue.
However (see the pattern?), if you're doing a resize of a
volume-backed instance that is not on shared storage, we won't remove
the instance directory from the source host in _cleanup_resize. If the
admin then later tries to live migrate the instance back to that host,
it will fail with DestinationDiskExists in the pre_live_migration()
method.
This change is essentially a revert of
I29fac80d08baf64bf69e54cf673e55123174de2a and an alternate fix for
Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
is that creating certain imagebackend objects will recreate the
instance directory and disk.info on the source host, we simply need
to avoid creating the imagebackend object. The only reason we are
getting an imagebackend object in _cleanup_resize is to remove
image snapshot clones, which is only implemented by the Rbd image
backend. Therefore, we can check to see if the image type supports
clones and if not, don't go through the imagebackend init routine
that, for some, will recreate the disk.
Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
Closes-Bug: #1769131
Related-Bug: #1666831
Related-Bug: #1728603
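
The approach described in the commit message can be illustrated with a small stand-alone sketch. This is not the nova code: the class names, the SUPPORTS_CLONE attribute, the "resize-snap" snapshot name and the cleanup_resize() helper are hypothetical stand-ins chosen for illustration. The point is only that the cleanup checks a class-level capability flag first, so a backend whose constructor recreates the instance directory and disk.info is never instantiated.

```python
import os
import shutil
import tempfile


class FlatBackend:
    """Hypothetical backend whose constructor recreates the instance
    directory and disk.info, mimicking the side effect in the bug."""
    SUPPORTS_CLONE = False

    def __init__(self, instance_dir):
        os.makedirs(instance_dir, exist_ok=True)
        with open(os.path.join(instance_dir, "disk.info"), "w") as f:
            f.write("{}")


class RbdBackend:
    """Hypothetical backend that supports snapshot clones and does not
    touch the local instance directory."""
    SUPPORTS_CLONE = True

    def __init__(self, instance_dir):
        self.removed_snaps = []

    def remove_snap(self, name):
        self.removed_snaps.append(name)


# Illustrative mapping from image type name to backend class.
BACKENDS = {"flat": FlatBackend, "rbd": RbdBackend}


def cleanup_resize(instance_dir, image_type):
    """Remove the <instance_dir>_resize directory, then remove resize
    snapshots only if the image type supports clones."""
    shutil.rmtree(instance_dir + "_resize", ignore_errors=True)

    backend_cls = BACKENDS[image_type]
    # Check the class attribute *before* instantiating, so backends with
    # constructor side effects are never created here.
    if not backend_cls.SUPPORTS_CLONE:
        return None

    backend = backend_cls(instance_dir)
    backend.remove_snap("resize-snap")  # assumed snapshot name
    return backend


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    inst = os.path.join(base, "instance-uuid")
    os.makedirs(inst + "_resize")
    cleanup_resize(inst, "flat")
    # The instance directory was not recreated on the "source host".
    print(os.path.exists(inst))
```

Checking the flag on the class rather than on an instance is the design choice that matters here: any check performed after construction would come too late to prevent the leftover file.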
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1769131
Title:
After cold-migration of a volume-backed instance, disk.info file
leftover on source host
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
In Progress
Status in OpenStack Compute (nova) pike series:
In Progress
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
Tested using kolla-ansible, with kolla images stable/queens.
In this setup there are only two compute nodes, with cinder/lvm for
storage.
A cirros instance is created, on compute02, and cold-migrated to
compute01.
At the step where it's awaiting confirmation, the following files can
be found:
compute01
/var/lib/docker/volumes/nova_compute/_data/instances
\-- 371e669b-0f15-49f2-9a84-bd1e89f34294
    \-- console.log
1 directory, 1 file
compute02
/var/lib/docker/volumes/nova_compute/_data/instances
\-- 371e669b-0f15-49f2-9a84-bd1e89f34294_resize
    \-- console.log
1 directory, 1 file
After confirming the migrate/resize, this becomes:
compute01
/var/lib/docker/volumes/nova_compute/_data/instances
\-- 371e669b-0f15-49f2-9a84-bd1e89f34294
    \-- console.log
1 directory, 1 file
compute02
/var/lib/docker/volumes/nova_compute/_data/instances
\-- 371e669b-0f15-49f2-9a84-bd1e89f34294
    \-- disk.info
1 directory, 1 file
This log shows that after the _resize directory is cleaned up, the
disk.info file is written on the source host, where it is left behind:
http://paste.openstack.org/show/720358/
2018-05-04 12:55:10.818 7 DEBUG nova.compute.manager [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] Going to confirm migration 4 do_confirm_resize /usr/lib/python2.7/site-packages/nova/compute/manager.py:3684
2018-05-04 12:55:11.032 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Acquired semaphore "refresh_cache-371e669b-0f15-49f2-9a84-bd1e89f34294" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2018-05-04 12:55:11.033 7 DEBUG nova.network.neutronv2.api [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] _get_instance_nw_info() _get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:1383
2018-05-04 12:55:11.034 7 DEBUG nova.objects.instance [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lazy-loading 'info_cache' on Instance uuid 371e669b-0f15-49f2-9a84-bd1e89f34294 obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:1052
2018-05-04 12:55:11.406 7 DEBUG nova.network.base_api [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] Updating instance_info_cache with network_info: [{"profile": {}, "ovs_interfaceid": "ba8646b4-fa66-46b9-9f7e-a83163668bb8", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.0.0.8"}], "version": 4, "meta": {"dhcp_server": "10.0.0.2"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "8.8.8.8"}], "routes": [], "cidr": "10.0.0.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.0.0.1"}}], "meta": {"injected": false, "tenant_id": "7ea70c4f74c24199b14df0a570b6f93e", "mtu": 1450}, "id": "f1d14432-5a26-4b0a-89e7-6683bd7d2477", "label": "demo-net"}, "devname": "tapba8646b4-fa", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:d9:91:37", "active": true, "type": "ovs", "id": "ba8646b4-fa66-46b9-9f7e-a83163668bb8", "qbg_params": null}] update_instance_cache_with_nw_info /usr/lib/python2.7/site-packages/nova/network/base_api.py:48
2018-05-04 12:55:11.426 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Releasing semaphore "refresh_cache-371e669b-0f15-49f2-9a84-bd1e89f34294" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
2018-05-04 12:55:11.426 7 DEBUG oslo_concurrency.processutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Running cmd (subprocess): rm -rf /var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294_resize execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
2018-05-04 12:55:11.459 7 DEBUG oslo_concurrency.processutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] CMD "rm -rf /var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294_resize" returned: 0 in 0.033s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
2018-05-04 12:55:11.462 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lock "/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294/disk.info" acquired by "nova.virt.libvirt.imagebackend.write_to_disk_info_file" :: waited 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
2018-05-04 12:55:11.462 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lock "/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294/disk.info" released by "nova.virt.libvirt.imagebackend.write_to_disk_info_file" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
2018-05-04 12:55:11.482 7 DEBUG nova.virt.libvirt.vif [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=True,availability_zone='nova',cell_name=None,cleaned=False,config_drive='',created_at=2018-05-04T11:53:34Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description=None,display_name='cirros',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(2),host='compute01',hostname='cirros',id=2,image_ref='',info_cache=InstanceInfoCache,instance_type_id=2,kernel_id='',key_data='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGUK82VwkyVJoNMlF5EhqfVaI+yOfhaMnMWbLg6ZDeKQjJ5gTZ7DvAfF2NOsyY9kYVo2ik3tQiVJmTyQbc4zQZN327PgnHm4HkmQUTx/pz57VfXzpGg1lQviGW8wr7+Pd7euMcazt2eZB3l4dL1xL96dSIoBzK0wG7B4KTEk8uWMhFkhVFrH6LQBtJSkrTkPWIafc3fv3XNhs4bo9mXQNOpWW6pJogx6FiPYqkFtynHdJTX0a/JcdJxmu/HPSwT3QmZ3yyasHQ1+It6Htte0P1ThdsMKavRD9Gki/r5cB2sUxUxbfSFMfiHdry7opefrbvRVU3G1xwKqrd9JdCCDe9 kolla@operator
This file should not be left on the source host.
For example, attempting to live-migrate back to this host results in a
failure:
2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7407, in pre_live_migration
2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] raise exception.DestinationDiskExists(path=instance_dir)
2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294]
2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] DestinationDiskExists: The supplied disk path (/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294) already exists, it is expected not to exist.
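
The failure above can be reproduced in miniature. The following is a simplified sketch, not the nova implementation: it assumes the destination-side check amounts to "raise if the instance directory already exists, otherwise create it". The prepare_destination_dir() helper and the local DestinationDiskExists class are hypothetical stand-ins; only the exception name and message text are taken from the traceback.

```python
import os
import tempfile


class DestinationDiskExists(Exception):
    """Local stand-in for the exception named in the traceback."""

    def __init__(self, path):
        super().__init__(
            "The supplied disk path (%s) already exists, "
            "it is expected not to exist." % path)
        self.path = path


def prepare_destination_dir(instance_dir):
    """Hypothetical helper: refuse to reuse an existing instance
    directory, otherwise create it for the incoming instance."""
    if os.path.exists(instance_dir):
        raise DestinationDiskExists(path=instance_dir)
    os.mkdir(instance_dir)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    leftover = os.path.join(base, "instance-uuid")
    # Simulate the leftover directory containing only disk.info.
    os.mkdir(leftover)
    open(os.path.join(leftover, "disk.info"), "w").close()
    try:
        prepare_destination_dir(leftover)
    except DestinationDiskExists as exc:
        print(exc)
```

Under this model, a directory that contains nothing but a stale disk.info is enough to block the migration, which matches the behaviour reported here.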
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1769131/+subscriptions