yahoo-eng-team team mailing list archive

[Bug 1769131] Re: After cold-migration of a volume-backed instance, disk.info file leftover on source host

 

Reviewed:  https://review.openstack.org/566367
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e3385707cb1ced55cd12b1314d8c0b68d354c38
Submitter: Zuul
Branch:    master

commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri May 4 12:58:07 2018 -0400

    libvirt: check image type before removing snapshots in _cleanup_resize
    
    Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
    cleanup logic to _cleanup_resize because some image backends (Qcow2,
    Flat and Ploop) will re-create the instance directory and disk.info
    file when initializing the image backend object.
    
    However, that change did not take into account that volume-backed
    instances being resized will not have a root disk *and* that, if the
    local disk is shared storage, removing the instance directory
    effectively deletes the instance files, like the console.log, on the
    destination host as well. Change
    I29fac80d08baf64bf69e54cf673e55123174de2a was made to resolve that
    issue.
    
    However (see the pattern?), if you're doing a resize of a
    volume-backed instance that is not on shared storage, we won't remove
    the instance directory from the source host in _cleanup_resize. If the
    admin then later tries to live migrate the instance back to that host,
    it will fail with DestinationDiskExists in the pre_live_migration()
    method.
    
    This change is essentially a revert of
    I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for
    Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
    is that creating certain imagebackend objects will recreate the
    instance directory and disk.info on the source host, we simply need
    to avoid creating the imagebackend object. The only reason we are
    getting an imagebackend object in _cleanup_resize is to remove
    image snapshot clones, which is only implemented by the Rbd image
    backend. Therefore, we can check to see if the image type supports
    clones and if not, don't go through the imagebackend init routine
    that, for some, will recreate the disk.
    
    Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
    Closes-Bug: #1769131
    Related-Bug: #1666831
    Related-Bug: #1728603
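
A rough sketch of the approach described above may help readers follow
the change.  The names used here (SUPPORTS_CLONE, by_name(),
RESIZE_SNAPSHOT_NAME) are recalled from nova's libvirt image backend of
this era and should be read as illustrative rather than as the literal
diff:

    # Illustrative sketch only, not the actual patch.
    from nova import conf
    from nova.virt.libvirt import utils as libvirt_utils

    CONF = conf.CONF

    def _remove_resize_snapshot(driver, instance):
        # Only the Rbd backend supports image snapshot clones, so look up
        # the configured backend class first and bail out early for
        # Qcow2/Flat/Ploop.  The image object (whose constructor recreates
        # the instance directory and disk.info) is then never built.
        backend_cls = driver.image_backend.backend(CONF.libvirt.images_type)
        if not backend_cls.SUPPORTS_CLONE:
            return
        root_disk = driver.image_backend.by_name(instance, 'disk')
        if root_disk.exists():
            root_disk.remove_snap(libvirt_utils.RESIZE_SNAPSHOT_NAME)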


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1769131

Title:
  After cold-migration of a volume-backed instance, disk.info file
  leftover on source host

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  In Progress
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Tested using kolla-ansible, with kolla images stable/queens.

  In this setup there are only two compute nodes, with cinder/lvm for
  storage.

  A cirros instance is created on compute02, then cold-migrated to
  compute01.

  At the step where it's awaiting confirmation, the following files can
  be found:

  compute01
  /var/lib/docker/volumes/nova_compute/_data/instances
  \-- 371e669b-0f15-49f2-9a84-bd1e89f34294
      \-- console.log

  1 directory, 1 file

  compute02
  /var/lib/docker/volumes/nova_compute/_data/instances
  \-- 371e669b-0f15-49f2-9a84-bd1e89f34294_resize
      \-- console.log

  1 directory, 1 file

  After confirming the migrate/resize, this becomes:

  compute01
  /var/lib/docker/volumes/nova_compute/_data/instances
  \-- 371e669b-0f15-49f2-9a84-bd1e89f34294
      \-- console.log

  1 directory, 1 file

  compute02
  /var/lib/docker/volumes/nova_compute/_data/instances
  \-- 371e669b-0f15-49f2-9a84-bd1e89f34294
      \-- disk.info

  1 directory, 1 file

  The following log shows that, after the _resize directory is cleaned
  up, the disk.info file ends up on the source host, where it is left
  behind.

  http://paste.openstack.org/show/720358/

  2018-05-04 12:55:10.818 7 DEBUG nova.compute.manager [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] Going to confirm migration 4 do_confirm_resize /usr/lib/python2.7/site-packages/nova/compute/manager.py:3684
  2018-05-04 12:55:11.032 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Acquired semaphore "refresh_cache-371e669b-0f15-49f2-9a84-bd1e89f34294" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
  2018-05-04 12:55:11.033 7 DEBUG nova.network.neutronv2.api [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] _get_instance_nw_info() _get_instance_nw_info /usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py:1383
  2018-05-04 12:55:11.034 7 DEBUG nova.objects.instance [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lazy-loading 'info_cache' on Instance uuid 371e669b-0f15-49f2-9a84-bd1e89f34294 obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:1052
  2018-05-04 12:55:11.406 7 DEBUG nova.network.base_api [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] Updating instance_info_cache with network_info: [{"profile": {}, "ovs_interfaceid": "ba8646b4-fa66-46b9-9f7e-a83163668bb8", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.0.0.8"}], "version": 4, "meta": {"dhcp_server": "10.0.0.2"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "8.8.8.8"}], "routes": [], "cidr": "10.0.0.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.0.0.1"}}], "meta": {"injected": false, "tenant_id": "7ea70c4f74c24199b14df0a570b6f93e", "mtu": 1450}, "id": "f1d14432-5a26-4b0a-89e7-6683bd7d2477", "label": "demo-net"}, "devname": "tapba8646b4-fa", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:d9:91:37", "active": true, "type": "ovs", "id": "ba8646b4-fa66-46b9-9f7e-a83163668bb8", "qbg_params": null}] update_instance_cache_with_nw_info /usr/lib/python2.7/site-packages/nova/network/base_api.py:48
  2018-05-04 12:55:11.426 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Releasing semaphore "refresh_cache-371e669b-0f15-49f2-9a84-bd1e89f34294" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
  2018-05-04 12:55:11.426 7 DEBUG oslo_concurrency.processutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Running cmd (subprocess): rm -rf /var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294_resize execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:372
  2018-05-04 12:55:11.459 7 DEBUG oslo_concurrency.processutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] CMD "rm -rf /var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294_resize" returned: 0 in 0.033s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:409
  2018-05-04 12:55:11.462 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lock "/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294/disk.info" acquired by "nova.virt.libvirt.imagebackend.write_to_disk_info_file" :: waited 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273
  2018-05-04 12:55:11.462 7 DEBUG oslo_concurrency.lockutils [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] Lock "/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294/disk.info" released by "nova.virt.libvirt.imagebackend.write_to_disk_info_file" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285
  2018-05-04 12:55:11.482 7 DEBUG nova.virt.libvirt.vif [req-510561e2-eabb-4c37-8fc3-d56e9f50bf6e 64ca3042227c48ea84d77461b14b8acb 7ea70c4f74c24199b14df0a570b6f93e - default default] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=True,availability_zone='nova',cell_name=None,cleaned=False,config_drive='',created_at=2018-05-04T11:53:34Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description=None,display_name='cirros',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(2),host='compute01',hostname='cirros',id=2,image_ref='',info_cache=InstanceInfoCache,instance_type_id=2,kernel_id='',key_data='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDGUK82VwkyVJoNMlF5EhqfVaI+yOfhaMnMWbLg6ZDeKQjJ5gTZ7DvAfF2NOsyY9kYVo2ik3tQiVJmTyQbc4zQZN327PgnHm4HkmQUTx/pz57VfXzpGg1lQviGW8wr7+Pd7euMcazt2eZB3l4dL1xL96dSIoBzK0wG7B4KTEk8uWMhFkhVFrH6LQBtJSkrTkPWIafc3fv3XNhs4bo9mXQNOpWW6pJogx6FiPYqkFtynHdJTX0a/JcdJxmu/HPSwT3QmZ3yyasHQ1+It6Htte0P1ThdsMKavRD9Gki/r5cB2sUxUxbfSFMfiHdry7opefrbvRVU3G1xwKqrd9JdCCDe9 kolla@operator
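
  The write_to_disk_info_file lock messages above are the moment the
  leftover appears: merely constructing a Qcow2/Flat image backend object
  resolves the driver format and records it in disk.info, which recreates
  the instance directory on the source host.  A simplified, standalone
  sketch of that side effect (this is not the actual nova code, only the
  effective behaviour):

    import json
    import os

    def write_to_disk_info_file(instance_dir, disk_path, driver_format):
        # Recording the resolved format recreates <instance_dir>/disk.info
        # (and the directory itself, if it had already been removed) --
        # which is exactly the stray file shown in the listing above.
        if not os.path.isdir(instance_dir):
            os.makedirs(instance_dir)
        info_path = os.path.join(instance_dir, 'disk.info')
        info = {}
        if os.path.exists(info_path):
            with open(info_path) as f:
                info = json.load(f)
        info[disk_path] = driver_format
        with open(info_path, 'w') as f:
            json.dump(info, f)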

  This file should not be left on the source host.

  For example, attempting to live-migrate back to this host results in a
  failure:

  2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7407, in pre_live_migration
  2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294]     raise exception.DestinationDiskExists(path=instance_dir)
  2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294]
  2018-05-04 13:45:40.546 7 ERROR nova.compute.manager [instance: 371e669b-0f15-49f2-9a84-bd1e89f34294] DestinationDiskExists: The supplied disk path (/var/lib/nova/instances/371e669b-0f15-49f2-9a84-bd1e89f34294) already exists, it is expected not to exist.
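
  The error comes from a guard in the libvirt driver's
  pre_live_migration() that refuses to proceed when the instance
  directory already exists on a destination whose instance path is not
  shared.  A minimal standalone sketch of that check (simplified; not the
  exact nova code):

    import os

    class DestinationDiskExists(Exception):
        pass

    def ensure_destination_dir(instances_path, instance_uuid):
        # With the leftover disk.info still present, the instance directory
        # already exists on the proposed destination, so the migration is
        # aborted up front rather than the directory being created fresh.
        instance_dir = os.path.join(instances_path, instance_uuid)
        if os.path.exists(instance_dir):
            raise DestinationDiskExists(
                'The supplied disk path (%s) already exists, '
                'it is expected not to exist.' % instance_dir)
        os.makedirs(instance_dir)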

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1769131/+subscriptions

