yahoo-eng-team team mailing list archive
Message #37060
[Bug 1476591] Re: volume in_use after VM has terminated
Hi Amritanshu!
I'm answering the questions from the top down.
Q: Please specify the openstack setup procedure and release used for this deployment
A: I used a clean VM (Ubuntu Trusty) with a DevStack setup. The Cinder storage uses the Quobyte driver and the corresponding storage system (which runs on a different machine). On this setup I ran the Tempest volume tests.
If you require specific files from the setup, please let me know.
Q: Please specify the steps so that we can reproduce it and work on the same.
A: Reproduction might require a Quobyte installation; I currently do not have a working alternative setup, but one could be created. The issue arises in general Cinder & Nova code during Tempest test cleanup, as the Quobyte driver simply calls Cinder/Nova libvirt functions.
Q: You have to manually detach it via the volumes tab by going to that volume.
A: The process does not involve manual steps, as this is an automated test from the Tempest test suite. If a detach operation is missing, it may need to be added somewhere in that flow (we would have to find out / clarify where exactly).
** Changed in: nova
Status: Opinion => New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1476591
Title:
volume in_use after VM has terminated
Status in Cinder:
New
Status in OpenStack Compute (nova):
New
Bug description:
Running the Tempest test
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
with our CI system runs into a timing issue in the interaction between
Nova and Cinder.
Essentially, on the Cinder side the status of the volume used
in the test is 'in_use' for several seconds _after_ the test's VM has
already been terminated and cleaned up, including the unmounting of the
given volume on the Nova side. As Cinder sees the volume is still
'in_use', it calls the Nova API to perform the snapshot deletion, but
since the volume has already been unmounted on the Nova side, the
resulting qemu-img operation triggered by Nova runs into a 'no such
file or directory' error.
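To make the described flow easier to follow, here is a minimal, purely
illustrative Python sketch of the decision as I understand it (this is not
the actual remotefs driver code; qemu_img_info, ask_nova and delete_locally
are hypothetical stand-ins):

import os

def qemu_img_info(path):
    # Stand-in for a rootwrapped "qemu-img info" call; the one triggered by
    # Nova fails with 'no such file or directory' once the share is unmounted.
    if not os.path.exists(path):
        raise RuntimeError("qemu-img: No such file or directory: %s" % path)
    return {"path": path}

def delete_snapshot(volume, snapshot, mount_path, ask_nova, delete_locally):
    # Illustrative decision logic only; the two callables are hypothetical.
    snapshot_file = os.path.join(
        mount_path, "%s.%s" % (volume["name"], snapshot["id"]))
    qemu_img_info(snapshot_file)  # succeeds on the Cinder side (see logs below)
    if volume["status"] == "in-use":
        # The volume still looks attached, so Cinder asks Nova to assist with
        # the deletion -- but Nova has already unmounted the volume, so the
        # qemu-img operation on the Nova side fails.
        ask_nova(volume, snapshot)
    else:
        # The volume is detached; Cinder handles the snapshot file itself.
        delete_locally(volume, snapshot)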
The following log excerpts show this issue. Please refer to the given
time stamps (all components run in a single DevStack on the same machine,
so times should be comparable):
1) Cinder log (please note I added two debug lines in this snippet that print the volume status data and are not part of the upstream driver):
2015-07-21 08:14:10.163 DEBUG cinder.volume.drivers.remotefs [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] Deleting snapshot ca09f072-c736-41ac-b6f3-7a137b0ee072: _delete_s
napshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:915
2015-07-21 08:14:10.164 DEBUG cinder.volume.drivers.remotefs [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] Initial snapshot deletion volume status is in-use _delete_snapsho
t /opt/stack/cinder/cinder/volume/drivers/remotefs.py:918
2015-07-21 08:14:10.420 DEBUG cinder.volume.drivers.remotefs [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] snapshot_file for this snap is: volume-7f4a5711-9f2c-4aa7-a6af-c6
ccf476702d.ca09f072-c736-41ac-b6f3-7a137b0ee072 _delete_snapshot /opt/stack/cinder/cinder/volume/drivers/remotefs.py:941
2015-07-21 08:14:10.420 DEBUG oslo_concurrency.processutils [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwra
p.conf env LC_ALL=C qemu-img info /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-7f4a5711-9f2c-4aa7-a6af-c6ccf476702d.ca09f072-c736-41ac-b6f3-7a137b0ee072 execute /usr/local/lib/python2.7/dis
t-packages/oslo_concurrency/processutils.py:230
2015-07-21 08:14:11.549 DEBUG oslo_concurrency.processutils [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf env LC_ALL=C q
emu-img info /opt/stack/data/cinder/mnt/38effaab97f9ee27ecb78121f511fa07/volume-7f4a5711-9f2c-4aa7-a6af-c6ccf476702d.ca09f072-c736-41ac-b6f3-7a137b0ee072" returned: 0 in 1.129s execute /usr/local/lib/python2.7/d
ist-packages/oslo_concurrency/processutils.py:260
2015-07-21 08:14:11.700 DEBUG cinder.volume.drivers.remotefs [req-6dae7608-e1f4-43af-ac8e-22d6aa7d72c9 tempest-TestVolumeBootPatternV2-966547300] In delete_snapshot, volume status is in-use _delete_snapshot /opt
/stack/cinder/cinder/volume/drivers/remotefs.py:956
2) Nova log:
2015-07-21 08:14:05.153 INFO nova.compute.manager [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] [instance: adbe1fa4-7136-4d29-91d
0-5772dbf4900d] Terminating instance
2015-07-21 08:14:05.167 6183 INFO nova.virt.libvirt.driver [-] [instance: adbe1fa4-7136-4d29-91d0-5772dbf4900d] Instance destroyed successfully.
2015-07-21 08:14:05.167 DEBUG nova.virt.libvirt.vif [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] vif_type=bridge instance=Instan
ce(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=False,availability_zone=None,cell_name=None,cleaned=False,config_drive='True',created_at=2015-07-21T08:13:32Z,default_ephemeral_device=No
ne,default_swap_device=None,deleted=False,deleted_at=None,disable_terminate=False,display_description='tempest-TestVolumeBootPatternV2-147549174',display_name='tempest-TestVolumeBootPatternV2-147549174',ec2_ids=
<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(6),host='manualvm',hostname='tempest-testvolumebootpatternv2-147549174',id=140,image_ref='',info_cache=InstanceInfoCache,instance_type_id=6,kernel_id='f203cd87-b5ed-491a-b7a5-46ad2e63bcbf',key_data='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ2yTWvjXK/IwK8xlmSjTn4drDwH5nMr65gYD4j3mkZBfBuk5Zuf10mLwVUFVDTgacJVcG4AE96poXofHD64pbGEcP9NqYGbLV7JSJzgNznUO/RkMdBBmmmfJiACvtWYpigAMIPPddgmpQZVaG+q8VelEWfDiErOBiaajz/NjL8/mmP78Lq0xvuP44vy/EElaSob8iiqFeg6rTlMpzhkFUV1a52CSQYjmD80mSf932ljXgNU9A4pE8NkO4lDOkflQfLO2F6wxV9H94VKxTBJgwnH8BANP7V8rjeIOwwUmpF1F6D4LceA66j1ONHk4ssyU3iimWt+UMwGCFWF6aa58n Generated-by-Nova',key_name='tempest-TestVolumeBootPatternV2-787092881',launch_index=0,launched_at=2015-07-21T08:13:42Z,launched_on='manualvm',locked=False,locked_by=None,memory_mb=64,metadata={},new_flavor=None,node='manualvm',numa_topology=<?>,old_flavor=None,os_type=None,pci_devices=<?>,pci_requests=<?>,power_state=4,progress=0,project_id='df3f26e1b7384a749ead0781a707491a',ramdisk_id='7c2ac51d-e571-4de6-b303-0302653677d8',reservation_id='r-40fs0tcw',root_device_name='/dev/vda',root_gb=0,scheduled_at=None,security_groups=SecurityGroupList,shutdown_terminate=False,system_metadata={image_base_image_ref='',image_checksum='eb9139e4942121f22bbc2afc0400b2a4',image_container_format='ami',image_disk_format='ami',image_image_id='5ce2d454-426c-4db5-8631-64228e29ff53',image_image_name='cirros-0.3.4-x86_64-uec',image_kernel_id='f203cd87-b5ed-491a-b7a5-46ad2e63bcbf',image_min_disk='0',image_min_ram='0',image_ramdisk_id='7c2ac51d-e571-4de6-b303-0302653677d8',image_size='25165824'},tags=<?>,task_state='deleting',terminated_at=None,updated_at=2015-07-21T08:14:04Z,user_data=None,user_id='9c6a53d821864b3ab3fc98933c672faf',uuid=adbe1fa4-7136-4d29-91d0-5772dbf4900d,vcpu_model=<?>,vcpus=1,vm_mode=None,vm_state='stopped') vif=VIF({'profile': None, 'ovs_interfaceid': None, 'preserve_on_delete': False, 'network': Network({'bridge': u'br100', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': u'fixed', 'floating_ips': [], 'address': u'10.10.0.4'})], 'version': 4, 'meta': {u'dhcp_server': u'10.10.0.1'}, 'dns': [IP({'meta': {}, 'version': 4, 'type': u'dns', 'address': u'8.8.4.4'})], 'routes': [], 'cidr': u'10.10.0.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': u'gateway', 'address': u'10.10.0.1'})}), Subnet({'ips': [], 'version': None, 'meta': {u'dhcp_server': None}, 'dns': [], 'routes': [], 'cidr': None, 'gateway': IP({'meta': {}, 'version': None, 'type': u'gateway', 'address': None})})], 'meta': {u'tenant_id': None, u'should_create_bridge': True, u'bridge_interface': u'tap1'}, 'id': u'4be2d94e-e059-44e4-a781-919ccf9600a7', 'label': u'private'}), 'devname': None, 'vnic_type': u'normal', 'qbh_params': None, 'meta': {}, 'details': {}, 'address': u'fa:16:3e:de:a8:56', 'active': False, 'type': u'bridge', 'id': u'011753cb-a2d1-4cd0-bde6-3abf9b381871', 'qbg_params': None}) unplug /opt/stack/nova/nova/virt/libvirt/vif.py:800
2015-07-21 08:14:05.169 DEBUG oslo_concurrency.lockutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lock "connect_volume" acquired by "disconnect_volume" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:251
2015-07-21 08:14:05.170 DEBUG oslo_concurrency.processutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Running cmd (subprocess): findmnt --target /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07 --source quobyte@176.9.127.22:7861/quobyteci_test_volume execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:230
2015-07-21 08:14:05.198 DEBUG oslo_concurrency.processutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] CMD "findmnt --target /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07 --source quobyte@176.9.127.22:7861/quobyteci_test_volume" returned: 0 in 0.028s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:260
2015-07-21 08:14:05.199 DEBUG nova.virt.libvirt.quobyte [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Unmounting /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07 umount_volume /opt/stack/nova/nova/virt/libvirt/quobyte.py:55
2015-07-21 08:14:05.200 DEBUG oslo_concurrency.processutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Running cmd (subprocess): umount.quobyte /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07 execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:230
2015-07-21 08:14:05.260 DEBUG oslo_concurrency.processutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] CMD "umount.quobyte /opt/stack/data/nova/mnt/38effaab97f9ee27ecb78121f511fa07" returned: 0 in 0.059s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:260
2015-07-21 08:14:05.260 DEBUG oslo_concurrency.lockutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lock "connect_volume" released by "disconnect_volume" :: held 0.091s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
[..]
2015-07-21 08:14:07.243 DEBUG nova.compute.manager [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] [instance: adbe1fa4-7136-4d29-91d0-5772dbf4900d] terminating bdm BlockDeviceMapping(boot_index=0,connection_info='{"driver_volume_type": "quobyte", "mount_point_base": "/opt/stack/data/cinder/mnt", "serial": "313f97e6-e36a-4536-a7c6-00c340f12f8e", "data": {"name": "volume-313f97e6-e36a-4536-a7c6-00c340f12f8e", "format": "qcow2", "encrypted": false, "qos_specs": null, "export": "176.9.127.22:7861/quobyteci_test_volume", "access_mode": "rw", "options": null}}',created_at=2015-07-21T08:13:32Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vda',device_type='disk',disk_bus='virtio',guest_format=None,id=178,image_id=None,instance=<?>,instance_uuid=adbe1fa4-7136-4d29-91d0-5772dbf4900d,no_device=False,snapshot_id=None,source_type='volume',updated_at=2015-07-21T08:13:39Z,volume_id='313f97e6-e36a-4536-a7c6-00c340f12f8e',volume_size=1) _cleanup_volumes /opt/stack/nova/nova/compute/manager.py:2265
2015-07-21 08:14:07.466 DEBUG oslo_concurrency.lockutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lock "compute_resources" acquired by "update_usage" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:251
2015-07-21 08:14:07.467 DEBUG nova.objects.instance [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lazy-loading `numa_topology' on Instance uuid adbe1fa4-7136-4d29-91d0-5772dbf4900d obj_load_attr /opt/stack/nova/nova/objects/instance.py:1016
2015-07-21 08:14:07.528 DEBUG oslo_concurrency.lockutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lock "compute_resources" released by "update_usage" :: held 0.061s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
2015-07-21 08:14:07.777 DEBUG oslo_concurrency.lockutils [req-e9bea12a-88ef-4f08-a320-d6e8e3b21228 tempest-TestVolumeBootPatternV2-1465284168 tempest-TestVolumeBootPatternV2-966547300] Lock "adbe1fa4-7136-4d29-91d0-5772dbf4900d" released by "do_terminate_instance" :: held 2.627s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
So what I see is that Nova terminates the instance (including the unmount) between 08:14:05 and 08:14:07, while Cinder starts the snapshot removal and checks the volume's state between 08:14:10 and 08:14:11, where it is still 'in-use'. Cinder subsequently chooses to contact Nova for the snapshot deletion, which fails as Nova has already unmounted the required volume.
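For illustration only (not a proposed fix), the timing gap can be made
visible with a small poll on the volume status; get_volume_status is a
hypothetical stand-in for however the status is read internally:

import time

def wait_until_detached(get_volume_status, volume_id, timeout=30, interval=1):
    # Poll until the volume leaves 'in-use', mirroring the roughly five
    # second gap between Nova's unmount (08:14:05-08:14:07) and Cinder
    # still reporting 'in-use' (08:14:10-08:14:11).
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_volume_status(volume_id) != "in-use":
            return True
        time.sleep(interval)
    return False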
Has anybody seen similar issues before?
I would also be grateful for hints/pointers to the code where the volume status is updated; so far I could not find where that happens.
PS: This setup uses the change under review in
https://review.openstack.org/#/c/192736/12 . Essentially I ran into
the issue by doing a manual Tempest test run of that change.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1476591/+subscriptions