[Bug 1757190] Re: resize fails with volume multiattach when using libvirt 4.0.0 (and qemu 2.11.1): Failed to get shared "write" lock
I can tell that during migrate_disk_and_power_off from the virt driver on the source host, bdm.connection_info has the multiattach flag set, because we get this message when disconnecting the volume from the first instance on the first host:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_12_057443
Mar 19 19:48:12.057443 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: INFO nova.virt.libvirt.driver [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] [instance: 0eed0237-245e-4a18-9e30-9e72accd36c6] Detected multiple connections on this host for volume: 652600d5-f6dc-4089-ba95-d71d7640cafa, skipping target disconnect.
That message comes from the _should_disconnect_target code.
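For context, that skip decision is roughly the following (a paraphrased sketch of the check, not the exact nova code; the dict shapes and field names here are assumptions):

# Sketch of the _should_disconnect_target logic: only tear down the
# iSCSI target if no other instance on this host still uses the volume.
# 'volume' stands in for the Cinder volume dict; 'host_name' is an
# assumed attachment field.
def should_disconnect_target(volume, hostname):
    if not volume.get('multiattach', False):
        return True
    attachments = volume.get('attachments', [])
    connections_on_host = sum(
        1 for attachment in attachments
        if attachment.get('host_name') == hostname)
    return connections_on_host <= 1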
Then I eventually see _terminate_volume_connections, which creates a new, empty attachment between the server and the volume:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_12_307926
Mar 19 19:48:12.307926 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X POST http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "OpenStack-API-Version: volume 3.44" -H "X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-f044373a-53dd-4de7-b881-ff511ecb3a70" -d '{"attachment": {"instance_uuid": "0eed0237-245e-4a18-9e30-9e72accd36c6", "connector": null, "volume_uuid": "652600d5-f6dc-4089-ba95-d71d7640cafa"}}' {{(pid=27735) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}
...
Mar 19 19:48:12.412178 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: RESP BODY: {"attachment": {"status": "reserved", "detached_at": "", "connection_info": {}, "attached_at": "", "attach_mode": null, "instance": "0eed0237-245e-4a18-9e30-9e72accd36c6", "volume_id": "652600d5-f6dc-4089-ba95-d71d7640cafa", "id": "c33fe43e-ac88-48df-93bd-3293746213b6"}}
And deletes the existing attachment:
Mar 19 19:48:12.418026 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X DELETE http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments/448fc228-0e4c-49b4-b0a9-00f3b17468dd -H "OpenStack-API-Version: volume 3.44" -H "User-Agent: python-cinderclient" -H "X-OpenStack-Request-ID: req-f044373a-53dd-4de7-b881-ff511ecb3a70" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d" {{(pid=27735) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}
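So far that's the intended attachment-swap dance for resize. As a sketch, _terminate_volume_connections is doing roughly this (paraphrased; only the attachment_create with a null connector and the attachment_delete are confirmed by the logs above):

# Swap each volume BDM over to a fresh, connector-less attachment so
# the volume stays reserved, then drop the source host's attachment.
for bdm in bdms:
    if bdm.is_volume and bdm.attachment_id:
        new_attachment = self.volume_api.attachment_create(
            context, bdm.volume_id, instance.uuid)
        self.volume_api.attachment_delete(context, bdm.attachment_id)
        bdm.attachment_id = new_attachment['id']
        bdm.save()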
Later, once we're in _finish_resize, we call _update_volume_attachments. I can see that start here, where we get the host connector from the virt driver:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_13_521171
Mar 19 19:48:13.521171 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: DEBUG os_brick.utils [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] ==> get_connector_properties: call u"{'execute': None, 'my_ip': '198.72.124.95', 'enforce_multipath': True, 'host': 'ubuntu-xenial-inap-mtl01-0003062768', 'root_helper': 'sudo nova-rootwrap /etc/nova/rootwrap.conf', 'multipath': False}" {{(pid=27735) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:146}}
And then we update the new attachment with the host connector:
Mar 19 19:48:13.536842 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X PUT http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments/c33fe43e-ac88-48df-93bd-3293746213b6 -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "OpenStack-API-Version: volume 3.44" -H "X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-f044373a-53dd-4de7-b881-ff511ecb3a70" -d '{"attachment": {"connector": {"initiator": "iqn.1993-08.org.debian:01:d44f672dfb2a", "ip": "198.72.124.95", "platform": "x86_64", "host": "ubuntu-xenial-inap-mtl01-0003062768", "do_local_attach": false, "mountpoint": "/dev/vdb", "os_type": "linux2", "multipath": false}}}' {{(pid=27735) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}
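Put together, the destination-side flow is roughly (a sketch; the exact signature of attachment_update is an assumption):

# Fetch the destination host connector once, then hand it to Cinder
# for each volume BDM's new attachment.
connector = self.driver.get_volume_connector(instance)
for bdm in bdms:
    if bdm.is_volume and bdm.attachment_id:
        self.volume_api.attachment_update(
            context, bdm.attachment_id, connector, bdm.device_name)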
I think I now see the problem.
_finish_resize in the compute manager calls this code:
block_device_info = self._get_instance_block_device_info(
    context, instance, refresh_conn_info=True, bdms=bdms)
refresh_conn_info=True eventually gets us here:
https://github.com/openstack/nova/blob/f80b4e50093002f84b43ff245a605fbe44d34711/nova/virt/block_device.py#L639
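Paraphrasing the linked code, the refresh replaces the BDM's saved connection_info wholesale with whatever the Cinder attachment record returns, and that record carries no multiattach key (a sketch of the new-style attachment path; variable names are approximate):

# The attachment's connection_info overwrites what the BDM had saved,
# so the 'multiattach': True entry set at attach time is dropped.
attachment_ref = volume_api.attachment_get(context, self['attachment_id'])
connection_info = attachment_ref['connection_info']
connection_info['serial'] = self.volume_id
self['connection_info'] = connection_info  # multiattach flag lost here
self.save()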
I see the GET call to attachments here:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_15_652636
Mar 19 19:48:15.684627 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: RESP BODY: {"attachment": {"status": "attaching", "detached_at": "", "connection_info": {"auth_password": "5GDZJC3pBdPEJUgh", "attachment_id": "c33fe43e-ac88-48df-93bd-3293746213b6", "target_discovered": false, "encrypted": false, "driver_volume_type": "iscsi", "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-652600d5-f6dc-4089-ba95-d71d7640cafa", "target_portal": "198.72.124.95:3260", "volume_id": "652600d5-f6dc-4089-ba95-d71d7640cafa", "target_lun": 1, "access_mode": "rw", "auth_username": "67J6MYJni5Yu5VEEw8Q8", "auth_method": "CHAP"}, "attached_at": "2018-03-19T19:48:16.000000", "attach_mode": "rw", "instance": "0eed0237-245e-4a18-9e30-9e72accd36c6", "volume_id": "652600d5-f6dc-4089-ba95-d71d7640cafa", "id": "c33fe43e-ac88-48df-93bd-3293746213b6"}}
That connection_info gets saved on the BlockDeviceMapping object, which we later use to set the shareable flag when connecting the volume on the destination host. And that's where I see that the guest xml lacks the shareable flag, because BDM.connection_info no longer has the multiattach flag set:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_15_706213
block_device_info={'swap': None, 'root_device_name': u'/dev/vda', 'ephemerals': [], 'block_device_mapping': [{'guest_format': None, 'boot_index': None, 'mount_device': u'/dev/vdb', 'connection_info': {'status': u'attaching', 'detached_at': u'', u'volume_id': u'652600d5-f6dc-4089-ba95-d71d7640cafa', 'attach_mode': u'rw', 'driver_volume_type': u'iscsi', 'instance': u'0eed0237-245e-4a18-9e30-9e72accd36c6', 'attached_at': u'2018-03-19T19:48:16.000000', 'serial': u'652600d5-f6dc-4089-ba95-d71d7640cafa', 'data': {u'auth_password': u'***', u'target_discovered': False, u'encrypted': False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack:volume-652600d5-f6dc-4089-ba95-d71d7640cafa', u'target_portal': u'198.72.124.95:3260', u'volume_id': u'652600d5-f6dc-4089-ba95-d71d7640cafa', u'target_lun': 1, u'access_mode': u'rw', u'auth_username': u'67J6MYJni5Yu5VEEw8Q8', u'auth_method': u'CHAP'}}, 'disk_bus': 'virtio', 'device_type': 'disk', 'attachment_id': 'c33fe43e-ac88-48df-93bd-3293746213b6', 'delete_on_termination': False}]}
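The shareable decision itself is just a flag check on that connection_info (paraphrased from the libvirt driver; the surrounding code is omitted), so with the key gone the element is never written:

# With 'multiattach' absent from the refreshed connection_info, the
# guest disk config never gets the shareable flag set.
if connection_info.get('multiattach', False):
    conf.shareable = True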
I have no idea why this didn't fail the resize tests before with older libvirt and qemu; maybe something in those versions preserved the shareable flag. But by code inspection alone, this is clearly broken.
** Also affects: nova/queens
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/1757190
Title:
resize fails with volume multiattach when using libvirt 4.0.0 (and qemu 2.11.1): Failed to get shared "write" lock
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) queens series:
New
Bug description:
Seeing this in a patch (https://review.openstack.org/#/c/554317/) that runs the nova-multiattach job with the Queens Ubuntu Cloud Archive, which has libvirt 4.0.0 and qemu 2.11.1:
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/libvirt/qemu/instance-00000066.txt.gz
2018-03-19T19:48:16.175548Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1: Failed to get shared "write" lock
Is another process using the image?
http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz?level=TRACE#_Mar_19_19_48_16_261051
Mar 19 19:48:17.132940 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: ERROR nova.compute.manager [None req-3a092a4b-7ae7-4f29-9f78-97bf1dc0d46d service nova] [instance: 0eed0237-245e-4a18-9e30-9e72accd36c6] Setting instance vm_state to ERROR: libvirtError: internal error: process exited while connecting to monitor: 2018-03-19T19:48:16.147136Z qemu-system-x86_64: -drive file=/dev/sdb,format=raw,if=none,id=drive-virtio-disk1,serial=652600d5-f6dc-4089-ba95-d71d7640cafa,cache=none,aio=native: 'serial' is deprecated, please use the corresponding option of '-device' instead
Mar 19 19:48:17.133724 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: 2018-03-19T19:48:16.155115Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
Mar 19 19:48:17.134022 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: 2018-03-19T19:48:16.175548Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1: Failed to get shared "write" lock
That last error likely means the 'shareable' element isn't in the disk
config xml, and it's not:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/sdb'/>
<target dev='vdb' bus='virtio'/>
<serial>652600d5-f6dc-4089-ba95-d71d7640cafa</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
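For comparison, a correctly configured multiattach disk would also carry a <shareable/> child element inside that <disk> block, which is what makes qemu take a shared lock on the image rather than an exclusive one.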