yahoo-eng-team team mailing list archive

[Bug 1757190] Re: resize fails with volume multiattach when using libvirt 4.0.0 (and qemu 2.11.1): Failed to get shared "write" lock

I can tell that during migrate_disk_and_power_off from the virt driver
on the source host, the bdm.connection_info has the multiattach flag
set, because we get this message when disconnecting the volume from the
first instance on the first host:

http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_12_057443

Mar 19 19:48:12.057443 ubuntu-xenial-inap-mtl01-0003062768 nova-
compute[27735]: INFO nova.virt.libvirt.driver [None req-f044373a-53dd-
4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929
tempest-AttachVolumeMultiAttachTest-1944074929] [instance: 0eed0237
-245e-4a18-9e30-9e72accd36c6] Detected multiple connections on this host
for volume: 652600d5-f6dc-4089-ba95-d71d7640cafa, skipping target
disconnect.

That message comes from the _should_disconnect_target code.
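
For context, that check amounts to roughly the following (a simplified
sketch, not the verbatim nova code; the attachment-counting details
vary by release, and the 'host_name' key is illustrative):

def should_disconnect_target(volume, host):
    # Only disconnect the target if this host has a single attachment
    # for the volume; with multiattach, another instance on the same
    # host may still be using the shared target.
    attachments = volume.get('attachments', [])
    connections_on_host = sum(
        1 for att in attachments if att.get('host_name') == host)
    return connections_on_host < 2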

Then I eventually see _terminate_volume_connections, which creates a
new empty attachment between the server and the volume:

http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_12_307926

Mar 19 19:48:12.307926 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929 tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X POST http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "OpenStack-API-Version: volume 3.44" -H "X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-f044373a-53dd-4de7-b881-ff511ecb3a70" -d '{"attachment": {"instance_uuid": "0eed0237-245e-4a18-9e30-9e72accd36c6", "connector": null, "volume_uuid": "652600d5-f6dc-4089-ba95-d71d7640cafa"}}' {{(pid=27735) _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:372}}
...
Mar 19 19:48:12.412178 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: RESP BODY: {"attachment": {"status": "reserved", "detached_at": "", "connection_info": {}, "attached_at": "", "attach_mode": null, "instance": "0eed0237-245e-4a18-9e30-9e72accd36c6", "volume_id": "652600d5-f6dc-4089-ba95-d71d7640cafa", "id": "c33fe43e-ac88-48df-93bd-3293746213b6"}}

And deletes the existing attachment:

Mar 19 19:48:12.418026 ubuntu-xenial-inap-mtl01-0003062768 nova-
compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-
4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929
tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X
DELETE
http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments/448fc228
-0e4c-49b4-b0a9-00f3b17468dd -H "OpenStack-API-Version: volume 3.44" -H
"User-Agent: python-cinderclient" -H "X-OpenStack-Request-ID: req-
f044373a-53dd-4de7-b881-ff511ecb3a70" -H "Accept: application/json" -H
"X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d"
{{(pid=27735) _http_log_request /usr/local/lib/python2.7/dist-
packages/keystoneauth1/session.py:372}}
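
Those two calls line up with _terminate_volume_connections in the
compute manager, which does roughly this (a sketch, not verbatim):

def terminate_volume_connections(volume_api, context, instance, bdms):
    # For each volume BDM, reserve a fresh attachment with no connector
    # (keeping the volume reserved for the instance across the resize),
    # then delete the old attachment that was bound to the source host.
    for bdm in bdms:
        if bdm.is_volume and bdm.attachment_id:
            new_attachment = volume_api.attachment_create(
                context, bdm.volume_id, instance.uuid)  # connector=None
            volume_api.attachment_delete(context, bdm.attachment_id)
            bdm.attachment_id = new_attachment['id']
            bdm.save()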

Later, once we're in _finish_resize, we call _update_volume_attachments.
I can see that start happening here, because we're getting the host
connector from the virt driver:

http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_13_521171

Mar 19 19:48:13.521171 ubuntu-xenial-inap-mtl01-0003062768 nova-
compute[27735]: DEBUG os_brick.utils [None req-f044373a-53dd-
4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929
tempest-AttachVolumeMultiAttachTest-1944074929] ==>
get_connector_properties: call u"{'execute': None, 'my_ip':
'198.72.124.95', 'enforce_multipath': True, 'host': 'ubuntu-xenial-inap-
mtl01-0003062768', 'root_helper': 'sudo nova-rootwrap
/etc/nova/rootwrap.conf', 'multipath': False}" {{(pid=27735)
trace_logging_wrapper /usr/local/lib/python2.7/dist-
packages/os_brick/utils.py:146}}

And then we update the new attachment with the host connector:

Mar 19 19:48:13.536842 ubuntu-xenial-inap-mtl01-0003062768 nova-
compute[27735]: DEBUG cinderclient.v3.client [None req-f044373a-53dd-
4de7-b881-ff511ecb3a70 tempest-AttachVolumeMultiAttachTest-1944074929
tempest-AttachVolumeMultiAttachTest-1944074929] REQ: curl -g -i -X PUT
http://198.72.124.95/volume/v3/774fc170bdb24f20959a551dd666ef06/attachments
/c33fe43e-ac88-48df-93bd-3293746213b6 -H "Accept: application/json" -H
"User-Agent: python-cinderclient" -H "OpenStack-API-Version: volume
3.44" -H "X-Auth-Token: {SHA1}6ba4939df9b73a2d082ac56165bd1e04d7f33e6d"
-H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-
f044373a-53dd-4de7-b881-ff511ecb3a70" -d '{"attachment": {"connector":
{"initiator": "iqn.1993-08.org.debian:01:d44f672dfb2a", "ip":
"198.72.124.95", "platform": "x86_64", "host": "ubuntu-xenial-inap-
mtl01-0003062768", "do_local_attach": false, "mountpoint": "/dev/vdb",
"os_type": "linux2", "multipath": false}}}' {{(pid=27735)
_http_log_request /usr/local/lib/python2.7/dist-
packages/keystoneauth1/session.py:372}}
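
That PUT is the compute manager binding the new attachment to the
destination host's connector; _update_volume_attachments does roughly
this (sketch, simplified signature):

def update_volume_attachments(volume_api, virt_driver, context,
                              instance, bdms):
    # Get this host's connector from the virt driver once, then update
    # each reserved attachment so Cinder builds new connection_info for
    # the destination host.
    connector = virt_driver.get_volume_connector(instance)
    for bdm in bdms:
        if bdm.is_volume and bdm.attachment_id:
            volume_api.attachment_update(
                context, bdm.attachment_id, connector, bdm.device_name)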

I think I now see the problem.

_finish_resize in the compute manager calls this code:

        block_device_info = self._get_instance_block_device_info(
            context, instance, refresh_conn_info=True, bdms=bdms)

refresh_conn_info=True eventually gets us here:

https://github.com/openstack/nova/blob/f80b4e50093002f84b43ff245a605fbe44d34711/nova/virt/block_device.py#L639
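
The attachment-based branch there boils down to this (sketch), and it's
where the flag gets lost:

def refresh_connection_info(volume_api, context, bdm):
    # Sketch of the attachment-based path in
    # nova/virt/block_device.py refresh_connection_info.
    attachment = volume_api.attachment_get(context, bdm.attachment_id)
    connection_info = attachment['connection_info']
    # BUG: the connection_info that Cinder returns for the attachment
    # has no 'multiattach' key, and we overwrite the BDM's stored
    # connection_info with it wholesale, dropping the flag that was set
    # when the volume was originally attached.
    bdm['connection_info'] = connection_info
    bdm.save()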

I see the GET call to attachments here:

http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_15_652636

Mar 19 19:48:15.684627 ubuntu-xenial-inap-mtl01-0003062768 nova-
compute[27735]: RESP BODY: {"attachment": {"status": "attaching",
"detached_at": "", "connection_info": {"auth_password":
"5GDZJC3pBdPEJUgh", "attachment_id": "c33fe43e-ac88-48df-93bd-
3293746213b6", "target_discovered": false, "encrypted": false,
"driver_volume_type": "iscsi", "qos_specs": null, "target_iqn":
"iqn.2010-10.org.openstack:volume-652600d5-f6dc-4089-ba95-d71d7640cafa",
"target_portal": "198.72.124.95:3260", "volume_id":
"652600d5-f6dc-4089-ba95-d71d7640cafa", "target_lun": 1, "access_mode":
"rw", "auth_username": "67J6MYJni5Yu5VEEw8Q8", "auth_method": "CHAP"},
"attached_at": "2018-03-19T19:48:16.000000", "attach_mode": "rw",
"instance": "0eed0237-245e-4a18-9e30-9e72accd36c6", "volume_id":
"652600d5-f6dc-4089-ba95-d71d7640cafa", "id": "c33fe43e-ac88-48df-93bd-
3293746213b6"}}

That connection_info gets stored on the BlockDeviceMapping object,
which we use later to decide whether to set the shareable flag when
connecting the volume on the destination host. And that's where I see
the guest XML doesn't have the shareable flag, because the
BDM.connection_info doesn't have the multiattach flag set:

http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz#_Mar_19_19_48_15_706213

block_device_info={'swap': None, 'root_device_name': u'/dev/vda',
'ephemerals': [], 'block_device_mapping': [{'guest_format': None,
'boot_index': None, 'mount_device': u'/dev/vdb', 'connection_info':
{'status': u'attaching', 'detached_at': u'', u'volume_id':
u'652600d5-f6dc-4089-ba95-d71d7640cafa', 'attach_mode': u'rw',
'driver_volume_type': u'iscsi', 'instance': u'0eed0237-245e-
4a18-9e30-9e72accd36c6', 'attached_at': u'2018-03-19T19:48:16.000000',
'serial': u'652600d5-f6dc-4089-ba95-d71d7640cafa', 'data':
{u'auth_password': u'***', u'target_discovered': False, u'encrypted':
False, u'qos_specs': None, u'target_iqn': u'iqn.2010-10.org.openstack
:volume-652600d5-f6dc-4089-ba95-d71d7640cafa', u'target_portal':
u'198.72.124.95:3260', u'volume_id':
u'652600d5-f6dc-4089-ba95-d71d7640cafa', u'target_lun': 1,
u'access_mode': u'rw', u'auth_username': u'67J6MYJni5Yu5VEEw8Q8',
u'auth_method': u'CHAP'}}, 'disk_bus': 'virtio', 'device_type': 'disk',
'attachment_id': 'c33fe43e-ac88-48df-93bd-3293746213b6',
'delete_on_termination': False}]}
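
For reference, the libvirt driver only emits the shareable element when
that flag is present in the connection_info; the check looks roughly
like this (sketch; the exact spot in the driver varies):

def get_volume_config(volume_driver, connection_info, disk_info):
    # Build the guest disk config and mark it shareable when the
    # connection_info carries the multiattach flag.
    conf = volume_driver.get_config(connection_info, disk_info)
    if connection_info.get('multiattach', False):
        # Emits <shareable/> in the disk XML so qemu takes a shared
        # "write" lock instead of an exclusive one.
        conf.shareable = True
    return conf

Since the refreshed connection_info above has no multiattach key, the
shareable flag never gets set, which matches the guest XML shown in the
bug description below.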

I have no idea why this didn't fail the resize tests before with older
libvirt and qemu; maybe something in those versions preserved the
shareable flag. Either way, this is clearly broken just from code
inspection.

** Also affects: nova/queens
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/1757190

Title:
  resize fails with volume multiattach when using libvirt 4.0.0 (and
  qemu 2.11.1): Failed to get shared "write" lock

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) queens series:
  New

Bug description:
  Seeing this in a patch https://review.openstack.org/#/c/554317/ that
  runs the nova-multiattach job with the Queens Ubuntu Cloud Archive
  which has libvirt 4.0.0 and qemu 2.11.1:

  http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/libvirt/qemu/instance-00000066.txt.gz

  2018-03-19T19:48:16.175548Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1: Failed to get shared "write" lock
  Is another process using the image?

  
  http://logs.openstack.org/17/554317/1/check/nova-multiattach/8e97832/logs/screen-n-cpu.txt.gz?level=TRACE#_Mar_19_19_48_16_261051

  Mar 19 19:48:17.132940 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: ERROR nova.compute.manager [None req-3a092a4b-7ae7-4f29-9f78-97bf1dc0d46d service nova] [instance: 0eed0237-245e-4a18-9e30-9e72accd36c6] Setting instance vm_state to ERROR: libvirtError: internal error: process exited while connecting to monitor: 2018-03-19T19:48:16.147136Z qemu-system-x86_64: -drive file=/dev/sdb,format=raw,if=none,id=drive-virtio-disk1,serial=652600d5-f6dc-4089-ba95-d71d7640cafa,cache=none,aio=native: 'serial' is deprecated, please use the corresponding option of '-device' instead
  Mar 19 19:48:17.133724 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: 2018-03-19T19:48:16.155115Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  Mar 19 19:48:17.134022 ubuntu-xenial-inap-mtl01-0003062768 nova-compute[27735]: 2018-03-19T19:48:16.175548Z qemu-system-x86_64: -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1: Failed to get shared "write" lock

  That last error likely means the 'shareable' element isn't in the disk
  config xml, and it's not:

  <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <target dev='vdb' bus='virtio'/>
      <serial>652600d5-f6dc-4089-ba95-d71d7640cafa</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>
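
  For comparison, with the multiattach flag intact the disk definition
  would include the shareable element, something like:

  <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <target dev='vdb' bus='virtio'/>
      <shareable/>
      <serial>652600d5-f6dc-4089-ba95-d71d7640cafa</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>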
