[Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Resolved by Change-Id: I146a74f9f79c68a89677b9b26a324e06a35886f2
** No longer affects: nova
** Changed in: os-brick
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683972
Title:
Overlapping iSCSI volume detach/attach can leave behind broken SCSI
devices and multipath maps.
Status in os-brick:
Fix Released
Bug description:
This is fairly easy to reproduce by simultaneously launching and
terminating several boot-from-volume instances on the same compute
node, with a cinder back-end that takes some time to complete
connection-termination (e.g. ZFSSA). The initial symptom is failed
multipath maps, and kernel errors writing to SCSI devices. Later
symptoms include failure to launch volume-backed instances due to
multipath command errors.
The issue is caused by a race condition between the unplumbing of a
volume that is being detached/disconnected and the plumbing of another
volume that is being attached to a different instance.
For example, when an instance is terminated,
compute.manager._shutdown_instance() calls driver.destroy(), then it
calls volume_api.terminate_connection() for the volume(s).
driver.destroy() is responsible for cleaning up devices on the compute
node - in my case, the libvirt driver calls disconnect_volume() in
os_brick.initiator.connectors.iscsi, which removes the multipath map
and SCSI device(s) associated with each volume.
volume_api.terminate_connection() then instructs cinder to stop
presenting the volume to the connector, which translates to
disassociating a LUN from the compute node's initiator on the
back-end storage device (the iSCSI target).
The problem occurs when another thread is simultaneously attaching a
volume to another instance on the same compute node. The attach path
calls connect_volume() in os_brick.initiator.connectors.iscsi, which
does an
iSCSI rescan. If the cinder back-end has not yet removed the
LUN/initiator association, the rescan picks it back up, and recreates
the SCSI device and the multipath map on the compute node. Shortly
thereafter, that SCSI device becomes unresponsive, but it (and the
multipath map) never go away.
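The interleaving looks roughly like this (timing illustrative):

    Thread A (detach, instance 1)    Thread B (attach, instance 2)
    -----------------------------    -----------------------------
    disconnect_volume()
      flushes the multipath map,
      deletes the SCSI devices
                                     connect_volume()
                                       iSCSI rescan: the LUN is
                                       still exported, so the SCSI
                                       device and multipath map
                                       reappear
    terminate_connection()
      cinder unexports the LUN
                                     stale device goes unresponsive
                                     and is never cleaned up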
To make matters worse, the cinder back-end may use the same LUN number
for another volume in the future, but that LUN number (plus portal
address) is still associated with the broken SCSI device and multipath
map on the compute node, so the wrong multipath map may be picked up
by a future volume attachment attempt.
There is locking around the connect_volume() and disconnect_volume()
functions in os_brick, but this is insufficient, because it doesn't
extend over the cinder connection termination.
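For reference, os-brick's serialization looks roughly like this (a
simplified sketch; decorator names and placement are approximate):

    from oslo_concurrency import lockutils

    synchronized = lockutils.synchronized_with_prefix('os-brick-')

    class ISCSIConnector(object):
        @synchronized('connect_volume')
        def connect_volume(self, connection_properties):
            # rescan, wait for the device, assemble the multipath map
            pass

        @synchronized('connect_volume')
        def disconnect_volume(self, connection_properties, device_info):
            # flush the multipath map, delete the SCSI devices
            pass

The lock is dropped as soon as disconnect_volume() returns;
volume_api.terminate_connection() runs outside it, so a concurrent
connect_volume() can rescan while the back-end is still exporting
the LUN.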
I've been able to hack around this with a rudimentary lock on the
parts of compute.manager that deal with volume detachment and
connection termination, and the connect_volume() function in
virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the
last two years. I was surprised to find that the problem is still
present in Ocata. The same workaround seems to be effective. I'm
fairly sure that the way I've implemented it is not completely
correct, though, so it should be implemented properly by someone more
intimately familiar with all of the code-paths.
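For illustration, the workaround amounts to something like this (a
hypothetical sketch; the lock name and signatures are mine, not the
actual patch):

    from oslo_concurrency import lockutils

    # One host-wide (external=True) lock covering both the unplumbing
    # of a detach, including cinder connection termination, and the
    # plumbing of an attach, so a rescan can never run while a LUN is
    # still exported for a dying connection.
    PLUMB_LOCK = 'volume-plumbing'

    @lockutils.synchronized(PLUMB_LOCK, external=True)
    def detach_and_terminate(driver, volume_api, context, instance,
                             volume_id, connector):
        driver.detach_volume(context, instance, volume_id)
        volume_api.terminate_connection(context, volume_id, connector)

    @lockutils.synchronized(PLUMB_LOCK, external=True)
    def connect_volume(volume_driver, connection_info, disk_info):
        return volume_driver.connect_volume(connection_info, disk_info)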
To manage notifications about this bug go to:
https://bugs.launchpad.net/os-brick/+bug/1683972/+subscriptions