yahoo-eng-team team mailing list archive

[Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: iain MacDonnell <1683972@xxxxxxxxxxxxxxxxxx>
Date: Sat, 22 Apr 2017 00:42:40 -0000
Reply-to: Bug 1683972 <1683972@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Also affects: os-brick
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683972

Title:
  Overlapping iSCSI volume detach/attach can leave behind broken SCSI
  devices and multipath maps.

Status in OpenStack Compute (nova):
  New
Status in os-brick:
  New

Bug description:
  This is fairly easy to reproduce by simultaneously launching and
  terminating several boot-from-volume instances on the same compute
  node, with a cinder back-end that takes some time to complete
  connection-termination (e.g. ZFSSA). The initial symptom is failed
  multipath maps, and kernel errors writing to SCSI devices. Later
  symptoms include failure to launch volume-backed instances due to
  multipath command errors.

  The issue is caused by a race condition between the unplumbing of a
  volume being detached/disconnected, and the plumbing of another volume
  being attached to a different instance.

  For example, when an instance is terminated,
  compute.manager._shutdown_instance() calls driver.destroy(), then it
  calls volume_api.terminate_connection() for the volume(s).

  driver.destroy() is responsible for cleaning up devices on the compute
  node - in my case, the libvirt driver calls disconnect_volume() in
  os_brick.initiator.connectors.iscsi, which removes the multipath map
  and SCSI device(s) associated with each volume.

  volume_api.terminate_connection() then instructs cinder to stop
  presenting the volume to the connector (which translates to
  disassociating a LUN from the compute node's initiator on the back-end
  storage-device (iSCSI target)).

  The problem occurs when another thread is attaching a volume to
  another instance on the same compute node at the same time. That calls
  connect_volume() in os_brick.initiator.connectors.iscsi, which does an
  iSCSI rescan. If the cinder back-end has not yet removed the
  LUN/initiator association, the rescan picks it back up, and recreates
  the SCSI device and the multipath map on the compute node. Shortly
  thereafter, that SCSI device becomes unresponsive, but neither it nor
  the multipath map ever goes away.
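
  The interleaving described above can be sketched with a toy model.
  This is only an illustration, not nova or os-brick code: the Backend
  and ComputeNode classes and their methods are hypothetical stand-ins
  for the iSCSI target and the compute node's SCSI device table, and the
  two "threads" are replayed in one fixed order rather than run
  concurrently.

```python
# Toy model of the race; all class and method names are hypothetical.

class Backend:
    """Stands in for the cinder back-end (iSCSI target)."""

    def __init__(self):
        self.exported = {}  # LUN number -> volume name

    def export(self, lun, volume):
        self.exported[lun] = volume

    def terminate_connection(self, lun):
        self.exported.pop(lun, None)


class ComputeNode:
    """Stands in for the SCSI device table on the compute node."""

    def __init__(self, backend):
        self.backend = backend
        self.devices = {}  # LUN number -> volume the device was created for

    def rescan(self):
        # A rescan discovers every LUN the target still exports to this host.
        for lun, volume in self.backend.exported.items():
            self.devices.setdefault(lun, volume)

    def disconnect_volume(self, lun):
        self.devices.pop(lun, None)

    def stale_devices(self):
        # Devices whose LUN is no longer exported by the back-end.
        return sorted(
            lun for lun in self.devices if lun not in self.backend.exported
        )


def simulate_race():
    backend = Backend()
    node = ComputeNode(backend)
    backend.export(1, "vol-A")
    node.rescan()                    # vol-A is attached
    node.disconnect_volume(1)        # thread 1: driver.destroy()
    backend.export(2, "vol-B")       # thread 2: attach of vol-B begins
    node.rescan()                    # thread 2: rescan re-adds LUN 1
    backend.terminate_connection(1)  # thread 1: arrives too late
    return node.stale_devices()
```

  In this model simulate_race() returns [1]: the device for LUN 1 is
  back on the compute node even though the back-end no longer exports
  it.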

  To make matters worse, the cinder back-end may use the same LUN number
  for another volume in the future, but that LUN number (plus portal
  address) is still associated with the broken SCSI device and multipath
  map on the compute node, so the wrong multipath map may be picked up
  by a future volume attachment attempt.
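
  A minimal sketch of why the LUN reuse is dangerous, assuming the
  compute node finds existing maps by portal address and LUN number.
  The host_maps table, the attach() helper, the map names and the
  portal address (taken from the documentation address range) are all
  made up for illustration; none of them is os-brick's actual lookup
  logic.

```python
# Hypothetical table on the compute node:
# (portal address, LUN number) -> (multipath map name, volume it was built for)
PORTAL = "192.0.2.10:3260"
host_maps = {(PORTAL, 1): ("mpatha", "vol-A")}  # stale leftover from the race


def attach(portal, lun, volume):
    """Return (map name, whether that map really belongs to the volume)."""
    key = (portal, lun)
    if key in host_maps:
        # An existing map for this portal/LUN pair is reused as-is,
        # even if it was built for a different volume.
        name, built_for = host_maps[key]
        return name, built_for == volume
    name = "mpath-" + volume
    host_maps[key] = (name, volume)
    return name, True
```

  Here attach(PORTAL, 1, "vol-C") returns ("mpatha", False): the new
  volume is handed the broken map left behind for vol-A, because the
  back-end reused LUN 1.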

  There is locking around the connect_volume() and disconnect_volume()
  functions in os_brick, but this is insufficient, because it doesn't
  extend over the cinder connection termination.

  I've been able to hack around this with a rudimentary lock on the
  parts of compute.manager that deal with volume detachment and
  connection termination, and the connect_volume() function in
  virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the
  last two years. I was surprised to find that the problem is still
  present in Ocata. The same workaround seems to be effective. I'm
  fairly sure that the way I've implemented it is not completely
  correct, though, so it should be implemented properly by someone more
  intimately familiar with all of the code-paths.
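
  The shape of such a workaround might look like the sketch below. It is
  an assumption-laden illustration, not the actual patch: the lock and
  the detach_and_terminate() and attach() wrappers are hypothetical, the
  three callables stand in for disconnect_volume(),
  terminate_connection() and connect_volume(), and a plain
  threading.Lock is used only to keep the example self-contained. The
  point is that one lock covers both the local cleanup and the cinder
  connection termination, so a rescan cannot run in between.

```python
import threading

# Hypothetical host-wide lock shared by the detach and attach code paths.
_volume_ops_lock = threading.Lock()


def detach_and_terminate(disconnect_volume, terminate_connection):
    """Remove local devices and end the back-end export as one unit."""
    with _volume_ops_lock:
        disconnect_volume()
        terminate_connection()


def attach(connect_volume):
    """Run the attach (including its rescan) only when no detach is in flight."""
    with _volume_ops_lock:
        return connect_volume()
```

  A single coarse lock like this serializes every attach and detach on
  the compute node, which trades some concurrency for safety.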

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1683972/+subscriptions

References

[Bug 1683972] [NEW] Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
From: iain MacDonnell, 2017-04-19