yahoo-eng-team team mailing list archive

[Bug 1454512] [NEW] Device for other volume is deleted unexpected during volume detach when iscsi multipath is used

 

Public bug reported:

We found this issue while testing volume detachment with iSCSI
multipath. When the same iSCSI portal and IQN are shared by
multiple LUNs, devices belonging to another volume may be deleted
unexpectedly. This occurs in both Kilo and the latest code.

For example, the devices under /dev/disk/by-path may look like the following when LUN 23 and LUN 231 come from the same storage system and share the same iSCSI portal and IQN:

ls /dev/disk/by-path
ip-192.168.3.50:3260-iscsi-<iqna>-lun-23
ip-192.168.3.50:3260-iscsi-<iqna>-lun-231
ip-192.168.3.51:3260-iscsi-<iqnb>-lun-23
ip-192.168.3.51:3260-iscsi-<iqnb>-lun-231

When we detach the volume corresponding to LUN 23 from the host, we
see that the devices for LUN 231 are also deleted, which may make
the data on that volume unavailable.

Why does this happen? After digging into the nova code, we found the cause:

nova/virt/libvirt/volume.py
770     def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns):
771         entries = self._get_iscsi_devices()
772         # Loop through ips_iqns to construct all paths
773         iqn_luns = []
774         for ip, iqn in ips_iqns:
775             iqn_lun = '%s-lun-%s' % (iqn,
776                                      iscsi_properties.get('target_lun', 0))
777             iqn_luns.append(iqn_lun)
778         for dev in ['/dev/disk/by-path/%s' % dev for dev in entries]:
779             for iqn_lun in iqn_luns:
780                 if iqn_lun in dev:  ==> This is incorrect: a device for LUN 231 also makes this True.
781                     self._delete_device(dev)
782
783         self._rescan_multipath()

Due to the incorrect logic in line 780, detaching LUN xx also deletes the devices for other LUNs whose numbers start with xx, such as xxy and xxz. We could use dev.endswith(iqn_lun) to avoid this.
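The difference between the substring test on line 780 and the suggested dev.endswith(iqn_lun) can be checked outside nova with a minimal sketch. The helper names match_substring and match_suffix, the BY_PATH constant, and the placeholder IQNs 'iqna' and 'iqnb' (standing in for <iqna> and <iqnb>) are assumptions made for illustration; this is not the actual nova code or the final patch.

```python
# Stand-alone illustration only; helper names and IQN values are hypothetical.
BY_PATH = '/dev/disk/by-path/'


def match_substring(entries, iqn_luns):
    """Select devices the way line 780 does: plain substring test."""
    devs = [BY_PATH + entry for entry in entries]
    return [dev for dev in devs
            if any(iqn_lun in dev for iqn_lun in iqn_luns)]


def match_suffix(entries, iqn_luns):
    """Select devices whose path ends with '<iqn>-lun-<lun>'."""
    devs = [BY_PATH + entry for entry in entries]
    # str.endswith accepts a tuple of candidate suffixes.
    return [dev for dev in devs if dev.endswith(tuple(iqn_luns))]


# Same layout as the ls output in the report, with placeholder IQNs.
entries = [
    'ip-192.168.3.50:3260-iscsi-iqna-lun-23',
    'ip-192.168.3.50:3260-iscsi-iqna-lun-231',
    'ip-192.168.3.51:3260-iscsi-iqnb-lun-23',
    'ip-192.168.3.51:3260-iscsi-iqnb-lun-231',
]

# Detaching LUN 23: build the '<iqn>-lun-<lun>' strings as lines 774-777 do.
target_lun = 23
iqn_luns = ['%s-lun-%s' % (iqn, target_lun) for iqn in ('iqna', 'iqnb')]

print(match_substring(entries, iqn_luns))  # also selects the lun-231 devices
print(match_suffix(entries, iqn_luns))     # selects only the lun-23 devices
```

With the substring test all four devices are selected for deletion, while the suffix test selects only the two devices for LUN 23.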
===================================
stack@openstack-performance:~/tina/nova_iscsi_mp/nova$ git log -1
commit f4504f3575b35ec14390b4b678e441fcf953f47b
Merge: 3f21f60 5fbd852
Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
Date: Tue May 12 22:46:43 2015 +0000

    Merge "Remove db layer hard-code permission checks for
network_get_all_by_host"

** Affects: nova
     Importance: Undecided
     Assignee: Tina Tang (tina-tang)
         Status: New

** Description changed:

  We found this issue during testing volume detachment when iSCSI
  multipath is used. When a same iSCSI protal and iqn is shared by
  multiple LUNs, device from other volume maybe be deleted unexpected.
  This is found both in Kilo and the latest code.
  
+ For example, the devices under /dev/disk/by-path may looks like below when LUN 23 and 231 are from a same storage system and a same iSCSI protal and iqn are used. ls /dev/disk/by-path
+ ip-192.168.3.50:3260-iscsi-<iqna>-lun-23
+ ip-192.168.3.50:3260-iscsi-<iqna>-lun-231
+ ip-192.168.3.51:3260-iscsi-<iqnb>-lun-23
+ ip-192.168.3.51:3260-iscsi-<iqnb>-lun-231
  
- For example, the devices under /dev/disk/by-path may looks like below when LUN 23 and 231 are from a same storage system and a same iSCSI protal and iqn are used. ls /dev/disk/by-path
- ip-192.168.3.50:3260-iscsi-<iqna>-lun-23 -> ../../sdh
- ip-192.168.3.50:3260-iscsi-<iqna>-lun-231 -> ../../sdk
- ip-192.168.3.51:3260-iscsi-<iqnb>-lun-23 -> ../../sdd
- ip-192.168.3.51:3260-iscsi-<iqnb>-lun-231 -> ../../sdi
+ When we try to detach volume corresponding LUN 23 from the host, we
+ noticed that the devices regarding to LUN 231 are also deleted which may
+ cause the data unavailable.
  
- 
- When we try to detach volume corresponding LUN 23 from the host, the devices regarding to LUN 231 are also deleted which may cause the data unavailable. 
- 
- Why this happen?  After digging into the node code, below is the clue:
+ Why this happen? After digging into the nova code, below is the clue:
  
  nova/virt/libvirt/volume.py
- 770     def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns):
-  771         entries = self._get_iscsi_devices()
-  772         # Loop through ips_iqns to construct all paths
-  773         iqn_luns = []
-  774         for ip, iqn in ips_iqns:
-  775             iqn_lun = '%s-lun-%s' % (iqn,
-  776                                      iscsi_properties.get('target_lun', 0))
-  777             iqn_luns.append(iqn_lun)
-  778         for dev in ['/dev/disk/by-path/%s' % dev for dev in entries]:
-  779             for iqn_lun in iqn_luns:
-  780                 if iqn_lun in dev:                 ==> This is incorrect, device for LUN 231 will made this be True. 
-  781                     self._delete_device(dev)
-  782
-  783         self._rescan_multipath()
+ 770 def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns):
+ 771     entries = self._get_iscsi_devices()
+ 772     # Loop through ips_iqns to construct all paths
+ 773     iqn_luns = []
+ 774     for ip, iqn in ips_iqns:
+ 775         iqn_lun = '%s-lun-%s' % (iqn,
+ 776            iscsi_properties.get('target_lun', 0))
+ 777         iqn_luns.append(iqn_lun)
+ 778     for dev in ['/dev/disk/by-path/%s' % dev for dev in entries]:
+ 779     for iqn_lun in iqn_luns:
+ 780        if iqn_lun in dev: ==> This is incorrect, device for LUN 231 will made this be True.
+ 781            self._delete_device(dev)
+ 782
+ 783     self._rescan_multipath()
  
  Due to the incorrect logic in line 780, detach LUN xx will deleted devices for other LUNs starts with xx, such as xxy, xxz
  ===================================
  stack@openstack-performance:~/tina/nova_iscsi_mp/nova$ git log -1
  commit f4504f3575b35ec14390b4b678e441fcf953f47b
  Merge: 3f21f60 5fbd852
  Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
- Date:   Tue May 12 22:46:43 2015 +0000
+ Date: Tue May 12 22:46:43 2015 +0000
  
      Merge "Remove db layer hard-code permission checks for
  network_get_all_by_host"

** Changed in: nova
     Assignee: (unassigned) => Tina Tang (tina-tang)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1454512

Title:
  Device for other volume is deleted unexpected during volume detach
  when iscsi multipath is used

Status in OpenStack Compute (Nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1454512/+subscriptions

