yahoo-eng-team team mailing list archive

[Bug 1454978] [NEW] [iSCSI Multipath] Thousands of "multipath -ll <mp-id>" are executed during volume detachment when multiple LUNs are exposed on the same target

 

Public bug reported:

iSCSI multipath has a performance issue on volume detachment when multiple
LUNs are exposed via a single target (iqn).

1. We are using VNX as the cinder backend. VNX exposes multiple LUNs
via one iqn, and each LUN is exposed via several iqns for multipathing.
The libvirt driver is used in nova, and the virt_type is kvm.

2. After attaching 100 volumes to VMs and then detaching them in
batch, we noticed that thousands of "multipath -ll <mp_id>" commands are
executed per volume detachment. In our environment, a single "multipath -ll
<mp_id>" takes about 0.2s, so performance is bad.

3. Why are so many "multipath -ll <mp-id>" calls triggered?
To find all paths of a multipath device, the code goes through all the devices under /dev/disk/by-path that use the same iqn and executes "multipath -ll" on each of them to get its multipath id. When the multipath id of a device matches that of the volume being detached, the device is a path of that volume. When each iqn exposes only one LUN, this code does not cause a performance issue. However, when multiple LUNs are exposed via a single iqn, the problem appears.
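
To illustrate why every LUN on a target ends up being a candidate, here is a minimal sketch that groups by-path entry names by their portal and iqn, using the same prefix handling as the nova excerpt in this report. The entry names, IP addresses and iqns are made up for the example, and target_key and group_by_target are hypothetical helper names, not nova functions.

```python
from collections import defaultdict


def target_key(entry):
    """Return the "<ip:port>-iscsi-<iqn>" part of a by-path entry name."""
    key = entry.split("-lun-")[0]
    if key.startswith("ip-"):
        return key[3:]
    if key.startswith("pci-"):
        # Skip the PCI address prefix, e.g. "pci-0000:00:03.0-".
        offset = key.find("ip-", 16, 21)
        return key[offset + 3:]
    return key


def group_by_target(entries):
    """Map each portal/iqn key to the by-path entries that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[target_key(entry)].append(entry)
    return dict(groups)


# Hypothetical entries: two portals, each exposing three LUNs on one iqn.
entries = [
    "ip-10.0.0.1:3260-iscsi-iqn.2015-05.example:a-lun-%d" % lun
    for lun in range(3)
] + [
    "pci-0000:00:03.0-ip-10.0.0.2:3260-iscsi-iqn.2015-05.example:b-lun-%d" % lun
    for lun in range(3)
]

groups = group_by_target(entries)
for key, members in sorted(groups.items()):
    print(key, len(members))
```

Each key maps to three entries here, so a scan that matches on the key alone has to query all three devices to find the one that belongs to the volume being detached.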

Assume that we have n LUNs attached and each LUN has m iqns for multipathing. Then there will be m*n devices under /dev/disk/by-path, sharing m iqns. Then,
    -- Code lines 623-644 will trigger O(n*m) executions of "multipath -ll <mp-id>"
    -- Code lines 648-649 will trigger O(!m) executions of "multipath -ll <mp-id>"
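
As a rough worked example of the first bound only, the sketch below multiplies the n*m worst case by the 0.2s per call measured above. n = 100 comes from this report; m = 4 is an assumed number of iqns per LUN chosen for illustration, and scan_cost is a hypothetical helper name.

```python
def scan_cost(n_luns, m_iqns, seconds_per_call=0.2):
    """Worst-case call count and time for the loop at lines 623-644."""
    calls = n_luns * m_iqns
    return calls, calls * seconds_per_call


# n = 100 attached LUNs (from the report), m = 4 iqns per LUN (assumed).
calls, seconds = scan_cost(n_luns=100, m_iqns=4)
print("%d calls, about %.0f seconds" % (calls, seconds))
```

Under these assumptions a single detachment could spend on the order of 80 seconds in the first loop alone, before the second loop at lines 648-649 runs.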

nova/nova/virt/libvirt/volume.py
LibvirtISCSIVolumeDriver._disconnect_volume_multipath_iscsi

 618 out = self._run_iscsiadm_discover(iscsi_properties)
 619
 620 # Extract targets for the current multipath device.
 621 ips_iqns = []
 622 entries = self._get_iscsi_devices()
 623 for ip, iqn in self._get_target_portals_from_iscsiadm_output(out):
 624    ip_iqn = "%s-iscsi-%s" % (ip.split(",")[0], iqn)
 625    for entry in entries:
 626        entry_ip_iqn = entry.split("-lun-")[0]
 627        if entry_ip_iqn[:3] == "ip-":
 628            entry_ip_iqn = entry_ip_iqn[3:]
 629        elif entry_ip_iqn[:4] == "pci-":
 630            # Look at an offset of len('pci-0000:00:00.0')
 631            offset = entry_ip_iqn.find("ip-", 16, 21)
 632            entry_ip_iqn = entry_ip_iqn[(offset + 3):]
 633        if (ip_iqn != entry_ip_iqn):
 634            continue
 635        entry_real_path = os.path.realpath("/dev/disk/by-path/%s" %
 636                                           entry)
 637        entry_mpdev = self._get_multipath_device_name(entry_real_path)
 638        if entry_mpdev == multipath_device:
 639            ips_iqns.append([ip, iqn])
 640            break
 641
 642 if not devices:
 643     # disconnect if no other multipath devices
 644     self._disconnect_mpath(iscsi_properties, ips_iqns)
 645     return
 646
 647 # Get a target for all other multipath devices
 648 other_iqns = [self._get_multipath_iqn(device)
 649               for device in devices]
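
One possible way to cut the number of calls, shown only as a sketch and not as the fix adopted in nova, is to remember the multipath name already resolved for each block device so that each device is queried at most once per detachment. MultipathNameCache, lookup and fake_table are hypothetical names; in nova the lookup callable would presumably wrap self._get_multipath_device_name.

```python
class MultipathNameCache(object):
    """Remember the multipath device name resolved for each block device."""

    def __init__(self, lookup):
        # `lookup` is any callable mapping a real device path to a
        # multipath device name (assumed to be the expensive step).
        self._lookup = lookup
        self._names = {}
        self.calls = 0  # number of times the expensive lookup really ran

    def get(self, real_path):
        if real_path not in self._names:
            self.calls += 1
            self._names[real_path] = self._lookup(real_path)
        return self._names[real_path]


# Stand-in for "multipath -ll": six fake devices spread over three mpaths.
fake_table = {"/dev/sd%s" % c: "mpath%d" % (i % 3)
              for i, c in enumerate("abcdef")}

cache = MultipathNameCache(fake_table.get)
for _ in range(10):          # repeated scans, as in the nested loops
    for dev in fake_table:
        cache.get(dev)

print(cache.calls)  # 6: one real lookup per distinct device
```

Such a cache would need to live only for the duration of one disconnect call, since the multipath topology can change between operations.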

====================Code version =====================
stack@openstack-performance:~/tina/nova_iscsi_mp/nova$ git log -1
commit f4504f3575b35ec14390b4b678e441fcf953f47b
Merge: 3f21f60 5fbd852
Author: Jenkins <jenkins@xxxxxxxxxxxxxxxxxxxx>
Date: Tue May 12 22:46:43 2015 +0000

    Merge "Remove db layer hard-code permission checks for
network_get_all_by_host"

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1454978

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1454978/+subscriptions

