yahoo-eng-team team mailing list archive
Message #83457
[Bug 1793259] Re: sg_scan returns wrong HLU number if it's greater than 255
Reviewed: https://review.opendev.org/742784
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=fc6ca22bdb955137d97cb9bcfc84104426e53842
Submitter: Zuul
Branch: master
commit fc6ca22bdb955137d97cb9bcfc84104426e53842
Author: Sam Wan <sam.wan@xxxxxxx>
Date: Thu Jul 23 22:35:27 2020 -0400
Replace sg_scan with lsscsi to get '[H:C:T:L]'
The current get_device_info uses sg_scan to get device info, but sg_scan
only returns HLU numbers up to 255 due to bug #1793259. sg_scan was
designed for the days when 255 LUNs were enough. However, we now have a
requirement to support HLU numbers greater than 255. Since lsscsi doesn't
have this limit, we should use lsscsi to get device info.
The 'device' of get_device_info can be of 2 types:
o /dev/disk/by-path/xxx, which is a symlink to /dev/sdX
o /dev/sdX
sg_scan can take any device name, but lsscsi only shows /dev/sdX names.
So if the device is a symlink, we use the device name it links to;
otherwise we use it directly.
Then we get the device info '[H:C:T:L]' by comparing the device name with the
last column of the lsscsi output.
Also, lsscsi doesn't require root privileges.
Depends-on: https://review.opendev.org/743548
Change-Id: I867c972d9f712c0df4260ebc8211b786006ed7a2
Closes-bug: #1793259
** Changed in: os-brick
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1793259
Title:
sg_scan returns wrong HLU number if it's greater than 255
Status in OpenStack Compute (nova):
Confirmed
Status in os-brick:
Fix Released
Bug description:
The tempest test case
'test_volumes_extend.VolumesExtendAttachedTest.test_extend_attached_volume'
will fail if the volume created is assigned an HLU number greater than
255.
The cause is that os-brick uses 'sg_scan' to get the device information (H:C:T:L):
======
    def get_device_info(self, device):
        (out, _err) = self._execute('sg_scan', device, run_as_root=True,
                                    root_helper=self._root_helper)
        dev_info = {'device': device, 'host': None,
                    'channel': None, 'id': None, 'lun': None}
        if out:
            line = out.strip()
            line = line.replace(device + ": ", "")
            info = line.split(" ")

            for item in info:
                if '=' in item:
                    pair = item.split('=')
                    dev_info[pair[0]] = pair[1]
                elif 'scsi' in item:
                    dev_info['host'] = item.replace('scsi', '')

        return dev_info
======
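To make the parsing above concrete, here is a minimal standalone sketch of the same loop applied to one line of sg_scan output. The function name parse_sg_scan_line is hypothetical and is not part of os-brick; it only mirrors the logic of get_device_info without the command execution.

```python
def parse_sg_scan_line(out, device):
    """Mirror the parsing loop of get_device_info on one sg_scan line."""
    dev_info = {'device': device, 'host': None,
                'channel': None, 'id': None, 'lun': None}
    # Drop the leading "<device>: " prefix, then split on single spaces.
    line = out.strip().replace(device + ": ", "")
    for item in line.split(" "):
        if '=' in item:
            # Fields such as "channel=0", "id=0", "lun=181".
            key, value = item.split('=', 1)
            dev_info[key] = value
        elif 'scsi' in item:
            # The host is printed as "scsi<N>".
            dev_info['host'] = item.replace('scsi', '')
    return dev_info


sample = "/dev/sdm: scsi6 channel=0 id=0 lun=181 [em]"
info = parse_sg_scan_line(sample, "/dev/sdm")
```

The parser copies whatever 'lun' value sg_scan prints, so a truncated LUN in the sg_scan output ends up unchanged in dev_info.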
sg_scan uses 'ioctl SCSI_IOCTL_GET_IDLUN' to get this device information.
https://github.com/hreinecke/sg3_utils/blob/master/src/sg_scan_linux.c#L321
======
    res = ioctl(sg_fd, SCSI_IOCTL_GET_IDLUN, &my_idlun);
    ...
    printf("%s: scsi%d channel=%d id=%d lun=%d", file_namep, host_no,
           (my_idlun.dev_id >> 16) & 0xff, my_idlun.dev_id & 0xff,
           (my_idlun.dev_id >> 8) & 0xff); /* <--- only 8 bits hold the LUN */
======
However, the LUN field (called the device id below) that sg_scan extracts
is only one byte wide, which means it can only report values up to 255.
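The truncation can be reproduced without any SCSI hardware. The sketch below assumes the packed dev_id layout implied by the shifts in the printf above (target id in bits 0-7, LUN in bits 8-15, channel in bits 16-23, and, as far as I know from the Linux kernel side, the host number in bits 24-31). The helper names pack_dev_id and unpack_dev_id are hypothetical.

```python
def pack_dev_id(host, channel, target, lun):
    """Pack H:C:T:L into a 32-bit value, one byte per field (assumed layout)."""
    return ((target & 0xff)
            | ((lun & 0xff) << 8)
            | ((channel & 0xff) << 16)
            | ((host & 0xff) << 24))


def unpack_dev_id(dev_id):
    """Recover the fields with the same shifts and masks sg_scan uses."""
    return {
        'channel': (dev_id >> 16) & 0xff,
        'id': dev_id & 0xff,
        'lun': (dev_id >> 8) & 0xff,
    }


# A LUN above 255 does not survive the round trip through one byte.
fields = unpack_dev_id(pack_dev_id(host=6, channel=0, target=0, lun=15797))
print(fields['lun'])  # prints 181
```

Any LUN of 255 or lower round-trips unchanged, which is why the problem only appears on arrays that assign larger HLU numbers.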
Below is an example.
======
Here's a device:
------
# multipath -ll 3600601602220440062449f5b82796e05
3600601602220440062449f5b82796e05 dm-6 DGC ,VRAID
size=1.0G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 6:0:0:15797 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 7:0:2:15797 sdn 8:208 active ready running
------
We can see that the device id is 15797.
This number can also be obtained with 'lsscsi':
------
# lsscsi |grep 15797
[6:0:0:15797] disk DGC VRAID 4400 /dev/sdm
[7:0:2:15797] disk DGC VRAID 4400 /dev/sdn
------
However, sg_scan returns a different device id:
------
# sg_scan -i /dev/sdm
/dev/sdm: scsi6 channel=0 id=0 lun=181 [em]
DGC VRAID 4400 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
# sg_scan -i /dev/sdn
/dev/sdn: scsi7 channel=0 id=2 lun=181 [em]
DGC VRAID 4400 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
------
The device id returned by sg_scan is 181.
With some simple binary arithmetic, we can see that 181 is the low 8 bits of 15797:
------
15797 & 0xff = 0b11110110110101 & 0b11111111 = 0b10110101 = 181
------
Because sg_scan returns the wrong device id, os-brick tries to rescan a
nonexistent device, which causes extend_attached_volume to fail.
We should use 'lsscsi' to get the device id.
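A minimal sketch of the lsscsi-based lookup, assuming the output format shown in the example above ('[H:C:T:L]' in the first column, the /dev/sdX node in the last). The function name find_hctl and its resolve parameter are hypothetical; this is not the code that was merged into os-brick, and running lsscsi itself is left out so the parsing can be tried on captured output.

```python
import os
import re

# Matches the leading '[H:C:T:L]' column of an lsscsi line.
HCTL_RE = re.compile(r'^\[(\d+):(\d+):(\d+):(\d+)\]')


def find_hctl(lsscsi_output, device, resolve=os.path.realpath):
    """Return H:C:T:L for `device` from lsscsi output, or None if absent."""
    # Resolve symlinks such as /dev/disk/by-path/... to the /dev/sdX name,
    # because lsscsi only lists /dev/sdX names.
    real_device = resolve(device)
    for line in lsscsi_output.splitlines():
        columns = line.split()
        match = HCTL_RE.match(line.strip())
        if not columns or not match:
            continue
        # lsscsi prints the device node in the last column.
        if columns[-1] == real_device:
            host, channel, target, lun = match.groups()
            return {'device': device, 'host': host,
                    'channel': channel, 'id': target, 'lun': lun}
    return None


sample = """\
[6:0:0:15797] disk DGC VRAID 4400 /dev/sdm
[7:0:2:15797] disk DGC VRAID 4400 /dev/sdn
"""
info = find_hctl(sample, '/dev/sdn')
```

The values are kept as strings to match the dictionary that get_device_info returns today, and the LUN is read as text, so it is not limited to one byte.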
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1793259/+subscriptions