group.of.nepali.translators team mailing list archive
Message #10890
[Bug 1648561] Re: HTX (htxubuntu) DASD exercisers fail
If os-prober is not running, I don't see what else might be poking the
disks to cause the warnings. We must ignore the other warning messages
that were listed from syslog earlier and look at it a different way. Is
the Redhat system showing the same write errors?
Is HTX doing raw writes to disk or are you writing files on a
filesystem? Again, what filesystems exist on the disks being exercised?
Could it be that the filesystem or the disk fails as a consequence of
the HTX workload?
Reassigning to 'linux' so that further investigation can be done on the
issue.
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu)
Status: New => Incomplete
** Changed in: linux (Ubuntu Xenial)
Status: New => Incomplete
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)
** Changed in: os-prober (Ubuntu)
Milestone: ubuntu-17.02 => None
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1648561
Title:
HTX (htxubuntu) DASD exercisers fail
Status in linux package in Ubuntu:
Incomplete
Status in os-prober package in Ubuntu:
Won't Fix
Status in linux source package in Xenial:
Incomplete
Status in os-prober source package in Xenial:
Won't Fix
Bug description:
== Comment: #1 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> - 2016-12-02 04:55:07 ==
==== State: Open by: tdylla on 01 December 2016 07:24:33 ====
Notice: This Note entry was modified. 2 non-ascii character(s) were
replaced with question marks.
BMC yl13u2bmc
OS yl13u2os
root@YL13U2OS:~# ver
cat: /proc/device-tree/openprom/model: No such file or directory
ver 1.5.4.5 - OS, HTX, Firmware and Machine details
OS: GNU/Linux
OS Version: Ubuntu 16.04.1 LTS \n \l
Kernel Version: 4.4.0-47-generic
HTX Version: htxubuntu-422
Host Name: YL13U2OS
Machine Serial No: 100CC9A
Machine Type/Model: 8335-GTB
root@YL13U2OS:~# uname -a
Linux YL13U2OS 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:38:24 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
root@YL13U2OS:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
DASD exercisers fail with a write error. These have never failed
before.
root@YL13U2OS:~# lsblk -o KNAME,TYPE,SIZE,MODEL,ROTA
KNAME TYPE  SIZE MODEL            ROTA
sda   disk  1.8T ST2000NX0253        1
sda1  part  1.8T                     1
sdb   disk  1.8T ST2000NX0253        1
sdb1  part  1.8T                     1
Getting HTX erros from yl13u2os.rch.stglabs.ibm.com
######################## Result Starts Here ################################
Currently running ECG/MDT : /usr/lpp/htx//mdt/mdt.whit
===========================
---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num  min_lba   max_lba   status
0       0         2476e9ff  R
1       4766ee58  74704057  R
2       74704058  99783457  R
3       c0876ab0  e8e080af  R
write error - errno: 1(?)
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_1 numopers= 1907729 loop= 1394165 blk=0x49e45458 len=262144 dir=DOWN min_blkno=0x3a38202c max_blkno=0x74704057
BWRC LBA fencepost Detail:
th_num  min_lba   max_lba   status
0       0         247c47ff  R
1       49e45658  74704057  R
2       74704058  99d2a657  R
3       c0d344b0  e8e080af  R
write error - errno: 1(?)
---------------------------------------------------------------------
---------------------------------------------------------------------
Device id:/dev/sdb1
Timestamp:Dec 1 01:23:08 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:Hardware Exerciser stopped on error
---------------------------------------------------------------------
######################### Result Ends Here #################################
System is still running exercisers. Feel Free to play with the
system. System is available for any debug that is needed.
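The fields in the first "Error Text" line above can be pulled apart and checked mechanically. The following is a minimal sketch, not part of HTX: the helper name parse_error_text is hypothetical, and the 512-byte block size used to convert len into blocks is an assumption. Under that assumption, the end of the failed write lands exactly on the min_lba fencepost reported for thread 3 on sda1.

```python
import re

# Error Text copied from the first /dev/sda1 entry in the report.
error_text = (
    "rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 "
    "len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af"
)


def parse_error_text(text):
    """Return the key=value fields of an 'Error Text' line as a dict.

    Hypothetical helper for illustration. Values may be separated from
    '=' by spaces, as in 'numopers= 1907729'.
    """
    return dict(re.findall(r"(\w+)=\s*(\S+)", text))


fields = parse_error_text(error_text)

blk = int(fields["blk"], 16)            # starting block of the failed write
min_blk = int(fields["min_blkno"], 16)  # lower bound of the rule's range
max_blk = int(fields["max_blkno"], 16)  # upper bound of the rule's range
length = int(fields["len"])             # transfer length in bytes

BLOCK_SIZE = 512  # assumption: 512-byte logical blocks

in_range = min_blk <= blk <= max_blk
end_blk = blk + length // BLOCK_SIZE

print("blk within [min_blkno, max_blkno]:", in_range)
print("first block after the write:", hex(end_blk))
```

With the values from the report, the block number falls inside the rule's range, and blk plus 512 blocks equals 0xc0876ab0, the min_lba listed for thread 3 in the fencepost detail. The same arithmetic on the sdb1 entry (blk=0x49e45458) gives 0x49e45658, the min_lba of thread 1 there.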
==== State: Open by: mamukul1 on 01 December 2016 15:41:32 ====
Write() failing with errno 1 for both sda1 and sdb1.
Some errors seen in dmesg as well in same timeframe.
Over to hxestorage to debug further.
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#
<Note by preeti, 2016/12/01 23:47:34 seq: 7 rel: 0 action: note>
Both devices are failing with errno set to 1 for the write() system call,
which means "operation not permitted" (EPERM).
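As a quick cross-check of that errno value, the Python standard library can map the number to its symbolic name and message. This is only a sketch for looking up the code on a Linux host; it does not reproduce the failure.

```python
import errno
import os

code = 1  # errno value reported by the exerciser for write()

# Symbolic name for the code ("UNKNOWN" if the platform has no entry).
name = errno.errorcode.get(code, "UNKNOWN")

# Human-readable message from the C library; exact wording is
# platform-dependent.
message = os.strerror(code)

print(f"errno {code}: {name} ({message})")
```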
---------------------------------------------------------------------
Device id:/dev/sda1
Timestamp:Dec 1 01:22:57 2016
err=00000001
sev=1
Exerciser Name:hxestorage
Serial No:Not Available
Part No:Not Available
Location:Not Available
FRU Number:Not Available
Device:Not Available
Error Text:rule_1_3 numopers= 1907729 loop= 1322123 blk=0xc08768b0 len=262144 dir=DOWN min_blkno=0xaea86084 max_blkno=0xe8e080af
BWRC LBA fencepost Detail:
th_num  min_lba   max_lba   status
0       0         2476e9ff  R
1       4766ee58  74704057  R
2       74704058  99783457  R
3       c0876ab0  e8e080af  R
write error - errno: 1(?)
Below is corresponding data in kernel logs (Not sure if it is related
to error):
Dec 1 01:22:57 YL13U2OS kernel: [50119.193567] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.201895] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.207728] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
Dec 1 01:22:57 YL13U2OS kernel: [50119.234961] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
Dec 1 01:22:57 YL13U2OS kernel: [50119.249926] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:57 YL13U2OS kernel: [50119.250215] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:58 YL13U2OS kernel: [50119.700556] XFS (sda1): Invalid superblock magic number
Dec 1 01:22:58 YL13U2OS kernel: [50120.448485] FAT-fs (sda1): bogus number of FAT structure
Dec 1 01:22:58 YL13U2OS kernel: [50120.448818] FAT-fs (sda1): Can't find a valid FAT filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.463705] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
Dec 1 01:22:59 YL13U2OS kernel: [50120.468236] hfsplus: unable to find HFS+ superblock
Dec 1 01:22:59 YL13U2OS kernel: [50120.474019] qnx4: no qnx4 filesystem (no root dir).
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] ufs: You didn't specify the type of your ufs filesystem
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931]
Dec 1 01:22:59 YL13U2OS kernel: [50120.477931] >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
Dec 1 01:22:59 YL13U2OS kernel: [50120.481654] ufs: ufs_fill_super(): bad magic number
Dec 1 01:22:59 YL13U2OS kernel: [50120.487379] hfs: can't find a HFS filesystem on dev sda1
Will transfer to Linux to look further.
<Note by preeti, 2016/12/02 04:35:35 seq: 8 rel: 0 action: assign>
== Comment: #2 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> - 2016-12-02 09:55:08 ==
==== State: Open by: tdylla on 02 December 2016 09:53:18 ====
I noticed on a different system that has htxubuntu-424 installed, along with a patch from defect sw372840, that the sdb exerciser is running just fine. It currently has a cycle count of 2 and a current stanza of 5. The device on this other system is exactly the same drive type.
sdb disk 1.8T ST2000NX0253
sdb1 part 1.8T
== Comment: #3 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-05 05:43:45 ==
root@YL13U2OS:~# cat /proc/partitions
major minor #blocks name
1 0 65536 ram0
1 1 65536 ram1
1 2 65536 ram2
1 3 65536 ram3
1 4 65536 ram4
1 5 65536 ram5
1 6 65536 ram6
1 7 65536 ram7
1 8 65536 ram8
1 9 65536 ram9
1 10 65536 ram10
1 11 65536 ram11
1 12 65536 ram12
1 13 65536 ram13
1 14 65536 ram14
1 15 65536 ram15
259 0 3125616984 nvme0n1
259 1 7168 nvme0n1p1
259 2 2999266304 nvme0n1p2
259 3 126342144 nvme0n1p3
8 0 1953514584 sda
8 1 1953513560 sda1
8 16 1953514584 sdb
8 17 1953513560 sdb1
11 0 1048575 sr0
11 1 1048575 sr1
11 2 1048575 sr2
11 3 1048575 sr3
root@YL13U2OS:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=508856128k,nr_inodes=7950877,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=107151232k,mode=755)
/dev/nvme0n1p2 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=107151232k,mode=700)
root@YL13U2OS:~# df
Filesystem      1K-blocks    Used  Available Use% Mounted on
udev            508856128       0  508856128   0% /dev
tmpfs           107151232   32832  107118400   1% /run
/dev/nvme0n1p2 2952071944 7906084 2794186164   1% /
tmpfs           535756096       0  535756096   0% /dev/shm
tmpfs                5120       0       5120   0% /run/lock
tmpfs           535756096       0  535756096   0% /sys/fs/cgroup
tmpfs           107151232       0  107151232   0% /run/user/0
root@YL13U2OS:~#
== Comment: #7 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-06 05:43:06 ==
root@YL13U2OS:~# df -T
Filesystem     Type      1K-blocks    Used  Available Use% Mounted on
udev           devtmpfs  508856128       0  508856128   0% /dev
tmpfs          tmpfs     107151232   32832  107118400   1% /run
/dev/nvme0n1p2 ext4     2952071944 7931124 2794161124   1% /
tmpfs          tmpfs     535756096       0  535756096   0% /dev/shm
tmpfs          tmpfs          5120       0       5120   0% /run/lock
tmpfs          tmpfs     535756096       0  535756096   0% /sys/fs/cgroup
tmpfs          tmpfs     107151232       0  107151232   0% /run/user/0
root@YL13U2OS:~#
== Comment: #8 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-06 06:33:48 ==
root@YL13U2OS:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
# / was on /dev/nvme0n1p2 during installation
UUID=6cddb0e5-477c-4d64-807a-631b2d12dfac / ext4 errors=remount-ro 0 1
# swap was on /dev/nvme0n1p3 during installation
UUID=00693a84-74f6-4ded-b82d-6a938880ba8a none swap sw 0 0
root@YL13U2OS:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    1   1.8T  0 disk
└─sda1        8:1    1   1.8T  0 part
sdb           8:16   1   1.8T  0 disk
└─sdb1        8:17   1   1.8T  0 part
sr0          11:0    1  1024M  0 rom
sr1          11:1    1  1024M  0 rom
sr2          11:2    1  1024M  0 rom
sr3          11:3    1  1024M  0 rom
nvme0n1     259:0    0   2.9T  0 disk
├─nvme0n1p1 259:1    0     7M  0 part
├─nvme0n1p2 259:2    0   2.8T  0 part /
└─nvme0n1p3 259:3    0 120.5G  0 part [SWAP]
root@YL13U2OS:~# lsblk --fs
NAME        FSTYPE LABEL UUID                                 MOUNTPOINT
sda
└─sda1
sdb
└─sdb1
sr0
sr1
sr2
sr3
nvme0n1
├─nvme0n1p1
├─nvme0n1p2 ext4         6cddb0e5-477c-4d64-807a-631b2d12dfac /
└─nvme0n1p3 swap         00693a84-74f6-4ded-b82d-6a938880ba8a [SWAP]
root@YL13U2OS:~# grep -B 1 '"hxestorage"' /usr/lpp/htx/mdt/mdt
sda1:
HE_name = "hxestorage" * Hardware Exerciser name, 14 char
--
sdb1:
HE_name = "hxestorage" * Hardware Exerciser name, 14 char
root@YL13U2OS:~#
root@YL13U2OS:~#
root@YL13U2OS:~# grep 'Device id' /tmp/htxerr
Device id:/dev/sda1
Device id:/dev/sda1
Device id:/dev/sdb1
Device id:/dev/sdb1
root@YL13U2OS:~#
sda1 and sdb1 are the only disks being exercised, and both errored out after a
write failure. The nvme0n1 disk is used by the OS and is therefore not exercised by HTX.
== Comment: #9 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-06 07:52:38 ==
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sda1
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:57 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:57 2016] XFS (sda1): Invalid superblock magic number
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): bogus number of FAT structure
[Thu Dec 1 01:22:58 2016] FAT-fs (sda1): Can't find a valid FAT filesystem
[Thu Dec 1 01:22:58 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sda1.
[Thu Dec 1 01:22:58 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:22:58 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:22:58 2016] ufs: You didn't specify the type of your ufs filesystem
mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
>>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:22:58 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:22:58 2016] hfs: can't find a HFS filesystem on dev sda1
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:08 2016] squashfs: SQUASHFS error: Can't find a SQUASHFS superblock on sdb1
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:08 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:08 2016] XFS (sdb1): Invalid superblock magic number
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): bogus number of FAT structure
[Thu Dec 1 01:23:10 2016] FAT-fs (sdb1): Can't find a valid FAT filesystem
[Thu Dec 1 01:23:10 2016] VFS: Can't find a Minix filesystem V1 | V2 | V3 on device sdb1.
[Thu Dec 1 01:23:10 2016] hfsplus: unable to find HFS+ superblock
[Thu Dec 1 01:23:10 2016] qnx4: no qnx4 filesystem (no root dir).
[Thu Dec 1 01:23:10 2016] ufs: You didn't specify the type of your ufs filesystem
mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
>>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[Thu Dec 1 01:23:10 2016] ufs: ufs_fill_super(): bad magic number
[Thu Dec 1 01:23:10 2016] hfs: can't find a HFS filesystem on dev sdb1
Linux failed to detect filesystems on the sda1 and sdb1 disks, causing write
failures for the HTX exerciser. Similar failures are also reported for the nvme
disk in the Linux kernel log.
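To make the timing relationship between the kernel messages and the HTX errors explicit, the log can be reduced to the first probe message per device and compared with the exerciser timestamps. This is a rough sketch only: the helper first_probe_times and the device-name pattern are hypothetical, the sample lines are a subset copied from the log above, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

# A few lines copied from the kernel log in this comment.
kernel_log = """\
[Thu Dec 1 01:22:57 2016] EXT4-fs (sda1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:22:57 2016] XFS (sda1): Invalid superblock magic number
[Thu Dec 1 01:22:58 2016] hfs: can't find a HFS filesystem on dev sda1
[Thu Dec 1 01:23:08 2016] EXT4-fs (sdb1): VFS: Can't find ext4 filesystem
[Thu Dec 1 01:23:10 2016] hfs: can't find a HFS filesystem on dev sdb1
"""

# HTX error timestamps copied from the exerciser report.
htx_errors = {"sda1": "Dec 1 01:22:57 2016", "sdb1": "Dec 1 01:23:08 2016"}

TS_FORMAT = "%b %d %H:%M:%S %Y"
line_re = re.compile(r"^\[\w{3} (?P<ts>.+?)\] (?P<msg>.*)$")
# Assumed naming scheme: SCSI-style (sdXN) and NVMe partitions.
dev_re = re.compile(r"\b(sd[a-z]\d+|nvme\d+n\d+p\d+)\b")


def first_probe_times(log):
    """Return {device: datetime of the first kernel message naming it}."""
    first = {}
    for line in log.splitlines():
        m = line_re.match(line)
        if not m:
            continue
        dev = dev_re.search(m.group("msg"))
        if not dev:
            continue
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        first.setdefault(dev.group(1), ts)  # keep only the earliest line
    return first


probes = first_probe_times(kernel_log)

# Seconds between the HTX error and the first probe message, per device.
deltas = {
    dev: (probes[dev] - datetime.strptime(ts, TS_FORMAT)).total_seconds()
    for dev, ts in htx_errors.items()
}

for dev, seconds in deltas.items():
    print(f"{dev}: first probe message {seconds:+.0f}s from HTX error")
```

For the lines shown, the first filesystem probe message for each device carries the same one-second timestamp as the corresponding HTX write error. That is consistent with the probing and the write failures being related, but on its own it does not show which one triggered the other.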
== Comment: #10 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-06 08:01:35 ==
The Linux errors are being caused by os-prober. I ran os-prober manually and
the filesystem failures were logged in the Linux log. So os-prober was invoked
while HTX was running. This caused the write failures for the sda1 and sdb1
disks, along with the nvme disks, and also logged the Linux errors.
== Comment: #11 - VIPIN K. PARASHAR <viparash@xxxxxxxxxx> - 2016-12-06 08:04:55 ==
What operation was attempted while HTX was running when these errors
were seen? Was it apt upgrade or something else?
== Comment: #12 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> - 2016-12-07 10:56:09 ==
==== State: MoreInfo by: tdylla on 07 December 2016 10:53:58 ====
HTX was started using htx command line commands. From then on, the
system was monitored through "System Live Monitor". No other commands
were executed by a user. This failure happened during an overnight
run. I believe that the Ubuntu OS was set up to automatically install
security fixes, which is required.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1648561/+subscriptions