kernel-packages team mailing list archive
Message #142376
[Bug 1491494] Re: Ubuntu 14.04.03 LPAR hits kernel oops after serial adapter is removed from profile
------- Comment From cdeadmin@xxxxxxxxxx 2015-10-26 18:25 EDT-------
==== State: Assigned by: nguyenp on 26 October 2015 13:14:50 ====
Per a Sametime chat with Gabriel this morning, he is working on building a
workaround for the problem.
I'm lowering the severity of the defect since it's not a blocker.
** Tags removed: severity-critical
** Tags added: severity-high
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1491494
Title:
Ubuntu 14.04.03 LPAR hits kernel oops after serial adapter is removed
from profile
Status in linux package in Ubuntu:
Triaged
Bug description:
-- Problem Description --
The failure is related to the BELL-3 (2-port Async EIA-232 adapter).
Ubuntu always hits an exception when the adapter is not present. See my
test scenarios below.
Test #1: Boot Ubuntu with BELL-3 adapter
=======
- The Ubuntu LPAR was running with the BELL-3 (2 port-Async EIA-232 adapter) before. So I assigned the BELL-3 adapter to Ubuntu LPAR profile and powered on the LPAR.
=> Ubuntu booted fine this time.
Test #2: Boot Ubuntu with BELL-3 adapter removed from LPAR profile
=======
- I powered down the Ubuntu partition, removed the BELL-3 adapter from the LPAR profile, then powered on the LPAR.
=> Ubuntu hit the exception.
Elapsed time since release of system processors: 0 mins 9 secs
error: no suitable video mode found.
OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 3.19.0-23-generic (buildd@denneed03) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #24~14.04.1-Ubuntu SMP Wed Jul 8 11:17:19 UTC 2015 (Ubuntu 3.19.0-23.24~14.04.1-generic 3.19.8-ckt2)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/boot/vmlinux-3.19.0-23-generic root=UUID=768190e7-f633-4c63-a1e3-588d12dea265 ro quiet splash vt.handoff=7
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000b420000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
instantiating rtas at 0x000000000ecb0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000b430000 -> 0x000000000b4316b1
Device tree struct 0x000000000b440000 -> 0x000000000b470000
Calling quiesce...
returning from prom_init
-> smp_release_cpus()
spinning_secondaries = 15
<- smp_release_cpus()
<- setup_system()
[ 0.661510] /build/linux-lts-vivid-uV14Ja/linux-lts-vivid-3.19.0/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.672826] sd 0:0:1:0: [sda] Assuming drive cache: write through
[ 4.658302] device-mapper: table: 252:0: multipath: error getting device
[ 4.691990] device-mapper: table: 252:0: multipath: error getting device
[ 4.934034] device-mapper: table: 252:0: multipath: error getting device
[ 4.951977] device-mapper: table: 252:0: multipath: error getting device
* Discovering and coalescing multipaths... [ OK ]
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
* Starting AppArmor profiles [ OK ]
Loading the saved-state of the serial devices...
[ 5.109665] Unable to handle kernel paging request for data at address 0xd000080000000003
[ 5.109677] Faulting instruction address: 0xc00000000060fec4
[ 5.109685] Oops: Kernel access of bad area, sig: 11 [#1]
[ 5.109691] SMP NR_CPUS=2048 NUMA pSeries
[ 5.109699] Modules linked in: dm_round_robin dm_multipath scsi_dh pseries_rng rtc_generic knem(OE) nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE)
[ 5.109759] CPU: 1 PID: 1816 Comm: setserial Tainted: G OE 3.19.0-23-generic #24~14.04.1-Ubuntu
[ 5.109769] task: c0000000f389c880 ti: c0000000f0528000 task.ti: c0000000f0528000
[ 5.109777] NIP: c00000000060fec4 LR: c000000000617498 CTR: c00000000060fe20
[ 5.109785] REGS: c0000000f052b6b0 TRAP: 0300 Tainted: G OE (3.19.0-23-generic)
[ 5.109793] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84002022 XER: 00000000
[ 5.109814] CFAR: c000000000008468 DAR: d000080000000003 DSISR: 42000000 SOFTE: 1
GPR00: c000000000617498 c0000000f052b930 c00000000144c700 00000000000000bf
GPR04: d000080000000003 00000000000000bf c0000000f3990000 0000000000000141
GPR08: c000000000611d20 c0000000013539e0 d000080000000000 c000000001351ba8
GPR12: c00000000060fe20 c00000000e830900 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 000000000000007d 0000000000000040 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0000000f53cbc00 0000000000000001 0000000000000000
GPR28: c0000000f53cbde0 00000000000000bf 0000000000000003 c000000001754970
[ 5.109916] NIP [c00000000060fec4] io_serial_out+0xa4/0xd0
[ 5.109924] LR [c000000000617498] serial8250_do_startup+0x978/0xe50
[ 5.109931] Call Trace:
[ 5.109936] [c0000000f052b930] [c0000000f052b970] 0xc0000000f052b970 (unreliable)
[ 5.109948] [c0000000f052b970] [c000000000617498] serial8250_do_startup+0x978/0xe50
[ 5.109958] [c0000000f052ba10] [c00000000060eb00] uart_startup.part.7+0xd0/0x310
[ 5.109967] [c0000000f052ba60] [c00000000060f1ac] uart_set_info+0x46c/0x580
[ 5.109976] [c0000000f052bb90] [c00000000060f378] uart_ioctl+0xb8/0x590
[ 5.109986] [c0000000f052bc40] [c0000000005dd89c] tty_ioctl+0x21c/0xf60
[ 5.109995] [c0000000f052bd40] [c0000000002ce680] do_vfs_ioctl+0x4f0/0x7c0
[ 5.110004] [c0000000f052bde0] [c0000000002cea24] SyS_ioctl+0xd4/0xf0
[ 5.110014] [c0000000f052be30] [c000000000009258] system_call+0x38/0xd0
[ 5.110021] Instruction dump:
[ 5.110026] 38210040 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 3d42fff0
[ 5.110040] 392a72e0 e9490000 7c845214 7c0004ac <98640000> 39200001 992d02bc 38210040
[ 5.110057] ---[ end trace 7c597ccc52ffb926 ]---
[ 5.114039]
Test #3: DLPAR-remove the adapter first, then reboot the LPAR
=======
- I powered down the Ubuntu LPAR.
- I then assigned the BELL-3 adapter back to the Ubuntu LPAR profile and powered on the partition.
- It booted fine with no problem.
root@tul7p07:~# lspci
60:00.0 Serial controller: Digi International Device 00f6
0000:60:00.0 ttyS0 ttyS1 serial U78CB.001.WZS02NH-P1-C12-T1
serial (1410f600)
Manufacturer Name.........IBM
Machine Type-Model........Unknown
Device Specific.(YC)......0
Location Code.(YL)........U78CB.001.WZS02NH-P1-C12-T1
ttyS0 U78CB.001.WZS02NH-P1-C12-T1
Serial Device
Location Code.(YL)........U78CB.001.WZS02NH-P1-C12-T1
ttyS1 U78CB.001.WZS02NH-P1-C12-T1
Serial Device
Location Code.(YL)........U78CB.001.WZS02NH-P1-C12-T1
- I then went into the HMC and performed a DLPAR remove of the adapter this
time. The operation completed successfully.
- I then powered down and checked the LPAR profile (no BELL-3 adapter
assigned any more).
- I then powered up the Ubuntu LPAR again. It still hit the exception in this
case.
So Ubuntu always hits the exception when the adapter is not present.
The system does have a config file originally created on Jul 30. /etc/init.d/setserial is the startup service that attempts to configure the serial devices using either /etc/serial.conf (there isn't one) or /var/lib/setserial/autoserial.conf (which does exist).
root@tul7p07:/etc/init.d# ls -l /var/lib/setserial/autoserial.conf
-rw-r--r-- 1 root root 518 Jul 30 00:27 /var/lib/setserial/autoserial.conf
root@tul7p07:/etc/init.d# ls /etc/serial.conf
ls: cannot access /etc/serial.conf: No such file or directory
root@tul7p07:/etc/init.d# cat /var/lib/setserial/autoserial.conf
###PORT STATE GENERATED USING AUTOSAVE-ONCE###
###AUTOSAVE-ONCE###
###AUTOSAVE-ONCE###
###AUTOSAVE###
#
# If you want to configure this file by hand, use
# dpkg-reconfigure setserial
# and change the configuration mode of the file to MANUAL. If you do not do this, this file may be overwritten automatically the next time you upgrade the
# package.
#
/dev/ttyS0 uart 16950/954 port 0x0000 irq 0 baud_base 4000000 spd_normal skip_test
/dev/ttyS1 uart 16950/954 port 0x0000 irq 0 baud_base 4000000 spd_normal skip_test
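For reference, each of the two entries above is a device name followed by setserial parameters. A minimal Python sketch of reading them follows. `parse_serial_conf` and the `KEYED` set are hypothetical names used only for illustration (they are not part of setserial), and only the parameters that appear in this file are handled.

```python
# Illustrative only: parse setserial-style config lines such as those in
# /var/lib/setserial/autoserial.conf. Helper names are hypothetical.

# Parameters that take a value in the file shown above; any other token
# (e.g. spd_normal, skip_test) is treated as a bare flag.
KEYED = {"uart", "port", "irq", "baud_base"}


def parse_serial_conf(text):
    """Return a list of (device, params, flags) tuples for non-comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, comments and the ###AUTOSAVE### markers
        tokens = line.split()
        device, rest = tokens[0], tokens[1:]
        params, flags = {}, []
        i = 0
        while i < len(rest):
            if rest[i] in KEYED and i + 1 < len(rest):
                params[rest[i]] = rest[i + 1]
                i += 2
            else:
                flags.append(rest[i])
                i += 1
        entries.append((device, params, flags))
    return entries


SAMPLE = """\
###AUTOSAVE###
/dev/ttyS0 uart 16950/954 port 0x0000 irq 0 baud_base 4000000 spd_normal skip_test
/dev/ttyS1 uart 16950/954 port 0x0000 irq 0 baud_base 4000000 spd_normal skip_test
"""

for device, params, flags in parse_serial_conf(SAMPLE):
    print(device, int(params["port"], 16), params["irq"], flags)
```

Both entries carry port 0x0000 and irq 0, so the saved file by itself does not identify the adapter the ports belonged to.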
I am thinking that if you rename or move
/var/lib/setserial/autoserial.conf so that the service cannot find it
(disabling the setserial service might work, too), the system may just
come up without the adapter.
So, the next step is to rename or move that conf file, shut down the
partition, remove the Digi adapter from the profile and see what
happens when we come back up. If it comes back up, the question will
be: what should the OS do if it has autosaved configuration info on the
ports and then the adapter is removed? Should the system ensure those
devices are still present before attempting to tell the kernel to
configure them? Should the kernel have more sanity checks?
Thanks to Luciano C. for pointing out the issue. I ran tests and
confirmed that what he pointed out is correct.
So now we need to address these questions from his previous comment:
- What should the OS do if it has autosaved configuration info on the ports and then the adapter is removed?
- Should the system ensure those devices are still present before attempting to tell the kernel to configure them?
- Should the kernel have more sanity checks?
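One way to picture the second question is a userspace check that looks for a PCI serial controller before replaying the saved state. The Python sketch below is only an illustration of that idea and is not taken from setserial or its init script. The function names are hypothetical, the `class` value written in the demo is made up, and the policy would wrongly skip systems whose serial ports are not PCI devices.

```python
# Illustrative only: check for a PCI serial controller before replaying a
# saved serial configuration. Not taken from setserial or its init script.
import os
import tempfile

# PCI base class 0x07 (communication controller), subclass 0x00 (serial).
PCI_SERIAL_CLASS_PREFIX = "0x0700"


def pci_serial_controllers(sysfs_pci="/sys/bus/pci/devices"):
    """Return PCI addresses whose sysfs 'class' marks a serial controller."""
    found = []
    if not os.path.isdir(sysfs_pci):
        return found
    for addr in sorted(os.listdir(sysfs_pci)):
        try:
            with open(os.path.join(sysfs_pci, addr, "class")) as f:
                pci_class = f.read().strip().lower()
        except OSError:
            continue  # device vanished or attribute unreadable
        if pci_class.startswith(PCI_SERIAL_CLASS_PREFIX):
            found.append(addr)
    return found


def should_restore_serial_state(sysfs_pci="/sys/bus/pci/devices"):
    """Hypothetical policy: only replay saved state if an adapter is present."""
    return bool(pci_serial_controllers(sysfs_pci))


# Demo against a fake sysfs tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    print(should_restore_serial_state(root))  # tree is empty: no adapter
    os.makedirs(os.path.join(root, "0000:60:00.0"))
    with open(os.path.join(root, "0000:60:00.0", "class"), "w") as f:
        f.write("0x070002\n")  # made-up class value for the demo
    print(pci_serial_controllers(root))  # adapter now present
```

An init script following this idea would skip the "Loading the saved-state of the serial devices" step when the check returns false. Whether that belongs in userspace or in the kernel is exactly the open question above.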
Here are the tests I ran:
========================
1) First, I booted Ubuntu with serial adapter.
root@tul7p07:~# lspci
60:00.0 Serial controller: Digi International Device 00f6
root@tul7p07:~#
2) Then I renamed /var/lib/setserial/autoserial.conf (per Luciano C.'s
instructions).
root@tul7p07:~# mv /var/lib/setserial/autoserial.conf /var/lib/setserial/autoserial.conf.org
root@tul7p07:~# ls -l /var/lib/setserial/autoserial.conf*
-rw-r--r-- 1 root root 305 Jul 30 00:27 /var/lib/setserial/autoserial.conf.old
-rw-r--r-- 1 root root 518 Jul 30 00:27 /var/lib/setserial/autoserial.conf.org
3) I then shut down the Ubuntu partition and removed the serial adapter from the partition's profile.
Then I booted it up again. The system came up to the login prompt.
=====
Ubuntu 14.04.3 LTS tul7p07.aus.stglabs.ibm.com hvc0
tul7p07 login: root
Password:
=================
4) I then added the serial adapter back in Ubuntu partition's profile and booted the partition up again.
====
Ubuntu 14.04.3 LTS tul7p07.aus.stglabs.ibm.com hvc0
tul7p07 login: root
Password:
Last login: Wed Sep 2 09:49:21 CDT 2015 on hvc0
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-27-generic ppc64le)
* Documentation: https://help.ubuntu.com/
root@tul7p07:~# lspci
60:00.0 Serial controller: Digi International Device 00f6
root@tul7p07:~#
====
5) I checked /var/lib/setserial/. A new autoserial.conf had been created,
as expected.
root@tul7p07:~# ls -l /var/lib/setserial/
total 12
-rw-r--r-- 1 root root 15 Sep 2 09:52 autoserial.conf
-rw-r--r-- 1 root root 15 Sep 2 09:47 autoserial.conf.old
-rw-r--r-- 1 root root 518 Jul 30 00:27 autoserial.conf.org
-rw-r--r-- 1 root root 0 Jul 30 00:27 etc.serial.conf.bkp
root@tul7p07:~#
6) I then shut down the Ubuntu partition without removing or renaming the
autoserial.conf file.
7) I removed the serial adapter from the Ubuntu partition's profile and
booted the partition again. The kernel again tried to configure the
serial port at an address that is now bogus, so it hit the problem
again.
==========
Preparing to boot Linux version 3.19.0-27-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #29~14.04.1-Ubuntu SMP Sun Aug 16 01:51:48 UTC 2015 (Ubuntu 3.19.0-27.29~14.04.1-generic 3.19.8-ckt5)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/boot/vmlinux-3.19.0-27-generic root=UUID=768190e7-f633-4c63-a1e3-588d12dea265 ro quiet splash vt.handoff=7
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000b400000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
instantiating rtas at 0x000000000ecb0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000b410000 -> 0x000000000b4116b1
Device tree struct 0x000000000b420000 -> 0x000000000b450000
Calling quiesce...
returning from prom_init
-> smp_release_cpus()
spinning_secondaries = 15
<- smp_release_cpus()
<- setup_system()
[ 0.643938] /build/linux-lts-vivid-4KQgBt/linux-lts-vivid-3.19.0/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 0.656156] sd 0:0:1:0: [sda] Assuming drive cache: write through
* Discovering and coalescing multipaths... [ OK ]
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
* Starting AppArmor profiles [ OK ]
Loading the saved-state of the serial devices...
[ 5.276868] Unable to handle kernel paging request for data at address 0xd000080000000003
[ 5.276880] Faulting instruction address: 0xc00000000060f684
[ 5.276888] Oops: Kernel access of bad area, sig: 11 [#1]
[ 5.276894] SMP NR_CPUS=2048 NUMA pSeries
[ 5.276902] Modules linked in: dm_multipath scsi_dh pseries_rng ib_ipoib rdma_ucm rtc_generic rdma_cm iw_cm ib_ucm ib_uverbs ib_cm ib_umad mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en vxlan ip6_udp_tunnel udp_tunnel mlx4_core nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache [last unloaded: mlx5_core]
[ 5.276960] CPU: 8 PID: 1466 Comm: setserial Not tainted 3.19.0-27-generic #29~14.04.1-Ubuntu
[ 5.276969] task: c0000000f2065300 ti: c0000000f21c4000 task.ti: c0000000f21c4000
[ 5.276977] NIP: c00000000060f684 LR: c000000000616c58 CTR: c00000000060f5e0
[ 5.276985] REGS: c0000000f21c76b0 TRAP: 0300 Not tainted (3.19.0-27-generic)
[ 5.276992] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84002022 XER: 00000000
[ 5.277012] CFAR: c000000000008468 DAR: d000080000000003 DSISR: 42000000 SOFTE: 1
GPR00: c000000000616c58 c0000000f21c7930 c00000000144cc00 00000000000000bf
GPR04: d000080000000003 00000000000000bf c0000000f54b0000 0000000000000141
GPR08: c0000000006114e0 c0000000013539e0 d000080000000000 c000000001351ba8
GPR12: c00000000060f5e0 c00000000e834800 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 000000000000007d 0000000000000040 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0000000f8092000 0000000000000001 0000000000000000
GPR28: c0000000f80921e0 00000000000000bf 0000000000000003 c0000000017549f0
[ 5.277115] NIP [c00000000060f684] io_serial_out+0xa4/0xd0
[ 5.277122] LR [c000000000616c58] serial8250_do_startup+0x978/0xe50
[ 5.277129] Call Trace:
[ 5.277134] [c0000000f21c7930] [c0000000f21c7970] 0xc0000000f21c7970 (unreliable)
[ 5.277145] [c0000000f21c7970] [c000000000616c58] serial8250_do_startup+0x978/0xe50
[ 5.277155] [c0000000f21c7a10] [c00000000060e2c0] uart_startup.part.7+0xd0/0x310
[ 5.277164] [c0000000f21c7a60] [c00000000060e96c] uart_set_info+0x46c/0x580
[ 5.277173] [c0000000f21c7b90] [c00000000060eb38] uart_ioctl+0xb8/0x590
[ 5.277183] [c0000000f21c7c40] [c0000000005dd01c] tty_ioctl+0x21c/0xf60
[ 5.277192] [c0000000f21c7d40] [c0000000002ce7a0] do_vfs_ioctl+0x4f0/0x7c0
[ 5.277201] [c0000000f21c7de0] [c0000000002ceb44] SyS_ioctl+0xd4/0xf0
[ 5.277210] [c0000000f21c7e30] [c000000000009258] system_call+0x38/0xd0
[ 5.277217] Instruction dump:
[ 5.277222] 38210040 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 3d42fff0
[ 5.277236] 392a6de0 e9490000 7c845214 7c0004ac <98640000> 39200001 992d02bc 38210040
[ 5.277252] ---[ end trace d5657031818c6b89 ]---
[ 5.280950]
[ 11.843975] init: openibd pre-start process (1614) terminated with status 3
=======================
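The register dump in both oopses looks consistent with an 8250 register write to I/O port 0. The following is a rough worked decode in Python. It assumes (this is not confirmed in the report) that GPR10 holds the kernel's PCI I/O base, that GPR03 holds the byte being written, and that the saved "port 0x0000" was used as the port base. The two UART constants are from the Linux serial_reg.h header.

```python
# Illustrative arithmetic only; the reading of the registers is an assumption.
DAR = 0xD000080000000003      # faulting data address from the oops
IO_BASE = 0xD000080000000000  # GPR10 in the dump, assumed to be the PCI I/O base
PORT = 0x0000                 # "port 0x0000" from autoserial.conf
VALUE = 0xBF                  # GPR03 in the dump, assumed to be the byte written

UART_LCR = 3                  # Line Control Register offset (serial_reg.h)
UART_LCR_CONF_MODE_B = 0xBF   # LCR value selecting config mode B (serial_reg.h)

# Offset of the faulting access relative to the assumed port base.
offset = DAR - IO_BASE - PORT
print(hex(offset), offset == UART_LCR, VALUE == UART_LCR_CONF_MODE_B)
```

The offset comes out as 3, which matches the Line Control Register, and the value matches the "configuration mode B" setting. If that reading is right, the first register write of serial8250_do_startup went to I/O port 0 plus 3, an address that nothing backs once the adapter is gone.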
Mirroring to Launchpad for Canonical folks to take a look...
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1491494/+subscriptions