kernel-packages team mailing list archive
Message #180896
[Bug 1581034] Re: STC840.20:tuleta:tul516p01 panic after injecting Leaf EEH
https://lists.ubuntu.com/archives/kernel-team/2016-May/077658.html
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1581034
Title:
STC840.20:tuleta:tul516p01 panic after injecting Leaf EEH
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Xenial:
In Progress
Bug description:
Dear Canonical,
There is a bug in the nvme device driver that breaks EEH during an
event. This causes an Oops in the nvme driver and makes the entire
machine unavailable. This is the trace that we see when the problem
occurs:
[ 121.614394] Unable to handle kernel paging request for data at address 0x00000020
[ 121.614524] Faulting instruction address: 0xd00000000dfb5530
[ 121.614602] Oops: Kernel access of bad area, sig: 11 [#1]
[ 121.614654] SMP NR_CPUS=2048 NUMA pSeries
[ 121.614713] Modules linked in: rpadlpar_io rpaphp nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag pseries_rng rtc_generic binfmt_misc sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel dm_round_robin ses enclosure lpfc mlx4_core scsi_transport_fc nvme ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 121.615390] CPU: 18 PID: 19973 Comm: hxecpu Not tainted 4.4.0-21-generic #37-Ubuntu
[ 121.615450] task: c000001fbb589370 ti: c000001fc7148000 task.ti: c000001fc7148000
[ 121.615478] NIP: d00000000dfb5530 LR: d00000000dfb5650 CTR: d00000000dfb5550
[ 121.615497] REGS: c000001fc714b700 TRAP: 0300 Not tainted (4.4.0-21-generic)
[ 121.615512] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 39090553 XER: a0000000
[ 121.615686] CFAR: c000000000008468 DAR: 0000000000000020 DSISR: 40000000 SOFTE: 1
GPR00: d00000000dfb5650 c000001fc714b980 d00000000dfc9178 c000001fdcd0e000
GPR04: c000001fc1599200 0000000000000000 0000000000000000 0000000000000001
GPR08: 0000000000000000 0000000000000000 0000000000000020 0000000000000005
GPR12: d00000000dfb5550 c00000000e7eab00 c000001ff348a938 0000000000000100
GPR16: c000001ff348a738 c000001ff348a538 0000001ff2500000 0000000000000000
GPR20: c000001fc714bc40 c000000000f89d00 c000001fc714bb70 0000000000000000
GPR24: 0000000000000001 c000000000548ae0 0000000000000020 c000001fdcd0e000
GPR28: 00000000000001ff c000001fc1599200 0000000000000000 c000001fc1599200
[ 121.616623] NIP [d00000000dfb5530] nvme_free_iod+0x100/0x120 [nvme]
[ 121.616701] LR [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
[ 121.616743] Call Trace:
[ 121.616782] [c000001fc714b980] [0000000000000908] 0x908 (unreliable)
[ 121.616851] [c000001fc714b9d0] [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
[ 121.616925] [c000001fc714ba50] [c00000000054860c] __blk_mq_complete_request+0xbc/0x1b0
[ 121.616990] [c000001fc714ba90] [c00000000054c540] bt_for_each+0x160/0x170
[ 121.617074] [c000001fc714bb00] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
[ 121.617156] [c000001fc714bb50] [c000000000547358] blk_mq_rq_timer+0x48/0x140
[ 121.617226] [c000001fc714bb90] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
[ 121.617296] [c000001fc714bc20] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
[ 121.617370] [c000001fc714bcf0] [c0000000000beb78] __do_softirq+0x188/0x3e0
[ 121.617442] [c000001fc714bde0] [c0000000000bf048] irq_exit+0xc8/0x100
[ 121.617507] [c000001fc714be00] [c00000000001f954] timer_interrupt+0xa4/0xe0
[ 121.617562] [c000001fc714be30] [c000000000002714] decrementer_common+0x114/0x180
[ 121.617619] Instruction dump:
[ 121.617663] e8010010 eb41ffd0 eb61ffd8 eb81ffe0 7c0803a6 eba1ffe8 ebc1fff0 ebe1fff8
[ 121.617829] 4e800020 60000000 60000000 60420000 <7c88502a> e87b0110 7fc5f378 48008d95
This bug has already been fixed upstream (in version 4.5), and these are the commit IDs that contain the fix:
* 646017a612e7 ("NVMe: Fix namespace removal deadlock")
* 69d9a99c258e ("NVMe: Move error handling to failed reset handler")
* a59e0f5795fe5 ("blk-mq: End unstarted requests on dying queue")
Backports of each of these patches are attached.
Please apply them to the 16.04 kernel.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1581034/+subscriptions