kernel-packages team mailing list archive

[Bug 1581034] Re: STC840.20:tuleta:tul516p01 panic after injecting Leaf EEH


This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!

** Tags added: verification-needed-xenial

You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.

  STC840.20:tuleta:tul516p01 panic after injecting Leaf EEH

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed

Bug description:
  Dear Canonical,

  There is a bug in the nvme device driver that breaks EEH handling
  during an EEH event. This causes an Oops in the nvme driver and makes
  the entire machine unavailable. This is the trace that we see when this
  happens:

          [  121.614394] Unable to handle kernel paging request for data at address 0x00000020
          [  121.614524] Faulting instruction address: 0xd00000000dfb5530
          [  121.614602] Oops: Kernel access of bad area, sig: 11 [#1]
          [  121.614654] SMP NR_CPUS=2048 NUMA pSeries
          [  121.614713] Modules linked in: rpadlpar_io rpaphp nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag pseries_rng rtc_generic binfmt_misc sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel dm_round_robin ses enclosure lpfc mlx4_core scsi_transport_fc nvme ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
          [  121.615390] CPU: 18 PID: 19973 Comm: hxecpu Not tainted 4.4.0-21-generic #37-Ubuntu
          [  121.615450] task: c000001fbb589370 ti: c000001fc7148000 task.ti: c000001fc7148000
          [  121.615478] NIP: d00000000dfb5530 LR: d00000000dfb5650 CTR: d00000000dfb5550
          [  121.615497] REGS: c000001fc714b700 TRAP: 0300   Not tainted  (4.4.0-21-generic)
          [  121.615512] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 39090553  XER: a0000000
          [  121.615686] CFAR: c000000000008468 DAR: 0000000000000020 DSISR: 40000000 SOFTE: 1
          GPR00: d00000000dfb5650 c000001fc714b980 d00000000dfc9178 c000001fdcd0e000
          GPR04: c000001fc1599200 0000000000000000 0000000000000000 0000000000000001
          GPR08: 0000000000000000 0000000000000000 0000000000000020 0000000000000005
          GPR12: d00000000dfb5550 c00000000e7eab00 c000001ff348a938 0000000000000100
          GPR16: c000001ff348a738 c000001ff348a538 0000001ff2500000 0000000000000000
          GPR20: c000001fc714bc40 c000000000f89d00 c000001fc714bb70 0000000000000000
          GPR24: 0000000000000001 c000000000548ae0 0000000000000020 c000001fdcd0e000
          GPR28: 00000000000001ff c000001fc1599200 0000000000000000 c000001fc1599200
          [  121.616623] NIP [d00000000dfb5530] nvme_free_iod+0x100/0x120 [nvme]
          [  121.616701] LR [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
          [  121.616743] Call Trace:
          [  121.616782] [c000001fc714b980] [0000000000000908] 0x908 (unreliable)
          [  121.616851] [c000001fc714b9d0] [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
          [  121.616925] [c000001fc714ba50] [c00000000054860c] __blk_mq_complete_request+0xbc/0x1b0
          [  121.616990] [c000001fc714ba90] [c00000000054c540] bt_for_each+0x160/0x170
          [  121.617074] [c000001fc714bb00] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
          [  121.617156] [c000001fc714bb50] [c000000000547358] blk_mq_rq_timer+0x48/0x140
          [  121.617226] [c000001fc714bb90] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
          [  121.617296] [c000001fc714bc20] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
          [  121.617370] [c000001fc714bcf0] [c0000000000beb78] __do_softirq+0x188/0x3e0
          [  121.617442] [c000001fc714bde0] [c0000000000bf048] irq_exit+0xc8/0x100
          [  121.617507] [c000001fc714be00] [c00000000001f954] timer_interrupt+0xa4/0xe0
          [  121.617562] [c000001fc714be30] [c000000000002714] decrementer_common+0x114/0x180
          [  121.617619] Instruction dump:
          [  121.617663] e8010010 eb41ffd0 eb61ffd8 eb81ffe0 7c0803a6 eba1ffe8 ebc1fff0 ebe1fff8
          [  121.617829] 4e800020 60000000 60000000 60420000 <7c88502a> e87b0110 7fc5f378 48008d95
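
For readers triaging similar traces, here is a minimal Python sketch (not part of the original report) that pulls the symbol names out of call-trace frames in the format shown above. The regular expression and the names FRAME_RE, call_trace_symbols and SAMPLE are illustrative assumptions; the frame layout is inferred only from the lines in this report, so other kernels or architectures may print frames differently.

```python
import re

# Matches frames such as:
# [  121.616851] [c000001fc714b9d0] [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                           # dmesg timestamp
    r"\[[0-9a-f]+\]\s+"                             # stack pointer
    r"\[[0-9a-f]+\]\s+"                             # return address
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"   # symbol+offset/size
    r"(?:[ \t]+\[(\w+)\])?"                         # optional module name
)

# A few frames copied from the trace in this report.
SAMPLE = """\
[  121.616743] Call Trace:
[  121.616782] [c000001fc714b980] [0000000000000908] 0x908 (unreliable)
[  121.616851] [c000001fc714b9d0] [d00000000dfb5650] nvme_complete_rq+0x100/0x240 [nvme]
[  121.616925] [c000001fc714ba50] [c00000000054860c] __blk_mq_complete_request+0xbc/0x1b0
[  121.617156] [c000001fc714bb50] [c000000000547358] blk_mq_rq_timer+0x48/0x140
"""


def call_trace_symbols(log_text):
    """Return (symbol, module_or_None) per resolved frame, innermost first.

    Frames without a symbol (for example the raw '0x908 (unreliable)'
    entry) and non-frame lines are skipped.
    """
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(log_text)]


if __name__ == "__main__":
    for symbol, module in call_trace_symbols(SAMPLE):
        print(symbol, module or "(built-in)")
```

Applied to the full trace, the frames show the request being completed by nvme_complete_rq from the blk-mq timeout path (blk_mq_rq_timer).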

  This bug was already fixed upstream (in version 4.5), and these are the commit IDs that contain the fix:

   * 646017a612e7 ("NVMe: Fix namespace removal deadlock")
   * 69d9a99c258e ("NVMe: Move error handling to failed reset handler")
   * a59e0f5795fe5 ("blk-mq: End unstarted requests on dying queue")

  Backports for each of these patches are attached.

  Please apply them to the 16.04 kernel.
