kernel-packages team mailing list archive

[Bug 1354459] Re: kernel crash power 8 bare metal

 

We took an EEH error:

[   44.793204] pnv_pci_dump_phb_diag_data: Unrecognized ioType 33554432
[   44.793267] EEH: Frozen PE#5 detected on PHB#3
[   44.793318] CPU: 40 PID: 209 Comm: kworker/40:0 Not tainted 3.13.0-27-generic #50-Ubuntu
[   44.793396] Workqueue: events .work_for_cpu_fn
[   44.793458] Call Trace:
[   44.793487] [c000000fe6edb540] [c000000000016af0] .show_stack+0x170/0x290 (unreliable)
[   44.793575] [c000000fe6edb630] [c000000000966fc0] .dump_stack+0x88/0xb4
[   44.793651] [c000000fe6edb6b0] [c0000000000364b0] .eeh_dev_check_failure+0x430/0x480
[   44.793737] [c000000fe6edb760] [c000000000036584] .eeh_check_failure+0x84/0xe0
[   44.793827] [c000000fe6edb7f0] [d00000000eea33e0] .ipr_mask_and_clear_interrupts+0x190/0x1d0 [ipr]
[   44.793928] [c000000fe6edb8a0] [d00000000eeaa394] .ipr_probe_ioa+0xc24/0x1370 [ipr]
[   44.794017] [c000000fe6edb9d0] [d00000000eeb25c4] .ipr_probe+0x44/0x4c0 [ipr]
[   44.794093] [c000000fe6edbac0] [c000000000516cfc] .local_pci_probe+0x4c/0xe0
[   44.794167] [c000000fe6edbb40] [c0000000000bae68] .work_for_cpu_fn+0x38/0x60
[   44.794242] [c000000fe6edbbc0] [c0000000000bf628] .process_one_work+0x1a8/0x4d0
[   44.794327] [c000000fe6edbc60] [c0000000000c04fc] .worker_thread+0x38c/0x4a0
[   44.794401] [c000000fe6edbd30] [c0000000000c98a0] .kthread+0x110/0x130
[   44.794476] [c000000fe6edbe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[   44.794572] EEH: Detected PCI bus error on PHB#3-PE#5
[   44.794632] EEH: This PCI device has failed 1 times in the last hour
[   44.794693] EEH: Notify device drivers to shutdown
[   44.794749] Unable to handle kernel paging request for data at address 0x00000008
[   44.794821] Faulting instruction address: 0xd00000000eea205c
[   44.794883] Oops: Kernel access of bad area, sig: 11 [#1]
[   44.794931] SMP NR_CPUS=2048 NUMA PowerNV
[   44.794982] Modules linked in: ipr(+)
[   44.795046] CPU: 9 PID: 810 Comm: eehd Not tainted 3.13.0-27-generic #50-Ubuntu
[   44.795120] task: c000000fdf7066f0 ti: c000000fe33a4000 task.ti: c000000fe33a4000
[   44.795192] NIP: d00000000eea205c LR: d00000000eea2a14 CTR: c00000000064f720
[   44.795264] REGS: c000000fe33a75b0 TRAP: 0300   Not tainted  (3.13.0-27-generic)
[   44.795336] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 229d0028  XER: 20000000
[   44.795641] CFAR: c000000000009318 DAR: 0000000000000008 DSISR: 40000000 SOFTE: 0
GPR00: d00000000eea2a14 c000000fe33a7830 d00000000eec4c58 c000000fda20cc60
GPR04: d00000000eebc178 0000000000000100 9000000100009033 ffffffffffffffff
GPR08: 0000000000000001 0000000000000000 0000000000000000 c00000000064f720
GPR12: d00000000eeb4978 c00000000fe41f80 c0000000000c9790 c000001fd8401600
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000bf9528
GPR24: c000000000bf9500 0000000000000000 d00000000eebc178 0000000000000100
GPR28: 0000000000000000 c000000fda20c538 0000000000000000 c000000fda20c538
[   44.797447] NIP [d00000000eea205c] .ipr_get_free_ipr_cmnd+0x2c/0x90 [ipr]
[   44.797567] LR [d00000000eea2a14] ._ipr_initiate_ioa_reset+0xe4/0x130 [ipr]
[   44.797683] Call Trace:
[   44.797736] [c000000fe33a78b0] [d00000000eea2a14] ._ipr_initiate_ioa_reset+0xe4/0x130 [ipr]
[   44.797900] [c000000fe33a7960] [d00000000eeab458] .ipr_pci_error_detected+0x1c8/0x230 [ipr]
[   44.798063] [c000000fe33a7a00] [c0000000000396bc] .eeh_report_error+0xac/0x120
[   44.798222] [c000000fe33a7a90] [c00000000003840c] .eeh_pe_dev_traverse+0x9c/0x170
[   44.798382] [c000000fe33a7b30] [c000000000039ce8] .eeh_handle_normal_event+0x128/0x3d0
[   44.798542] [c000000fe33a7bc0] [c000000000039fd8] .eeh_handle_event+0x48/0x2f0
[   44.798702] [c000000fe33a7c70] [c00000000003a39c] .eeh_event_handler+0x11c/0x1d0
[   44.798862] [c000000fe33a7d30] [c0000000000c98a0] .kthread+0x110/0x130
[   44.799000] [c000000fe33a7e30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[   44.799158] Instruction dump:
[   44.799225] 60420000 7c0802a6 fbe1fff8 f8010010 f821ff81 7c7f1b78 48000008 e8410028
[   44.799453] 7fe3fb78 e9230729 7fa91840 41de0058 <e8e90008> e8c90000 3d000010 3d400020
[   44.799678] ---[ end trace 7439fee11bbab045 ]---

This is usually a sign of bad hardware. EEH was not ported to 3.13, but it
should work on 3.16.
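
For anyone comparing traces across kernel versions, here is a rough sketch of
pulling the symbol, offset and module out of the call trace lines above, so
the eehd path into ipr is easier to see. The regex and the name parse_frames
are my own for illustration, not part of any kernel tooling, and the pattern
is only assumed to fit the frame format shown in this log.

```python
import re

# Matches call trace frames in the format seen in this log, e.g.
# [   44.797736] [c000000fe33a78b0] [d00000000eea2a14] ._ipr_initiate_ioa_reset+0xe4/0x130 [ipr]
# Assumption: 16-digit hex addresses and an optional trailing [module].
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]{16})\]\s+"        # stack pointer
    r"\[(?P<ip>[0-9a-f]{16})\]\s+"        # return address
    r"\.?(?P<symbol>[\w.]+)"              # symbol, leading dot dropped
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"        # module name, if any
)


def parse_frames(log_text):
    """Return (symbol, offset, module) for each call trace frame found."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (m.group("symbol"), int(m.group("offset"), 16), m.group("module"))
            )
    return frames


# Three frames copied from the eehd trace above.
sample = """\
[   44.797736] [c000000fe33a78b0] [d00000000eea2a14] ._ipr_initiate_ioa_reset+0xe4/0x130 [ipr]
[   44.797900] [c000000fe33a7960] [d00000000eeab458] .ipr_pci_error_detected+0x1c8/0x230 [ipr]
[   44.798063] [c000000fe33a7a00] [c0000000000396bc] .eeh_report_error+0xac/0x120
"""

for symbol, offset, module in parse_frames(sample):
    print(f"{symbol}+{offset:#x} ({module or 'kernel'})")
```

Frames without a [module] suffix are reported as core kernel, which is only a
convenience for reading the output.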

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1354459

Title:
  kernel crash power 8 bare metal

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  I'm seeing crashes on POWER8 bare metal (PowerNV).
  They happen intermittently.

  
  [   66.852889] Workqueue: events .work_for_cpu_fn
  [   66.852950] Call Trace:
  [   66.852977] [c000000fe6edae60] [c000000000016af0] .show_stack+0x170/0x290 (unreliable)
  [   66.853063] [c000000fe6edaf50] [c000000000966fc0] .dump_stack+0x88/0xb4
  [   66.853138] [c000000fe6edafd0] [c000000000111680] .rcu_check_callbacks+0x5b0/0x950
  [   66.853225] [c000000fe6edb100] [c0000000000ad2f8] .update_process_times+0x58/0xb0
  [   66.853311] [c000000fe6edb190] [c00000000011f890] .tick_sched_handle.isra.17+0x40/0xd0
  [   66.853397] [c000000fe6edb220] [c00000000011f984] .tick_sched_timer+0x64/0xa0
  [   66.853472] [c000000fe6edb2c0] [c0000000000cda50] .__run_hrtimer+0xa0/0x270
  [   66.853546] [c000000fe6edb360] [c0000000000ce948] .hrtimer_interrupt+0x148/0x330
  [   66.853633] [c000000fe6edb470] [c000000000020a00] .timer_interrupt+0x120/0x2c0
  [   66.853718] [c000000fe6edb520] [c0000000000023d8] decrementer_common+0x158/0x180
  [   66.853805] --- Exception: 901 at ._raw_spin_lock_irqsave+0xb0/0x110
  [   66.853805]     LR = ._raw_spin_lock_irqsave+0xe8/0x110
  [   66.853916] [c000000fe6edb810] [c0000000000f0cb4] .finish_wait+0x74/0xb0 (unreliable)
  [   66.854008] [c000000fe6edb8a0] [d00000000eeaa4f4] .ipr_probe_ioa+0xd84/0x1370 [ipr]
  [   66.854096] [c000000fe6edb9d0] [d00000000eeb25c4] .ipr_probe+0x44/0x4c0 [ipr]
  [   66.854171] [c000000fe6edbac0] [c000000000516cfc] .local_pci_probe+0x4c/0xe0
  [   66.854245] [c000000fe6edbb40] [c0000000000bae68] .work_for_cpu_fn+0x38/0x60
  [   66.854319] [c000000fe6edbbc0] [c0000000000bf628] .process_one_work+0x1a8/0x4d0
  [   66.854405] [c000000fe6edbc60] [c0000000000c04fc] .worker_thread+0x38c/0x4a0
  [   66.854479] [c000000fe6edbd30] [c0000000000c98a0] .kthread+0x110/0x130
  [   66.854553] [c000000fe6edbe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
  [   72.412653] BUG: soft lockup - CPU#40 stuck for 22s! [kworker/40:0:209]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1354459/+subscriptions

