kernel-packages team mailing list archive

[Bug 1352056] Re: linux: kdump on Ubuntu 14.04 is not generating a dump.

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Chris J Arges <1352056@xxxxxxxxxxxxxxxxxx>
Date: Wed, 01 Oct 2014 20:32:54 -0000
Reply-to: Bug 1352056 <1352056@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Description changed:

  SRU Justification:
  
  [Impact]
  Users of ppc64el hardware need the ability to use crashdumps to do kernel debugging.
  
  [Fix]
  Commit upstream and already in utopic:
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867
  
  [Test Case]
  Taken from:
  https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
  https://help.ubuntu.com/14.04/serverguide/kernel-crash-dump.html
  
  1) sudo apt-get install linux-crashdump
- 2) reboot the machine
- 3) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
- 4) kdump-config show # should return no errors
- 5) echo 'c' | sudo tee /proc/sysrq-trigger
- 6) This should crash the machine and we should kexec into another kernel to dump the core, then on the next reboot we should see a crash in /var/crash/*
+ 2) increase the crashkernel memory reservation:
+ sudo vim /etc/default/grub.d/kexec-tools.cfg
+ set crashkernel=1024M
+ sudo update-grub
+ 3) reboot the machine
+ 4) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
+ 5) kdump-config show # should return no errors
+ 6) echo 'c' | sudo tee /proc/sysrq-trigger
+ 7) The machine should crash and kexec into the dump kernel to save the core; after the next reboot, a crash dump should appear in /var/crash/*
  
  --
- 
  
  ---Problem Description---
  kdump is not producing a dump on powerKVM LE P8 Ubuntu 14.04
  
  ---uname output---
  3.13.0-30-generic
  
  ---Additional Hardware Info---
  Power8 LE configuration.
  
  ---Patches Installed---
  1324544 - kdump-config load fails with vmlinux kernel (vs. vmlinuz)
  
  Machine Type = 8247-22L
  
  ---Steps to Reproduce---
  Installed kdump-tools 1.5.5-2ubuntu1 and crash 7.0.3-3ubuntu3.
  Updated /etc/default/kdump-tools; at first I changed only USE_KDUMP=1. Rebooted the node and saw:
  root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash crashkernel=384M-:128M
  root@c656f2n02:~# cat /proc/sys/kernel/sysrq
  1
  root@c656f2n02:~# cat /proc/sys/kernel/sysrq
  1
  root@c656f2n02:~# ^Cnd /proc | grep sysrq
  root@c656f2n02:~# kdump-config status
  current state   : ready to kdump
  root@c656f2n02:~# kdump-config show
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr:
  current state:    ready to kdump
  
  kexec command:
    /sbin/kexec -p --args-linux --command-line="root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash  irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.13.0-30-generic /boot/vmlinux-3.13.0-30-generic
  
  root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_crash_loaded
  1
  root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_loaded
  0
  
  echo c > /proc/sysrq-trigger
  
  root@c656f2n02:/var/log# echo c > /proc/sysrq-trigger
  [ 1956.014243] SysRq : Trigger a crash
  [ 1956.014328] Unable to handle kernel paging request for data at address 0x00000000
  [ 1956.014404] Faulting instruction address: 0xc000000000586c2c
  [ 1956.014468] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 1956.014518] SMP NR_CPUS=2048 NUMA PowerNV
  [ 1956.014570] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables autofs4 rdma_ucm(OF) ib_ucm(OF) rdma_cm(OF) iw_cm(OF) ib_ipoib(OF) ib_cm(OF) ib_uverbs(OF) ib_umad(OF) mlx5_ib(OF) mlx5_core(OF) mlx4_ib(OF) ib_sa(OF) ib_mad(OF) ib_core(OF) ib_addr(OF) mlx4_en(OF) mlx4_core(OF) compat(OF) nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache rtc_generic powernv_rng ses enclosure ipr
  [ 1956.015306] CPU: 146 PID: 2522 Comm: bash Tainted: GF          O 3.13.0-30-generic #54-Ubuntu
  [ 1956.015394] task: c000003fcabda120 ti: c000003fcac58000 task.ti: c000003fcac58000
  [ 1956.015469] NIP: c000000000586c2c LR: c000000000587b8c CTR: c000000000586c00
  [ 1956.015543] REGS: c000003fcac5b820 TRAP: 0300   Tainted: GF          O  (3.13.0-30-generic)
  [ 1956.015617] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42422822  XER: 20000000
  [ 1956.015804] CFAR: c000000000009318 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
  GPR00: c000000000587b8c c000003fcac5baa0 c00000000162e840 0000000000000063
  GPR04: c000000002f45bd0 c000000002f564c8 0000000000015ad0 c000000001827480
  GPR08: c000000000dfe840 0000000000000000 0000000000000001 0000000000015ad0
  GPR12: 0000000042422822 c000000007e5ff00 000001002fe90648 000000001016e008
  GPR16: 000000001013ad70 000001002fe94648 000000001016fed0 000000001016e008
  GPR20: 00000000100c31e0 0000000000000000 0000000010171fc8 000000001016f840
  GPR24: 0000000000000004 0000000000000000 0000000000000001 c0000000014b7dc8
  GPR28: c000000001974c90 0000000000000063 c00000000148d9c0 c0000000014b8188
  [ 1956.016794] NIP [c000000000586c2c] .sysrq_handle_crash+0x2c/0x40
  [ 1956.016858] LR [c000000000587b8c] .__handle_sysrq+0xfc/0x260
  [ 1956.016920] Call Trace:
  [ 1956.016948] [c000003fcac5baa0] [0000000010172a34] 0x10172a34 (unreliable)
  [ 1956.017025] [c000003fcac5bb10] [c000000000587b8c] .__handle_sysrq+0xfc/0x260
  [ 1956.017101] [c000003fcac5bbd0] [c000000000588324] .write_sysrq_trigger+0x74/0x90
  [ 1956.017190] [c000003fcac5bc50] [c0000000002dff1c] .proc_reg_write+0xac/0x110
  [ 1956.017266] [c000003fcac5bcf0] [c000000000254c00] .vfs_write+0xe0/0x260
  [ 1956.017342] [c000003fcac5bd90] [c0000000002558f4] .SyS_write+0x64/0xe0
  [ 1956.017418] [c000003fcac5be30] [c00000000000a158] syscall_exit+0x0/0x98
  [ 1956.017492] Instruction dump:
  [ 1956.017530] 4bffffac 7c0802a6 f8010010 f821ff91 60000000 60000000 3d42001f 392a8ca8
  [ 1956.017658] 39400001 91490000 7c0004ac 39200000 <99490000> 38210070 e8010010 7c0803a6
  [ 1956.017894] ---[ end trace d163ff42366bde72 ]---
  [ 1956.017986]
  [ 1956.018042] Sending IPI to other CPUs
  [ 1956.019188] IPI complete
   -> smp_release_cpus()
  spinning_secondaries = 159
   <- smp_release_cpus()
   <- setup_system()
  The console remains at this message until I power cycle the CEC. There is no /proc/vmcore on reboot.
  
  I recreated the hang on my victim node.
  Some CPUs are hitting the 4400 interrupt vector. I think this is due to the missing commit 429d2e834295 "powerpc: Fix kdump hang issue on p8 with relocation on exception enabled." from Mahesh, but I need to double-check, since it may not be the only patch missing.
  
  The patch I mentioned definitely fixes the hang.
  Here are the commit details:
  
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867
  
  powerpc: Fix kdump hang issue on p8 with relocation on exception
  enabled.
  
  On p8 systems, with relocation on exception feature enabled we are seeing
  kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this
  feature enabled, exception are raised with MMU (IR=DR=1) ON with the
  default offset of 0xc*4000. Since exception is raised in virtual mode it
  requires the vector region to be executable without which it fails to
  fetch and execute instruction at 0xc*4xxx. For default kernel since kernel
  is loaded at real 0, the htab mappings sets the entire kernel text region
  executable. But for relocatable kernel (e.g. kdump case) we only copy
  interrupt vectors down to real 0 and never marked that region as
  executable because in p7 and below we always get exception in real mode.
  
  This patch fixes this issue by marking htab mapping range as executable
  that overlaps with the interrupt vector region for relocatable kernel.
  
  Thanks to Ben who helped me to debug this issue and find the root cause.
  
  Signed-off-by: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx>
  Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
  
  I think this bug should be mirrored to Ubuntu so they can include this
  patch in the 14.04 kernel, and maybe in the 14.10 kernel too.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1352056

Title:
  linux: kdump on Ubuntu 14.04 is not generating a dump.

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  In Progress
Status in “linux” source package in Utopic:
  Fix Released

Bug description:
  SRU Justification:

  [Impact]
  Users of ppc64el hardware need the ability to use crashdumps to do kernel debugging.

  [Fix]
  Commit upstream and already in utopic:
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867

  [Test Case]
  Taken from:
  https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
  https://help.ubuntu.com/14.04/serverguide/kernel-crash-dump.html

  1) sudo apt-get install linux-crashdump
  2) increase the crashkernel memory reservation:
  sudo vim /etc/default/grub.d/kexec-tools.cfg
  set crashkernel=1024M
  sudo update-grub
  3) reboot the machine
  4) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
  5) kdump-config show # should return no errors
  6) echo 'c' | sudo tee /proc/sysrq-trigger
  7) The machine should crash and kexec into the dump kernel to save the core; after the next reboot, a crash dump should appear in /var/crash/*
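
For repeated test runs, the manual vim edit in step 2 could be scripted. The following is only a sketch: `set_crashkernel` is a hypothetical helper name, and it assumes the config file already carries a `crashkernel=` parameter (such as the `crashkernel=384M-:128M` seen on the kernel command line in this report) and that GNU `sed` is available.

```shell
# Hypothetical helper: rewrite the crashkernel= value in a GRUB config fragment.
# Usage: set_crashkernel FILE SIZE
set_crashkernel() {
    cfg="$1"
    size="$2"
    # Stop if the file has no crashkernel= parameter to replace.
    grep -q 'crashkernel=' "$cfg" || { echo "no crashkernel= in $cfg" >&2; return 1; }
    # Replace everything after crashkernel= up to the next space or double quote.
    sed -i "s/crashkernel=[^ \"]*/crashkernel=$size/" "$cfg"
}

# On the test machine this needs root, and is followed by the same
# update-grub and reboot as in steps 2 and 3:
#   set_crashkernel /etc/default/grub.d/kexec-tools.cfg 1024M && update-grub
```

After the reboot, `cat /proc/cmdline` should show the new `crashkernel=` value if the edit took effect.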

  --

  ---Problem Description---
  kdump is not producing a dump on powerKVM LE P8 Ubuntu 14.04

  ---uname output---
  3.13.0-30-generic

  ---Additional Hardware Info---
  Power8 LE configuration.

  ---Patches Installed---
  1324544 - kdump-config load fails with vmlinux kernel (vs. vmlinuz)

  Machine Type = 8247-22L

  ---Steps to Reproduce---
  Installed kdump-tools 1.5.5-2ubuntu1 and crash 7.0.3-3ubuntu3.
  Updated /etc/default/kdump-tools; at first I changed only USE_KDUMP=1. Rebooted the node and saw:
  root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash crashkernel=384M-:128M
  root@c656f2n02:~# cat /proc/sys/kernel/sysrq
  1
  root@c656f2n02:~# cat /proc/sys/kernel/sysrq
  1
  root@c656f2n02:~# ^Cnd /proc | grep sysrq
  root@c656f2n02:~# kdump-config status
  current state   : ready to kdump
  root@c656f2n02:~# kdump-config show
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr:
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash  irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.13.0-30-generic /boot/vmlinux-3.13.0-30-generic

  root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_crash_loaded
  1
  root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_loaded
  0
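
The manual checks above (a `crashkernel=` reservation on the kernel command line and `kexec_crash_loaded` reading 1) can be combined into one pre-flight test. This is a minimal sketch; `kdump_ready` is a hypothetical name, and the two file arguments exist only so the function can be exercised against sample files instead of the live system.

```shell
# Hypothetical pre-flight check before triggering a crash.
# Usage: kdump_ready [CMDLINE_FILE] [LOADED_FILE]
kdump_ready() {
    cmdline_file="${1:-/proc/cmdline}"
    loaded_file="${2:-/sys/kernel/kexec_crash_loaded}"
    # The running kernel must have been booted with a crashkernel= reservation.
    grep -q 'crashkernel=' "$cmdline_file" || { echo "no crashkernel= on kernel command line" >&2; return 1; }
    # A value of 1 means a crash kernel has been loaded with kexec -p.
    [ "$(cat "$loaded_file" 2>/dev/null)" = "1" ] || { echo "crash kernel not loaded" >&2; return 1; }
    echo "ready"
}
```

Note that in this report both conditions hold and the dump kernel still hangs, so a passing check only rules out configuration errors, not the kernel bug itself.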

  echo c > /proc/sysrq-trigger

  root@c656f2n02:/var/log# echo c > /proc/sysrq-trigger
  [ 1956.014243] SysRq : Trigger a crash
  [ 1956.014328] Unable to handle kernel paging request for data at address 0x00000000
  [ 1956.014404] Faulting instruction address: 0xc000000000586c2c
  [ 1956.014468] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 1956.014518] SMP NR_CPUS=2048 NUMA PowerNV
  [ 1956.014570] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables autofs4 rdma_ucm(OF) ib_ucm(OF) rdma_cm(OF) iw_cm(OF) ib_ipoib(OF) ib_cm(OF) ib_uverbs(OF) ib_umad(OF) mlx5_ib(OF) mlx5_core(OF) mlx4_ib(OF) ib_sa(OF) ib_mad(OF) ib_core(OF) ib_addr(OF) mlx4_en(OF) mlx4_core(OF) compat(OF) nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache rtc_generic powernv_rng ses enclosure ipr
  [ 1956.015306] CPU: 146 PID: 2522 Comm: bash Tainted: GF          O 3.13.0-30-generic #54-Ubuntu
  [ 1956.015394] task: c000003fcabda120 ti: c000003fcac58000 task.ti: c000003fcac58000
  [ 1956.015469] NIP: c000000000586c2c LR: c000000000587b8c CTR: c000000000586c00
  [ 1956.015543] REGS: c000003fcac5b820 TRAP: 0300   Tainted: GF          O  (3.13.0-30-generic)
  [ 1956.015617] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42422822  XER: 20000000
  [ 1956.015804] CFAR: c000000000009318 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
  GPR00: c000000000587b8c c000003fcac5baa0 c00000000162e840 0000000000000063
  GPR04: c000000002f45bd0 c000000002f564c8 0000000000015ad0 c000000001827480
  GPR08: c000000000dfe840 0000000000000000 0000000000000001 0000000000015ad0
  GPR12: 0000000042422822 c000000007e5ff00 000001002fe90648 000000001016e008
  GPR16: 000000001013ad70 000001002fe94648 000000001016fed0 000000001016e008
  GPR20: 00000000100c31e0 0000000000000000 0000000010171fc8 000000001016f840
  GPR24: 0000000000000004 0000000000000000 0000000000000001 c0000000014b7dc8
  GPR28: c000000001974c90 0000000000000063 c00000000148d9c0 c0000000014b8188
  [ 1956.016794] NIP [c000000000586c2c] .sysrq_handle_crash+0x2c/0x40
  [ 1956.016858] LR [c000000000587b8c] .__handle_sysrq+0xfc/0x260
  [ 1956.016920] Call Trace:
  [ 1956.016948] [c000003fcac5baa0] [0000000010172a34] 0x10172a34 (unreliable)
  [ 1956.017025] [c000003fcac5bb10] [c000000000587b8c] .__handle_sysrq+0xfc/0x260
  [ 1956.017101] [c000003fcac5bbd0] [c000000000588324] .write_sysrq_trigger+0x74/0x90
  [ 1956.017190] [c000003fcac5bc50] [c0000000002dff1c] .proc_reg_write+0xac/0x110
  [ 1956.017266] [c000003fcac5bcf0] [c000000000254c00] .vfs_write+0xe0/0x260
  [ 1956.017342] [c000003fcac5bd90] [c0000000002558f4] .SyS_write+0x64/0xe0
  [ 1956.017418] [c000003fcac5be30] [c00000000000a158] syscall_exit+0x0/0x98
  [ 1956.017492] Instruction dump:
  [ 1956.017530] 4bffffac 7c0802a6 f8010010 f821ff91 60000000 60000000 3d42001f 392a8ca8
  [ 1956.017658] 39400001 91490000 7c0004ac 39200000 <99490000> 38210070 e8010010 7c0803a6
  [ 1956.017894] ---[ end trace d163ff42366bde72 ]---
  [ 1956.017986]
  [ 1956.018042] Sending IPI to other CPUs
  [ 1956.019188] IPI complete
   -> smp_release_cpus()
  spinning_secondaries = 159
   <- smp_release_cpus()
   <- setup_system()
  The console remains at this message until I power cycle the CEC. There is no /proc/vmcore on reboot.

  I recreated the hang on my victim node.
  Some CPUs are hitting the 4400 interrupt vector. I think this is due to the missing commit 429d2e834295 "powerpc: Fix kdump hang issue on p8 with relocation on exception enabled." from Mahesh, but I need to double-check, since it may not be the only patch missing.

  The patch I mentioned definitely fixes the hang.
  Here are the commit details:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867

  powerpc: Fix kdump hang issue on p8 with relocation on exception
  enabled.

  On p8 systems, with relocation on exception feature enabled we are seeing
  kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this
  feature enabled, exception are raised with MMU (IR=DR=1) ON with the
  default offset of 0xc*4000. Since exception is raised in virtual mode it
  requires the vector region to be executable without which it fails to
  fetch and execute instruction at 0xc*4xxx. For default kernel since kernel
  is loaded at real 0, the htab mappings sets the entire kernel text region
  executable. But for relocatable kernel (e.g. kdump case) we only copy
  interrupt vectors down to real 0 and never marked that region as
  executable because in p7 and below we always get exception in real mode.

  This patch fixes this issue by marking htab mapping range as executable
  that overlaps with the interrupt vector region for relocatable kernel.

  Thanks to Ben who helped me to debug this issue and find the root
  cause.

  Signed-off-by: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx>
  Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

  I think this bug should be mirrored to Ubuntu so they can include this
  patch in the 14.04 kernel, and maybe in the 14.10 kernel too.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1352056/+subscriptions