kernel-packages team mailing list archive

[Bug 1410817] Re: Kdump triggered manually after cpu offline operation fails to collect dump

 

Chris,

Could you give us a day or more to test this fix? I expect to
have it validated by Monday.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1410817

Title:
  Kdump triggered manually after cpu offline operation fails to collect
  dump

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Utopic:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  A kdump triggered manually after a CPU offline operation fails to collect a dump; the kernel panics with "Could not enable big endian exceptions" instead.

  [Test Case]
  See Steps to Reproduce below.

  [Fix]
  $ git describe --contains c1caae3de46a072d0855729aed6e793e536a4a55
  v3.19-rc3~1^2~1
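
As a reading aid: `git describe --contains` prints a tag that contains the commit, followed by a `~`/`^` path leading from that tag back to the commit, so the string above says the fix is contained in v3.19-rc3. A minimal sketch of pulling the tag out of such a string follows; `describe_tag` is a hypothetical helper name, not part of git.

```shell
#!/bin/sh
# Hypothetical helper: extract the tag from `git describe --contains` output.
describe_tag() {
    # Drop everything from the first '~' or '^' onward, leaving the tag name.
    printf '%s\n' "${1%%[~^]*}"
}

# Sample input is the describe string quoted in this report.
describe_tag 'v3.19-rc3~1^2~1'
```

A kernel built from that tag or a later one should therefore already carry the fix, assuming the commit has not been reverted.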

  --

  
  ---Problem Description---
  A kdump triggered manually after a CPU offline operation fails to collect a dump.

  ---uname output---
  Linux ubuntu 3.18.0-9-generic #10-Ubuntu SMP Mon Jan 12 21:35:28 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = P8

  ---System Hang---
  The LPAR has to be rebooted before the machine can be accessed again.

  ---Steps to Reproduce---
  Install Ubuntu 15.04 on a PowerVM LPAR from the ISO, using a virtual DVD.
  Then take one of the machine's CPUs offline.

  root@ubuntu:~# lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                16
  On-line CPU(s) list:   0-15
  Thread(s) per core:    8
  Core(s) per socket:    1
  Socket(s):             2
  NUMA node(s):          2
  Model:                 IBM,8284-22A
  Hypervisor vendor:     pHyp
  Virtualization type:   para
  L1d cache:             64K
  L1i cache:             32K
  NUMA node0 CPU(s):     0-15
  NUMA node2 CPU(s):

  root@ubuntu:~# chcpu -d 15
  CPU 15 disabled

  root@ubuntu:~# lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                16
  On-line CPU(s) list:   0-14
  Off-line CPU(s) list:  15
  Thread(s) per core:    7
  Core(s) per socket:    1
  Socket(s):             2
  NUMA node(s):          2
  Model:                 IBM,8284-22A
  Hypervisor vendor:     pHyp
  Virtualization type:   para
  L1d cache:             64K
  L1i cache:             32K
  NUMA node0 CPU(s):     0-14
  NUMA node2 CPU(s):
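
The online list reported by lscpu can be cross-checked from a script. The sketch below counts the CPUs in a list such as `0-14`; `count_cpus` is a hypothetical helper, and reading `/sys/devices/system/cpu/online` assumes the usual Linux sysfs layout.

```shell
#!/bin/sh
# Hypothetical helper: count the CPUs in a kernel-style list ("0-14", "0-3,8-11", "15").
count_cpus() {
    list=$1
    total=0
    old_ifs=$IFS
    IFS=,
    for part in $list; do
        case $part in
            *-*)
                lo=${part%-*}
                hi=${part#*-}
                total=$((total + hi - lo + 1))
                ;;
            '') ;;
            *) total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Sample value taken from the lscpu output after `chcpu -d 15`.
count_cpus '0-14'

# On a live system (assumed sysfs path), the same list can be read directly.
if [ -r /sys/devices/system/cpu/online ]; then
    count_cpus "$(cat /sys/devices/system/cpu/online)" >/dev/null
fi
```

With the values from this report, the count drops from 16 to 15 once CPU 15 is disabled.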

  Configure and enable kdump on the LPAR.

  root@ubuntu:~# /etc/init.d/kdump-tools status
  current state   : ready to kdump
  root@ubuntu:~# kdump-config load
  Modified cmdline:BOOT_IMAGE=/boot/vmlinux-3.18.0-9-generic root=UUID=70957e56-8669-466f-b0e7-140f2ec39a04 ro splash quiet irqpoll maxcpus=1 nousb elfcorehdr=155072K
  segment[0].mem:0x8000000 memsz:24510464
  segment[1].mem:0x9760000 memsz:65536
  segment[2].mem:0x9770000 memsz:65536
  segment[3].mem:0x9780000 memsz:65536
  segment[4].mem:0x9790000 memsz:22020096
  segment[5].mem:0xec70000 memsz:196608
   * loaded kdump kernel
  root@ubuntu:~#

  root@ubuntu:~# kdump-config show
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr:
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="BOOT_IMAGE=/boot/vmlinux-3.18.0-9-generic root=UUID=70957e56-8669-466f-b0e7-140f2ec39a04 ro splash quiet irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.18.0-9-generic /boot/vmlinux-3.18.0-9-generic
  root@ubuntu:~# kdump-config status
  current state   : ready to kdump
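
Before triggering the crash, a script can confirm that the crash kernel is loaded by matching the status line shown above. This is only a sketch: `kdump_ready` is a hypothetical helper, and it assumes the status text has exactly the form printed in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin contains the "ready to kdump" status line.
kdump_ready() {
    grep -Eq '^[[:space:]]*current state[[:space:]]*:[[:space:]]*ready to kdump[[:space:]]*$'
}

# Sample line copied from the `kdump-config status` output in this report.
if printf '%s\n' 'current state   : ready to kdump' | kdump_ready; then
    echo "kdump is armed"
fi
```

On the affected system the check would be run as `kdump-config status | kdump_ready`.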

  root@ubuntu:~# sysctl -w kernel.sysrq=1
  kernel.sysrq = 1
  root@ubuntu:~# cat /proc/sys/kernel/sysrq
  1

  Trigger the crash manually using sysrq-trigger.

  root@ubuntu:~# echo c > /proc/sysrq-trigger

  root@ubuntu:~# [  311.088315] SysRq : Trigger a crash
  [  311.088331] Unable to handle kernel paging request for data at address 0x00000000
  [  311.088336] Faulting instruction address: 0xc0000000005f9094
  [  311.088341] Oops: Kernel access of bad area, sig: 11 [#1]
  [  311.088344] SMP NR_CPUS=2048 NUMA pSeries
  [  311.088349] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables pseries_rng rtc_generic binfmt_misc
  [  311.088372] CPU: 14 PID: 1705 Comm: bash Not tainted 3.18.0-9-generic #10-Ubuntu
  [  311.088377] task: c00000027773e470 ti: c0000002782d0000 task.ti: c0000002782d0000
  [  311.088381] NIP: c0000000005f9094 LR: c0000000005fa12c CTR: c0000000005f9060
  [  311.088385] REGS: c0000002782d39d0 TRAP: 0300   Not tainted  (3.18.0-9-generic)
  [  311.088389] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242822  XER: 00000001
  [  311.088401] CFAR: c0000000000084d8 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000005fa12c c0000002782d3c50 c000000001426890 0000000000000063
  GPR04: c000000001b85c28 c000000001b965e0 00000000000000ff c0000000015e71f0
  GPR08: c000000000e76890 0000000000000001 0000000000000000 0000000000000001
  GPR12: c0000000005f9060 c000000007b37e00 0000000000000000 0000000022000000
  GPR16: 000000001016d6e8 0000010000088208 0000000010143eb8 00000000100c9390
  GPR20: 0000000000000000 000000001017b008 0000000010143d18 0000000000000000
  GPR24: 0000000010156c00 0000000010178868 c0000000013756a8 0000000000000004
  GPR28: 0000000000000063 c00000000133f598 c000000001375a68 0000000000000000
  [  311.088459] NIP [c0000000005f9094] sysrq_handle_crash+0x34/0x50
  [  311.088463] LR [c0000000005fa12c] __handle_sysrq+0xec/0x280
  [  311.088467] Call Trace:
  [  311.088470] [c0000002782d3c50] [c000000000056604] ht64_call_hpte_insert1+0x4/0x3c (unreliable)
  [  311.088476] [c0000002782d3c70] [c0000000005fa12c] __handle_sysrq+0xec/0x280
  [  311.088481] [c0000002782d3d10] [c0000000005fa928] write_sysrq_trigger+0x78/0xa0
  [  311.088488] [c0000002782d3d40] [c000000000345a10] proc_reg_write+0xb0/0x110
  [  311.088494] [c0000002782d3d90] [c0000000002b954c] vfs_write+0xdc/0x260
  [  311.088499] [c0000002782d3de0] [c0000000002ba0ec] SyS_write+0x6c/0x110
  [  311.088504] [c0000002782d3e30] [c00000000000927c] syscall_exit+0x0/0x7c
  [  311.088508] Instruction dump:
  [  311.088511] 3842d830 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001b 39491cdc
  [  311.088519] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
  [  311.088528] ---[ end trace 8543f2d87847eab7 ]---
  [  311.090822]
  [  311.090851] Sending IPI to other CPUs
  [  311.091870] IPI complete
  [  312.466826] Kernel panic - not syncing: Could not enable big endian exceptions
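
The reproduction steps above can be collected into one script. The sketch below only prints the commands by default; the `RUN` and `CPU` variables and the `step` and `reproduce` functions are hypothetical names introduced here. Setting `RUN=1` as root will crash the machine, so it is meant for a disposable LPAR only.

```shell
#!/bin/sh
# Dry run by default: each step is printed, not executed.
# RUN=1 (hypothetical switch) executes the steps and will crash the system.
CPU=${CPU:-15}

step() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '+ %s\n' "$1"
    fi
}

reproduce() {
    step "chcpu -d $CPU"                 # take one CPU offline
    step "kdump-config load"             # load the crash kernel
    step "sysctl -w kernel.sysrq=1"      # enable all sysrq functions
    step "echo c > /proc/sysrq-trigger"  # trigger the crash
}

reproduce
```

Keeping the dry run as the default makes it harder to crash a machine by accident while adapting the CPU number.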

  root@ubuntu:~# which kdump
  /sbin/kdump
  root@ubuntu:~# dpkg -S /sbin/kdump
  kexec-tools: /sbin/kdump
  root@ubuntu:~# dpkg --list | grep kexec
  ii  kexec-tools                          1:2.0.7-5ubuntu1                           ppc64el      tools to support fast kexec reboots
  ii  pxe-kexec                            0.2.4-3                                    ppc64el      Fetch PXE configuration file and netboot using kexec

  The fix is available upstream:
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55

  Thanks
  Hari
