
kernel-packages team mailing list archive

[Bug 1536904] Re: Kdump fails on Ubuntu 16.04 (PowerVM/PowerKVM/BareMetal)

 

Hello,

Sorry for the delay in replying.

This is definitely a regression introduced by the implementation of the
smaller initrd. I am currently working on a fix. Your second problem
might be caused by the smaller initrd not being used, so I suggest
waiting for the fix before testing that again.

I can have a test package available quickly if you are able to test
from a PPA (according to a previous bug, I think you are), so let me
know and I'll tell you where to find the PPA.

** Changed in: makedumpfile (Ubuntu)
       Status: Invalid => In Progress

** Changed in: makedumpfile (Ubuntu)
   Importance: Undecided => High

** Changed in: makedumpfile (Ubuntu)
     Assignee: Taco Screen team (taco-screen-team) => Louis Bouchard (louis-bouchard)

** Also affects: makedumpfile (Ubuntu Wily)
   Importance: Undecided
       Status: New

** Changed in: makedumpfile (Ubuntu Wily)
       Status: New => Confirmed

** Changed in: makedumpfile (Ubuntu Wily)
   Importance: Undecided => High

** Changed in: makedumpfile (Ubuntu Wily)
     Assignee: (unassigned) => Louis Bouchard (louis-bouchard)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1536904

Title:
  Kdump fails on Ubuntu 16.04 (PowerVM/PowerKVM/BareMetal)

Status in makedumpfile package in Ubuntu:
  In Progress
Status in makedumpfile source package in Wily:
  Confirmed

Bug description:
  == Comment: #0 - ==
  ---Problem Description---
  Kdump fails on Ubuntu 16.04 with Austin adapter(tg3)
   
  Contact Information = hathyaga@xxxxxxxxxx, iranna.ankad@xxxxxxxxxx,mputtash@xxxxxxxxxx 
   
  ---uname output---
  linux ltciofvtr-s822l1 4.3.0-5-generic #16-Ubuntu SMP Wed Dec 16 23:32:23 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  Machine details:
  9.47.67.156 (root/ltcnetdd) 

   
  Machine Type = 8247-22L 
   
  ---System Hang---
   The system hangs after triggering a crash. Need to reboot to bring it up and functional. 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Steps to follow:
  1. apt-get install linux-crashdump
  2. apt-get install kdump-tools
  3. Edit /etc/default/kdump-tools and change the following:
  USE_KDUMP=0 to 1
  4. Change the size of the crash kernel in /boot/grub/grub.cfg to crashkernel=4096M-:4096M
  5. Load the kdump config file: kdump-config load
  6. echo 1 > /proc/sys/kernel/sysrq
  7. echo c > /proc/sysrq-trigger
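
Step 3 above can be scripted rather than edited by hand. The following is a minimal sketch, not part of kdump-tools: the function name enable_kdump is hypothetical, and the file path is passed as an argument so the change can be tried on a copy before touching /etc/default/kdump-tools.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with kdump-tools): set USE_KDUMP=1 in a
# kdump-tools defaults file. The file path is the first argument.
enable_kdump() {
    conf=$1
    tmp=$(mktemp) || return 1
    # Rewrite only the USE_KDUMP assignment; every other line is copied as-is.
    if sed 's/^USE_KDUMP=.*/USE_KDUMP=1/' "$conf" > "$tmp"; then
        # Write back through the existing file so its owner and mode are kept.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (as root), followed by step 5:
#   enable_kdump /etc/default/kdump-tools && kdump-config load
```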

  
  Things to look at to cross-check are:

  After loading the kdump config file, check its status:
  root@ltciofvtr-s822l1:~# kdump-config show
  DUMP_MODE:        kdump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr: 
  SSH:              root@35.35.35.36
  SSH_KEY:          /root/.ssh/id_rsa
  HOSTTAG:          ip
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/boot/initrd.img-4.3.0-5-generic /boot/vmlinux-4.3.0-5-generic


  root@ltciofvtr-s822l1:~# kdump-config status
   * Broken symlink : /var/lib/kdump/vmlinuz: broken symbolic link to /boot/vmlinuz-4.3.0-5-generic
  current state   : ready to kdump


  root@ltciofvtr-s822l1:~# cat /proc/cmdline 
  root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M


  root@ltciofvtr-s822l1:~# dmesg| grep -i crash
  [    0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 131072MB)
  [    0.000000] Kernel command line: root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M

  
  Observations:
  1. The kdump-config status command reports a broken symbolic link, which suggests that kdump-config is unable to handle the symbolic link.
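
The broken link can be confirmed independently of kdump-config. The sketch below uses only POSIX test operators; the function name is_broken_symlink is hypothetical. Note that the kexec command shown earlier references /boot/vmlinux-4.3.0-5-generic, whereas the reported link points to /boot/vmlinuz-4.3.0-5-generic.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the path is a symbolic link whose
# target does not exist. '-L' tests the link itself; '-e' follows it.
is_broken_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage on the affected system:
#   for l in /var/lib/kdump/*; do
#       if is_broken_symlink "$l"; then
#           echo "broken: $l -> $(readlink "$l")"
#       fi
#   done
```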

  2. Trace observed on console:
  root@ltciofvtr-s822l1:~# echo c | tee /proc/sysrq-trigger 
  c
  [  238.872102] sysrq: SysRq : Trigger a crash
  [  238.872179] Unable to handle kernel paging request for data at address 0x00000000
  [  238.872256] Faulting instruction address: 0xc000000000646534
  [  238.872322] Oops: Kernel access of bad area, sig: 11 [#1]
  [  238.872373] SMP NR_CPUS=2048 NUMA PowerNV
  [  238.872427] Modules linked in: dm_round_robin dm_service_time ipmi_powernv ipmi_msghandler leds_powernv uio_pdrv_genirq powernv_rng uio dm_multipath sunrpc bonding autofs4 btrfs xor raid6_pq mlx4_en ses enclosure bnx2x mlx4_core lpfc qla2xxx mdio libcrc32c be2net e1000e vxlan ipr ip6_udp_tunnel udp_tunnel scsi_transport_fc
  [  238.872895] CPU: 121 PID: 3861 Comm: tee Not tainted 4.3.0-5-generic #16-Ubuntu
  [  238.872973] task: c000000fe01ce860 ti: c000000fe022c000 task.ti: c000000fe022c000
  [  238.873049] NIP: c000000000646534 LR: c0000000006475f8 CTR: c000000000646500
  [  238.873125] REGS: c000000fe022f990 TRAP: 0300   Not tainted  (4.3.0-5-generic)
  [  238.873200] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004222  XER: 20000000
  [  238.873392] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1 
  GPR00: c0000000006475f8 c000000fe022fc10 c00000000155e400 0000000000000063 
  GPR04: c0000007fc648450 c0000007fc659cf0 c000001fff830000 0000000000000792 
  GPR08: 0000000000000007 0000000000000001 0000000000000000 c000001fff861780 
  GPR12: c000000000646500 c000000007b87d80 0000000000000000 0000000000000000 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  GPR20: 0000000000000000 0000000000000000 0000000010009d88 0000000000000001 
  GPR24: 0000000010009d88 00003fffe7b210b0 c0000000014a5cb0 0000000000000004 
  GPR28: c0000000014a6070 0000000000000063 c000000001460de4 0000000000000000 
  [  238.875062] NIP [c000000000646534] sysrq_handle_crash+0x34/0x50
  [  238.875178] LR [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875270] Call Trace:
  [  238.875322] [c000000fe022fc10] [c000000000dc92a0] _fw_tigon_tg3_bin_name+0x2c5d0/0x33708 (unreliable)
  [  238.875516] [c000000fe022fc30] [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875658] [c000000fe022fcd0] [c000000000647da8] write_sysrq_trigger+0x78/0xa0
  [  238.875820] [c000000fe022fd00] [c00000000036bf50] proc_reg_write+0xb0/0x110
  [  238.875963] [c000000fe022fd50] [c0000000002d45bc] __vfs_write+0x6c/0xe0
  [  238.876104] [c000000fe022fd90] [c0000000002d52f0] vfs_write+0xc0/0x230
  [  238.876246] [c000000fe022fde0] [c0000000002d632c] SyS_write+0x6c/0x110
  [  238.876389] [c000000fe022fe30] [c000000000009204] system_call+0x38/0xb4
  [  238.876525] Instruction dump:
  [  238.876601] 38427f00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949aae4 
  [  238.876843] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6 
  [  238.877091] ---[ end trace 2028716a4fb3f0e5 ]---
  [  238.880521] 
  [  238.880590] Sending IPI to other CPUs
  [  238.881716] IPI complete

  The system hang is observed here.

  3. No crash dump generated after a reboot.

  4. The kdump hang is also observed on KVM and PowerVM, as well as on OpenPOWER.
   
  Stack trace output:
   [  238.875270] Call Trace:
  [  238.875322] [c000000fe022fc10] [c000000000dc92a0] _fw_tigon_tg3_bin_name+0x2c5d0/0x33708 (unreliable)
  [  238.875516] [c000000fe022fc30] [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875658] [c000000fe022fcd0] [c000000000647da8] write_sysrq_trigger+0x78/0xa0
  [  238.875820] [c000000fe022fd00] [c00000000036bf50] proc_reg_write+0xb0/0x110
  [  238.875963] [c000000fe022fd50] [c0000000002d45bc] __vfs_write+0x6c/0xe0
  [  238.876104] [c000000fe022fd90] [c0000000002d52f0] vfs_write+0xc0/0x230
  [  238.876246] [c000000fe022fde0] [c0000000002d632c] SyS_write+0x6c/0x110
  [  238.876389] [c000000fe022fe30] [c000000000009204] system_call+0x38/0xb4
  [  238.876525] Instruction dump:
  [  238.876601] 38427f00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949aae4 
  [  238.876843] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6 
  [  238.877091] ---[ end trace 2028716a4fb3f0e5 ]---
   
  Oops output:
   no
   
  System Dump Location:
   No dump generated
   
  *Additional Instructions for hathyaga@xxxxxxxxxx, iranna.ankad@xxxxxxxxxx,mputtash@xxxxxxxxxx: 
  -Post a private note with access information to the machine that the bug is occurring on. 
  -Attach sysctl -a output to the bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1536904/+subscriptions