kernel-packages team mailing list archive

[Bug 1536904] Comment bridged from LTC Bugzilla

 

------- Comment From hbathini@xxxxxxxxxx 2016-01-25 06:32 EDT-------
(In reply to comment #12)
> Hello,
>
> I must admit that I am a bit puzzled by some of your statements :
>
> ---Steps to Reproduce---
> Steps to follow:
> 1. apt-get install linux-crashdump
> 2. apt-get install kdump-tools
>
> Step 2 is not required : kdump-tools is a dependency of linux-crashdump so
> it is installed automatically :
>
> 3. Edit /etc/default/kdump-tools and change the following:
> USE_KDUMP=0 to 1
> This is not required. kdump-tools 1:1.5.9-3 does that automatically during
> installation and you should be prompted to accept it :
>
> +----------------------- Configuring kdump-tools -----------------------+
> | If you choose this option, the kdump-tools mechanism will be enabled. |
> | A reboot is still required in order to enable the crashkernel kernel  |
> | parameter.                                                            |
> |                                                                       |
> | Should kdump-tools be enabled by default?                             |
> |                                                                       |
> |                    <Yes>                    <No>                      |
> +-----------------------------------------------------------------------+
>
> 4. Change the size of the crash kernel in /boot/grub/grub.cfg to
> crashkernel=4096M-:4096M
> 5. Load the kdump config file: kdump-config load
>
> This will fail with the following :
>
> # kdump-config load
> * no crashkernel= parameter in the kernel cmdline
>

Hi Louis,

The description should have had another step between 4 & 5:
4a. Reboot.

Although this step was missing from the description, the reboot was done, as you
can see from the output of the "kdump-config show" command.
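
A quick way to confirm that the reboot took effect is to look for crashkernel= on the running kernel's command line. The following is only a minimal sketch: has_crashkernel is a hypothetical helper name, not part of kdump-tools, and the sample string is copied from the /proc/cmdline output in the bug description.

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): succeeds if the given
# kernel command line text contains a crashkernel= parameter.
has_crashkernel() {
    # $1: command line text, e.g. the contents of /proc/cmdline
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample taken from the /proc/cmdline output in the bug description.
cmdline='root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M'
if has_crashkernel "$cmdline"; then
    echo "crashkernel= present"
else
    echo "crashkernel= missing (a reboot may still be needed)"
fi
```

On a live system the same check could be run as has_crashkernel "$(cat /proc/cmdline)".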

> Which is normal as a reboot is required in order to have the crashkernel
> parameter taken into account after the reboot.
>
> 6. echo 1 > /proc/sys/kernel/sysrq
> 7. echo c > /proc/sysrq-trigger
>
> The hang following this command is normal : as previously stated, a reboot
> is required otherwise kdump-tools is not loaded.
>
> After the reboot, you should see the following in /var/log/syslog :
>
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: Starting kdump-tools:  *
> Missing symlink : /var/lib/kdump/initrd.img
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink
> /var/lib/kdump/initrd.img
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Missing symlink :
> /var/lib/kdump/vmlinuz
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink
> /var/lib/kdump/vmlinuz
> Jan 25 11:12:26 XenialS-crashdump kdump-tools[523]: * loaded kdump kernel
>
> To verify the status of kdump you can do :
>
> # kdump-config show
> DUMP_MODE:        kdump
> USE_KDUMP:        1
> KDUMP_SYSCTL:     kernel.panic_on_oops=1
> KDUMP_COREDIR:    /var/crash
> crashkernel addr: 0x2c000000
> current state:    ready to kdump
>
> kexec command:
> /sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-4.3.0-7-generic
> root=/dev/mapper/VividS--vg-root ro console=ttyS0,115200 irqpoll maxcpus=1
> nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img
> /var/lib/kdump/vmlinuz
>
> Your bug statement shows the following :
>
> /sbin/kexec -p --args-linux
> --command-line="root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet
> splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service"
> --initrd=/boot/initrd.img-4.3.0-5-generic /boot/vmlinux-4.3.0-5-generic
>
> This : --initrd=/boot/initrd.img-4.3.0-5-generic
> /boot/vmlinux-4.3.0-5-generic indicates that kdump-tools is not properly
> configured and that your /etc/default/kdump-tools file is the one from a
> previous version.  I suspect that kdump-tools was improperly configured or
> that installation of the maintainer's version of the file was refused.
>
> Since 15.10 (Wily), /etc/default/kdump-tools has the following :
>
> KDUMP_KERNEL=/var/lib/kdump/vmlinuz
> KDUMP_INITRD=/var/lib/kdump/initrd.img
>
> This was implemented to fix LP: #1496317 which might be the bug that you are
> encountering.
>

We couldn't use that setting on ppc64le, as it has vmlinux rather than vmlinuz.
This bug is similar to the vmlinuz vs. vmlinux problem we had earlier with trusty:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1324544

Actually, there are two problems here.
1. There is no vmlinuz on ppc64le, which makes the kdump-tools script misbehave.

- We tried to work around this problem by commenting out KDUMP_KERNEL
& KDUMP_INITRD so that the script picks /boot/vmlinux-4.3.0-5-generic and
/boot/initrd.img-4.3.0-5-generic respectively.

2. After triggering the crash, the system hangs right after "IPI complete". This could well be
related to the kernel or kexec-tools. I am debugging the issue.

We can use this bug to track one of the problems, say the vmlinux vs. vmlinuz problem (the first one).
Let me raise another bug to track the kernel/kexec-tools problem.
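
To illustrate the first problem, here is a rough sketch of a lookup that prefers vmlinuz and falls back to vmlinux when only the latter exists, as on ppc64le. This is not the actual kdump-tools logic; pick_kernel_image is a hypothetical helper, and the directory and version arguments are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not the kdump-tools implementation): print the
# kernel image path for a version, preferring vmlinuz and falling back
# to vmlinux.
pick_kernel_image() {
    boot_dir=$1
    version=$2
    for name in vmlinuz vmlinux; do
        candidate="$boot_dir/$name-$version"
        # -e follows symlinks, so a broken symlink is skipped as well
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: look up the image for the running kernel.
pick_kernel_image /boot "$(uname -r)" || echo "no kernel image found"
```

On the machine in this report, such a lookup for 4.3.0-5-generic would be expected to fall through to /boot/vmlinux-4.3.0-5-generic.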

Thanks
Hari

> I suggest that you verify your configuration and run the test again as your
> description does not describe an actual bug but misconfiguration of
> kdump-tools

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1536904

Title:
  Kdump fails on Ubuntu 16.04 (PowerVM/PowerKVM/BareMetal)

Status in makedumpfile package in Ubuntu:
  Invalid

Bug description:
  == Comment: #0 - ==
  ---Problem Description---
  Kdump fails on Ubuntu 16.04 with Austin adapter(tg3)
   
  Contact Information = hathyaga@xxxxxxxxxx, iranna.ankad@xxxxxxxxxx,mputtash@xxxxxxxxxx 
   
  ---uname output---
  linux ltciofvtr-s822l1 4.3.0-5-generic #16-Ubuntu SMP Wed Dec 16 23:32:23 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  Machine details:
  9.47.67.156 (root/ltcnetdd) 

   
  Machine Type = 8247-22L 
   
  ---System Hang---
   The system hangs after triggering a crash. Need to reboot to bring it up and functional. 
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Steps to follow:
  1. apt-get install linux-crashdump
  2. apt-get install kdump-tools
  3. Edit /etc/default/kdump-tools and change the following:
  USE_KDUMP=0 to 1
  4. Change the size of the crash kernel in /boot/grub/grub.cfg to crashkernel=4096M-:4096M
  5. Load the kdump config file: kdump-config load
  6. echo 1 > /proc/sys/kernel/sysrq
  7. echo c > /proc/sysrq-trigger

  
  Things to look at to cross-check are:

  After loading the kdump-config file, check its status
  root@ltciofvtr-s822l1:~# kdump-config show
  DUMP_MODE:        kdump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
  crashkernel addr: 
  SSH:              root@35.35.35.36
  SSH_KEY:          /root/.ssh/id_rsa
  HOSTTAG:          ip
  current state:    ready to kdump

  kexec command:
    /sbin/kexec -p --args-linux --command-line="root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/boot/initrd.img-4.3.0-5-generic /boot/vmlinux-4.3.0-5-generic


  root@ltciofvtr-s822l1:~# kdump-config status
   * Broken symlink : /var/lib/kdump/vmlinuz: broken symbolic link to /boot/vmlinuz-4.3.0-5-generic
  current state   : ready to kdump
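
The broken symlink reported above can also be checked by hand. The sketch below uses only the standard test operators; is_broken_symlink is a hypothetical helper name, and the two paths are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the path is a symlink whose target
# does not exist.
is_broken_symlink() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Check the two links that kdump-tools is reported to use in this thread.
for link in /var/lib/kdump/vmlinuz /var/lib/kdump/initrd.img; do
    if is_broken_symlink "$link"; then
        echo "broken symlink: $link"
    fi
done
```

Paths that are missing entirely, or that are regular files, are not reported by this check.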


  root@ltciofvtr-s822l1:~# cat /proc/cmdline 
  root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M


  root@ltciofvtr-s822l1:~# dmesg| grep -i crash
  [    0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 131072MB)
  [    0.000000] Kernel command line: root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M
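
The crashkernel=4096M-:4096M value uses the kernel's range syntax, start-[end]:size, where an empty end means "no upper limit". As a worked example, the sketch below evaluates such a value against the system RAM. It is a simplification under stated assumptions: crashkernel_size_mb is a hypothetical helper, only the range form is handled, and only the M suffix is understood.

```shell
#!/bin/sh
# Hypothetical helper: given a crashkernel= value in range syntax
# (start-[end]:size[,start-[end]:size...]) and the system RAM in MB,
# print the reservation size in MB. Only the M suffix is handled.
crashkernel_size_mb() {
    spec=${1%%@*}        # drop an optional @offset
    ram_mb=$2
    old_ifs=$IFS
    IFS=,
    set -- $spec         # split into range:size entries
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=${range%%-*}
        end=${range#*-}
        start=${start%M}
        end=${end%M}
        # A range matches when start <= RAM < end; an empty end is unbounded.
        if [ "$ram_mb" -ge "$start" ] && { [ -z "$end" ] || [ "$ram_mb" -lt "$end" ]; }; then
            printf '%s\n' "${size%M}"
            return 0
        fi
    done
    return 1
}

# Values from the dmesg output above: 131072MB of RAM.
crashkernel_size_mb '4096M-:4096M' 131072   # prints 4096
```

The result agrees with the "Reserving 4096MB of memory" line in the dmesg output.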

  
  Observations:
  1. The kdump-config status command reports a broken symbolic link, suggesting that kdump-config is unable to handle the symbolic link.

  2. Trace observed on console:
  root@ltciofvtr-s822l1:~# echo c | tee /proc/sysrq-trigger 
  c
  [  238.872102] sysrq: SysRq : Trigger a crash
  [  238.872179] Unable to handle kernel paging request for data at address 0x00000000
  [  238.872256] Faulting instruction address: 0xc000000000646534
  [  238.872322] Oops: Kernel access of bad area, sig: 11 [#1]
  [  238.872373] SMP NR_CPUS=2048 NUMA PowerNV
  [  238.872427] Modules linked in: dm_round_robin dm_service_time ipmi_powernv ipmi_msghandler leds_powernv uio_pdrv_genirq powernv_rng uio dm_multipath sunrpc bonding autofs4 btrfs xor raid6_pq mlx4_en ses enclosure bnx2x mlx4_core lpfc qla2xxx mdio libcrc32c be2net e1000e vxlan ipr ip6_udp_tunnel udp_tunnel scsi_transport_fc
  [  238.872895] CPU: 121 PID: 3861 Comm: tee Not tainted 4.3.0-5-generic #16-Ubuntu
  [  238.872973] task: c000000fe01ce860 ti: c000000fe022c000 task.ti: c000000fe022c000
  [  238.873049] NIP: c000000000646534 LR: c0000000006475f8 CTR: c000000000646500
  [  238.873125] REGS: c000000fe022f990 TRAP: 0300   Not tainted  (4.3.0-5-generic)
  [  238.873200] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004222  XER: 20000000
  [  238.873392] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1 
  GPR00: c0000000006475f8 c000000fe022fc10 c00000000155e400 0000000000000063 
  GPR04: c0000007fc648450 c0000007fc659cf0 c000001fff830000 0000000000000792 
  GPR08: 0000000000000007 0000000000000001 0000000000000000 c000001fff861780 
  GPR12: c000000000646500 c000000007b87d80 0000000000000000 0000000000000000 
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  GPR20: 0000000000000000 0000000000000000 0000000010009d88 0000000000000001 
  GPR24: 0000000010009d88 00003fffe7b210b0 c0000000014a5cb0 0000000000000004 
  GPR28: c0000000014a6070 0000000000000063 c000000001460de4 0000000000000000 
  [  238.875062] NIP [c000000000646534] sysrq_handle_crash+0x34/0x50
  [  238.875178] LR [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875270] Call Trace:
  [  238.875322] [c000000fe022fc10] [c000000000dc92a0] _fw_tigon_tg3_bin_name+0x2c5d0/0x33708 (unreliable)
  [  238.875516] [c000000fe022fc30] [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875658] [c000000fe022fcd0] [c000000000647da8] write_sysrq_trigger+0x78/0xa0
  [  238.875820] [c000000fe022fd00] [c00000000036bf50] proc_reg_write+0xb0/0x110
  [  238.875963] [c000000fe022fd50] [c0000000002d45bc] __vfs_write+0x6c/0xe0
  [  238.876104] [c000000fe022fd90] [c0000000002d52f0] vfs_write+0xc0/0x230
  [  238.876246] [c000000fe022fde0] [c0000000002d632c] SyS_write+0x6c/0x110
  [  238.876389] [c000000fe022fe30] [c000000000009204] system_call+0x38/0xb4
  [  238.876525] Instruction dump:
  [  238.876601] 38427f00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949aae4 
  [  238.876843] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6 
  [  238.877091] ---[ end trace 2028716a4fb3f0e5 ]---
  [  238.880521] 
  [  238.880590] Sending IPI to other CPUs
  [  238.881716] IPI complete

  The system hang is observed here.

  3. No crash dump generated after a reboot.

  4. The kdump hang is also observed on KVM and PowerVM, as well as on OpenPOWER.
   
  Stack trace output:
   [  238.875270] Call Trace:
  [  238.875322] [c000000fe022fc10] [c000000000dc92a0] _fw_tigon_tg3_bin_name+0x2c5d0/0x33708 (unreliable)
  [  238.875516] [c000000fe022fc30] [c0000000006475f8] __handle_sysrq+0xe8/0x280
  [  238.875658] [c000000fe022fcd0] [c000000000647da8] write_sysrq_trigger+0x78/0xa0
  [  238.875820] [c000000fe022fd00] [c00000000036bf50] proc_reg_write+0xb0/0x110
  [  238.875963] [c000000fe022fd50] [c0000000002d45bc] __vfs_write+0x6c/0xe0
  [  238.876104] [c000000fe022fd90] [c0000000002d52f0] vfs_write+0xc0/0x230
  [  238.876246] [c000000fe022fde0] [c0000000002d632c] SyS_write+0x6c/0x110
  [  238.876389] [c000000fe022fe30] [c000000000009204] system_call+0x38/0xb4
  [  238.876525] Instruction dump:
  [  238.876601] 38427f00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949aae4 
  [  238.876843] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6 
  [  238.877091] ---[ end trace 2028716a4fb3f0e5 ]---
   
  Oops output:
   no
   
  System Dump Location:
   No dump generated
   
  *Additional Instructions for hathyaga@xxxxxxxxxx, iranna.ankad@xxxxxxxxxx,mputtash@xxxxxxxxxx: 
  -Post a private note with access information to the machine that the bug is occurring on.
  -Attach sysctl -a output to the bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1536904/+subscriptions