kernel-packages team mailing list archive

[Bug 1567539] Re: Failure to dump with trusty+3.16 on ppc64el

 

Hello,

Regarding nr_cpus=1, the similar maxcpus=1 is already set in the kexec
command (at least on default installs):

$ kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2b000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.4.0-17-generic
kdump initrd: 
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-17-generic
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-4.4.0-17-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
                                  ^^^^^^^^^^^

Maybe disabling SMP altogether by setting maxcpus=0 could be considered,
but that shouldn't change much aside from not reserving any SMP data
structures. To be tested.
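
To check which CPU-limiting parameter the crash kernel would actually
boot with, the command line reported by kdump-config can be inspected
word by word. This is only a sketch: the helper name cmdline_param is
made up for this example and is not part of kdump-tools, and it assumes
the parameter appears as a whitespace-separated name=value word.

```shell
#!/bin/sh
# cmdline_param NAME CMDLINE
# Print the value of NAME=... found in a kernel command-line string.
# Hypothetical helper for illustration; returns 1 when NAME is absent.
cmdline_param() {
    name=$1
    shift
    # Unquoted $* is split on whitespace, giving one word per parameter.
    for word in $*; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Shortened copy of the command line shown above.
cmdline='ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb'
cmdline_param maxcpus "$cmdline"                      # prints 1
cmdline_param nr_cpus "$cmdline" || echo "nr_cpus is not set"
```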

Regarding kvm_cma_resv_ratio=0, this will avoid the error message but
it has no bearing on the current situation. The kernel failed to
allocate the CMA region, so that memory was not in use.

Having 256GB of RAM doesn't mean that the 128MB needs to be increased.
Here is the output of free on a 128GB system right after a kernel panic:

(initramfs) chroot /root free -h
              total        used        free      shared  buff/cache   available
Mem:            99M         20M         21M         48K         57M         62M
Swap:            0B          0B          0B

And here is the memory allocation in the same context:

(initramfs) chroot /root cat /proc/meminfo
MemTotal:         102000 kB
MemFree:           22260 kB
MemAvailable:      63700 kB
Buffers:            1640 kB
Cached:            33944 kB
SwapCached:            0 kB
Active:            12400 kB
Inactive:          23584 kB
Active(anon):        416 kB
Inactive(anon):       28 kB
Active(file):      11984 kB
Inactive(file):    23556 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:           408 kB
Mapped:             3128 kB
Shmem:                48 kB
Slab:              23228 kB
SReclaimable:      10568 kB
SUnreclaim:        12660 kB
KernelStack:        1952 kB
PageTables:           96 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       51000 kB
Committed_AS:       1212 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      281500 kB
VmallocChunk:   34358935548 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       27048 kB
DirectMap2M:      104448 kB
DirectMap1G:           0 kB
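
The two outputs above can be cross-checked by converting the meminfo
figures to MiB and comparing them with what free reports. A small
sketch follows; meminfo_mib is a hypothetical helper written for this
message, not an existing tool, and it truncates to whole MiB.

```shell
#!/bin/sh
# meminfo_mib FIELD
# Read /proc/meminfo-style text on stdin and print FIELD in whole MiB.
# Hypothetical helper for illustration only.
meminfo_mib() {
    awk -v field="$1:" '$1 == field { print int($2 / 1024); exit }'
}

# In the crash kernel one could run: meminfo_mib MemTotal < /proc/meminfo
# Here the two values quoted above are fed in directly.
sample='MemTotal: 102000 kB
MemFree: 22260 kB'
printf '%s\n' "$sample" | meminfo_mib MemTotal   # prints 99
printf '%s\n' "$sample" | meminfo_mib MemFree    # prints 21
```

Both results agree with the 99M total and 21M free shown by free -h.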

The 128MB is used to allocate kernel data structures and to load the
initrd into RAM. More memory is then needed for makedumpfile to read,
convert and compress /proc/vmcore into a file. Since version 1.5.5,
makedumpfile has a rather stable memory footprint that increases only
minimally with the size of the RAM.

In this case, makedumpfile has not even started to execute so it is also
safe to exclude that as the source of the problem.

This, in turn, indicates a problem:

[ 0.000000] bootmem alloc of 41943040 bytes failed!
[ 0.000000] Kernel panic - not syncing: Out of memory

This happens very early in boot: a 40MB bootmem allocation fails. My
suspicion is that the kernel is unable to allocate that memory at the
start of memory.
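
For reference, the 40MB figure comes straight from the byte count in
the failing message. A minimal sketch of the arithmetic, where
bytes_to_mib is just an illustrative helper name:

```shell
#!/bin/sh
# bytes_to_mib BYTES
# Integer conversion from bytes to MiB (illustrative helper).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

failed=41943040    # size from the "bootmem alloc of ... failed!" line
reserved_mib=128   # crashkernel reservation discussed in this thread

echo "failed allocation: $(bytes_to_mib "$failed") MiB of $reserved_mib MiB reserved"
```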

The only information I could gather from a quick Google search is this:

https://www.novell.com/support/kb/doc.php?id=3374462

For SLES, they suggest reserving the memory starting at 32M, so you
might want to replace the crashkernel value with

crashkernel=128M@32M

and see if it helps.
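
Here is a sketch of how the parameter could be swapped in a kernel
command-line string. The helper set_crashkernel is hypothetical, and
where the command line is actually configured depends on the bootloader
setup, which is not shown in this thread.

```shell
#!/bin/sh
# set_crashkernel VALUE CMDLINE
# Print CMDLINE with its crashkernel= word replaced by crashkernel=VALUE,
# or with crashkernel=VALUE appended when none is present.
# Hypothetical helper; VALUE must not contain '/' or '&' (sed specials).
set_crashkernel() {
    value=$1
    cmdline=$2
    case " $cmdline " in
        *" crashkernel="*)
            printf '%s\n' "$cmdline" |
                sed "s/crashkernel=[^ ]*/crashkernel=$value/" ;;
        *)
            printf '%s\n' "$cmdline crashkernel=$value" ;;
    esac
}

# The old value here is only an example.
set_crashkernel 128M@32M 'quiet splash crashkernel=128M'
# prints: quiet splash crashkernel=128M@32M
```

After updating the bootloader configuration and rebooting, the result
can be checked with "cat /proc/cmdline", and the size of the reserved
region can be read from /sys/kernel/kexec_crash_size.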

I will continue the analysis in the meantime, but I don't think that
raising the value will help, as 40MB is well within the 128MB limit.

And a good blog post on the usage of crashkernel is now a definite
requirement.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1567539

Title:
  Failure to dump with trusty+3.16 on ppc64el

Status in makedumpfile package in Ubuntu:
  New

Bug description:
  Test case
  # sudo apt-get install linux-crashdump

  set USE_KDUMP=1 in /etc/default/kdump-tools

  # sudo shutdown -r now

  echo 1 | sudo tee /proc/sys/kernel/sysrq
  echo c | sudo tee /proc/sysrq-trigger

  It looks like there was insufficient memory devoted to the crash
  kernel.  The defaults were used, and the kernel had 256G of ram, and
  only 2.6G were in use at the time of inducing the crash.

  ________________________Console log ________________________

  [  290.509423] SysRq : Trigger a crash
  [  290.509526] Unable to handle kernel paging request for data at address 0x00000000
  [  290.509606] Faulting instruction address: 0xc0000000005d9c94
  [  290.509672] Oops: Kernel access of bad area, sig: 11 [#1]
  [  290.509723] SMP NR_CPUS=2048 NUMA PowerNV
  [  290.509776] Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt rtc_generic i2c_opal powernv_rng uio_pdrv_genirq ipmi_powernv ipmi_msghandler uio mlx4_en vxlan ses enclosure mlx4_core ipr
  [  290.510178] CPU: 121 PID: 2976 Comm: tee Not tainted 3.16.0-69-generic #89~14.04.1-Ubuntu
  [  290.510254] task: c000001fdccf4a80 ti: c000001fdcd58000 task.ti: c000001fdcd58000
  [  290.510330] NIP: c0000000005d9c94 LR: c0000000005dad0c CTR: c0000000005d9c60
  [  290.510406] REGS: c000001fdcd5b9d0 TRAP: 0300   Not tainted  (3.16.0-69-generic)
  [  290.510480] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004024  XER: 20000000
  [  290.510671] CFAR: c000000000009368 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000005dad0c c000001fdcd5bc50 c0000000013d7d00 0000000000000063
  GPR04: c000000006548540 c000000006558da8 0000000000016fa0 c000000001596218
  GPR08: c000000000e37d00 0000000000000000 0000000000000001 0000000000016fa0
  GPR12: c0000000005d9c60 c000000007bc4100 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 00000000100094d8 00000000100094c0
  GPR24: 0000000000000001 00000000100094d8 0000000000000004 0000000000000000
  GPR28: c00000000130f6c0 0000000000000063 c0000000012ed7a0 c00000000130fa80
  [  290.511676] NIP [c0000000005d9c94] sysrq_handle_crash+0x34/0x50
  [  290.511742] LR [c0000000005dad0c] __handle_sysrq+0xec/0x280
  [  290.511793] Call Trace:
  [  290.511820] [c000001fdcd5bc50] [c000001fdcd5bcb0] 0xc000001fdcd5bcb0 (unreliable)
  [  290.511911] [c000001fdcd5bc70] [c0000000005dad0c] __handle_sysrq+0xec/0x280
  [  290.511988] [c000001fdcd5bd10] [c0000000005db4dc] write_sysrq_trigger+0x7c/0xa0
  [  290.512078] [c000001fdcd5bd40] [c00000000032e1d0] proc_reg_write+0xb0/0x110
  [  290.512155] [c000001fdcd5bd90] [c0000000002a47ec] vfs_write+0xdc/0x260
  [  290.512231] [c000001fdcd5bde0] [c0000000002a558c] SyS_write+0x6c/0x110
  [  290.512308] [c000001fdcd5be30] [c00000000000a1d8] system_call+0x38/0xd0
  [  290.512383] Instruction dump:
  [  290.512421] 3842e0a0 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d42001b 392a006c
  [  290.512546] 39400001 91490000 7c0004ac 39200000 <99490000> 38210020 e8010010 7c0803a6
  [  290.512674] ---[ end trace ba4afa55b8a163cd ]---
  [  290.512735]
  [  290.513754] Sending IPI to other CPUs
  [  290.514897] IPI complete
  [    0.000000] OPAL V3 detected !
  [    0.000000] Using PowerNV machine description
  [    0.000000] Page sizes from device-tree:
  [    0.000000] base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
  [    0.000000] base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
  [    0.000000] base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
  [    0.000000] base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
  [    0.000000] base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
  [    0.000000] base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
  [    0.000000] base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
  [    0.000000] Using 1TB segments
  [    0.000000] kvm_cma: CMA: failed to reserve 128 MiB
  [    0.000000] Found initrd at 0xc000000009730000:0xc00000000afd9bd6
  [    0.000000] bootconsole [udbg0] enabled
  [    0.000000] CPU maps initialized for 8 threads per core
   -> smp_release_cpus()
  spinning_secondaries = 159
   <- smp_release_cpus()
  [    0.000000] Starting Linux PPC64 #89~14.04.1-Ubuntu SMP Thu Mar 17 20:50:51 UTC 2016
  [    0.000000] -----------------------------------------------------
  [    0.000000] ppc64_pft_size                = 0x0
  [    0.000000] physicalMemorySize            = 0x19c00000
  [    0.000000] htab_address                  = 0xc00000000f800000
  [    0.000000] htab_hash_mask                = 0xfff
  [    0.000000] physical_start                = 0x8000000
  [    0.000000] -----------------------------------------------------
   <- setup_system()
  [    0.000000] Initializing cgroup subsys cpuset
  [    0.000000] Initializing cgroup subsys cpu
  [    0.000000] Initializing cgroup subsys cpuacct
  [    0.000000] Linux version 3.16.0-69-generic (buildd@bos01-ppc64el-017) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #89~14.04.1-Ubuntu SMP Thu Mar 17 20:50:51 UTC 2016 (Ubuntu 3.16.0-69.89~14.04.1-generic 3.16.7-ckt25)
  [    0.000000] [boot]0012 Setup Arch
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40000000
  [    0.000000] PCI host bridge /pciex@3fffe40000000 (primary) ranges:
  [    0.000000]  MEM 0x00003fe000000000..0x00003fe07ffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x800)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40100000
  [    0.000000] PCI host bridge /pciex@3fffe40100000  ranges:
  [    0.000000]  MEM 0x00003fe080000000..0x00003fe0fffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x1000)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40400000
  [    0.000000] PCI host bridge /pciex@3fffe40400000  ranges:
  [    0.000000]  MEM 0x00003fe200000000..0x00003fe27ffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x2800)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40500000
  [    0.000000] PCI host bridge /pciex@3fffe40500000  ranges:
  [    0.000000]  MEM 0x00003fe280000000..0x00003fe2fffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x3000)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42000000
  [    0.000000] PCI host bridge /pciex@3fffe42000000  ranges:
  [    0.000000]  MEM 0x00003ff000000000..0x00003ff07ffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x20800)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42400000
  [    0.000000] PCI host bridge /pciex@3fffe42400000  ranges:
  [    0.000000]  MEM 0x00003ff200000000..0x00003ff27ffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x22800)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42500000
  [    0.000000] PCI host bridge /pciex@3fffe42500000  ranges:
  [    0.000000]  MEM 0x00003ff280000000..0x00003ff2fffeffff -> 0x0000000080000000
  [    0.000000]   256 (255) PE's M32: 0x80000000 [segment=0x800000]
  [    0.000000]                  M64: 0x10000000000 [segment=0x100000000]
  [    0.000000]   Allocated bitmap for 2040 MSIs (base IRQ 0x23000)
  [    0.000000]   Issue PHB reset ...
  [    0.000000] OPAL nvram setup, 1048576 bytes
  [    0.000000] Zone ranges:
  [    0.000000]   DMA      [mem 0x00000000-0x39bfffff]
  [    0.000000]   Normal   empty
  [    0.000000] Movable zone start for each node
  [    0.000000] Early memory node ranges
  [    0.000000]   node   0: [mem 0x00000000-0x0fffffff]
  [    0.000000]   node   0: [mem 0x30000000-0x39bfffff]
  [    0.000000] Could not find start_pfn for node 1
  [    0.000000] Could not find start_pfn for node 16
  [    0.000000] Could not find start_pfn for node 17
  [    0.000000] [boot]0015 Setup Done
  [    0.000000] bootmem alloc of 41943040 bytes failed!
  [    0.000000] Kernel panic - not syncing: Out of memory
  [    0.000000] CPU: 41 PID: 0 Comm: swapper Not tainted 3.16.0-69-generic #89~14.04.1-Ubuntu
  [    0.000000] Call Trace:
  [    0.000000] [c0000000093d7b50] [c000000008017340] show_stack+0x170/0x290 (unreliable)
  [    0.000000] [c0000000093d7c30] [c0000000089eedd8] dump_stack+0xc4/0x120
  [    0.000000] [c0000000093d7c70] [c0000000089e5e1c] panic+0x104/0x2b8
  [    0.000000] [c0000000093d7d00] [c000000008d7fd58] ___alloc_bootmem_node+0x4c/0x64
  [    0.000000] [c0000000093d7d70] [c000000008d5ac60] pcpu_fc_alloc+0x50/0x64
  [    0.000000] [c0000000093d7d90] [c000000008d7db04] pcpu_embed_first_chunk+0x5e8/0x874
  [    0.000000] [c0000000093d7e80] [c000000008d5b754] setup_per_cpu_areas+0x60/0x140
  [    0.000000] [c0000000093d7f00] [c000000008d53aa0] start_kernel+0x174/0x53c
  [    0.000000] [c0000000093d7f90] [c000000008009b6c] start_here_common+0x20/0xa8
  [    0.000000] ---[ end Kernel panic - not syncing: Out of memory

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1567539/+subscriptions

