kernel-packages team mailing list archive
Message #171072
[Bug 1567539] Re: Failure to dump with trusty+3.16 on ppc64el
Hello,
Regarding nr_cpus=1: the roughly equivalent maxcpus=1 is already set in
the kexec command (at least on default installs):
$ kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2b000000
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.4.0-17-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-17-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-4.4.0-17-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
^^^^^^^^^^^
Disabling SMP altogether by setting maxcpus=0 could be considered, but
that shouldn't change much aside from not reserving any SMP data
structures. To be tested.
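To check which CPU-limiting parameters a given command line actually carries, something like the following could be used. This is only a sketch: cmdline_param is a hypothetical helper (not part of kdump-tools), and the sample string is an excerpt of the command line shown above.

```shell
# Print the value of a kernel command-line parameter, or nothing if absent.
# Hypothetical helper. Usage: cmdline_param NAME "command line string"
cmdline_param() {
    name=$1
    # Split the command line on whitespace, one parameter per line.
    printf '%s\n' "$2" | tr -s ' \t' '\n\n' | while IFS= read -r word; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; break ;;
        esac
    done
}

# Excerpt of the command line reported by "kdump-config show" above.
cmdline='root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service'

cmdline_param maxcpus "$cmdline"    # prints 1
cmdline_param nr_cpus "$cmdline"    # prints nothing: nr_cpus is not set
```

On a running crash kernel, the same helper could be fed the contents of /proc/cmdline instead of the sample string.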
Regarding kvm_cma_resv_ratio=0: this will avoid the error message, but
it has no bearing on the current situation. The CMA reservation failed,
so that memory was never in use.
Having 256GB of RAM doesn't mean that the 128MB reservation needs to be
increased. Here is the output of free on a 128GB system right after a
kernel panic:
(initramfs) chroot /root free -h
              total        used        free      shared  buff/cache   available
Mem:            99M         20M         21M         48K         57M         62M
Swap:            0B          0B          0B
And here is the memory allocation in the same context:
(initramfs) chroot /root cat /proc/meminfo
MemTotal: 102000 kB
MemFree: 22260 kB
MemAvailable: 63700 kB
Buffers: 1640 kB
Cached: 33944 kB
SwapCached: 0 kB
Active: 12400 kB
Inactive: 23584 kB
Active(anon): 416 kB
Inactive(anon): 28 kB
Active(file): 11984 kB
Inactive(file): 23556 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 408 kB
Mapped: 3128 kB
Shmem: 48 kB
Slab: 23228 kB
SReclaimable: 10568 kB
SUnreclaim: 12660 kB
KernelStack: 1952 kB
PageTables: 96 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 51000 kB
Committed_AS: 1212 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 281500 kB
VmallocChunk: 34358935548 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 27048 kB
DirectMap2M: 104448 kB
DirectMap1G: 0 kB
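To relate the /proc/meminfo values to the free output above, the kB figures can be converted to MiB. A minimal sketch, where meminfo_mib is a hypothetical helper and the sample text copies three lines from the listing above:

```shell
# Print a /proc/meminfo field converted from kB to whole MiB.
# Hypothetical helper; reads meminfo-formatted text on stdin.
# Usage: meminfo_mib FIELD
meminfo_mib() {
    awk -v key="$1:" '$1 == key { print int($2 / 1024); exit }'
}

# Three lines copied from the meminfo listing above.
sample='MemTotal: 102000 kB
MemFree: 22260 kB
MemAvailable: 63700 kB'

printf '%s\n' "$sample" | meminfo_mib MemTotal       # prints 99
printf '%s\n' "$sample" | meminfo_mib MemFree        # prints 21
printf '%s\n' "$sample" | meminfo_mib MemAvailable   # prints 62
```

These match the 99M, 21M and 62M columns reported by free, so the crash kernel sees roughly 99 MiB of the 128MB reservation as usable memory.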
The 128MB is used to allocate kernel data structures and to load the
initrd into RAM. More memory is then needed for makedumpfile to read,
convert and compress /proc/vmcore into a file. Since version 1.5.5,
makedumpfile has a rather stable memory footprint that increases only
minimally with the size of the RAM.
In this case, makedumpfile has not even started to execute so it is also
safe to exclude that as the source of the problem.
This, on the other hand, does indicate a problem:
[ 0.000000] bootmem alloc of 41943040 bytes failed!
[ 0.000000] Kernel panic - not syncing: Out of memory
This is very early in the boot, and the kernel fails to allocate 40MB
of memory for bootmem. My suspicion is that it is unable to allocate
that memory at the start of the memory.
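The 40MB figure comes straight from the byte count in the log line. As a quick arithmetic check (bytes_to_mib is a hypothetical helper, and the 128 is the crashkernel reservation discussed earlier):

```shell
# Convert a byte count to whole MiB using integer arithmetic.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

failed_alloc=41943040   # from the "bootmem alloc of ... bytes failed!" line
reserved_mib=128        # crashkernel reservation discussed earlier

failed_mib=$(bytes_to_mib "$failed_alloc")
echo "failed allocation: ${failed_mib} MiB of ${reserved_mib} MiB reserved"
# prints: failed allocation: 40 MiB of 128 MiB reserved
```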
The only information I could gather from a quick Google search is this:
https://www.novell.com/support/kb/doc.php?id=3374462
For SLES, they suggest allocating the memory starting at 32M, so you
might want to replace the crashkernel value with:
crashkernel=128M@32M
and see if it helps.
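For reference, here is what a size@offset value works out to in physical addresses. This is a sketch only: both functions are hypothetical helpers, they handle just the simple size@offset form with K/M/G suffixes, and they ignore the range-based crashkernel syntax.

```shell
# Convert a size with an optional K/M/G suffix to bytes.
size_to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print "start end" in hex (end exclusive) for a size@offset value.
# Assumes the @offset part is present.
crashkernel_range() {
    size=$(size_to_bytes "${1%@*}")
    offset=$(size_to_bytes "${1#*@}")
    printf '0x%x 0x%x\n' "$offset" $(( offset + size ))
}

crashkernel_range 128M@32M   # prints 0x2000000 0xa000000
```

So 128M@32M would ask for a reservation spanning 32 MiB to 160 MiB of physical memory.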
I will continue the analysis in the meantime, but I don't think that
raising the value will help, as 40MB is well within the 128MB limit.
And a good blog post on the usage of crashkernel is now a definite
requirement.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1567539
Title:
Failure to dump with trusty+3.16 on ppc64el
Status in makedumpfile package in Ubuntu:
New
Bug description:
Test case
# sudo apt-get install linux-crashdump
set USE_KDUMP=1 in /etc/default/kdump-tools
# sudo shutdown -r now
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger
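Before the final sysrq trigger, it can help to confirm that a crash kernel is actually loaded. A minimal sketch, assuming the standard sysfs file /sys/kernel/kexec_crash_loaded is available; the function name and its optional path argument are illustrative only:

```shell
# Succeed if a crash kernel is loaded. The optional argument overrides the
# sysfs path, which is mainly useful for testing the function itself.
crash_kernel_loaded() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if crash_kernel_loaded; then
    echo "crash kernel loaded"
else
    echo "crash kernel NOT loaded; do not expect a dump"
fi
```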
It looks like there was insufficient memory devoted to the crash
kernel. The defaults were used, and the kernel had 256G of ram, and
only 2.6G were in use at the time of inducing the crash.
________________________Console log ________________________
[ 290.509423] SysRq : Trigger a crash
[ 290.509526] Unable to handle kernel paging request for data at address 0x00000000
[ 290.509606] Faulting instruction address: 0xc0000000005d9c94
[ 290.509672] Oops: Kernel access of bad area, sig: 11 [#1]
[ 290.509723] SMP NR_CPUS=2048 NUMA PowerNV
[ 290.509776] Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt rtc_generic i2c_opal powernv_rng uio_pdrv_genirq ipmi_powernv ipmi_msghandler uio mlx4_en vxlan ses enclosure mlx4_core ipr
[ 290.510178] CPU: 121 PID: 2976 Comm: tee Not tainted 3.16.0-69-generic #89~14.04.1-Ubuntu
[ 290.510254] task: c000001fdccf4a80 ti: c000001fdcd58000 task.ti: c000001fdcd58000
[ 290.510330] NIP: c0000000005d9c94 LR: c0000000005dad0c CTR: c0000000005d9c60
[ 290.510406] REGS: c000001fdcd5b9d0 TRAP: 0300 Not tainted (3.16.0-69-generic)
[ 290.510480] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004024 XER: 20000000
[ 290.510671] CFAR: c000000000009368 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000005dad0c c000001fdcd5bc50 c0000000013d7d00 0000000000000063
GPR04: c000000006548540 c000000006558da8 0000000000016fa0 c000000001596218
GPR08: c000000000e37d00 0000000000000000 0000000000000001 0000000000016fa0
GPR12: c0000000005d9c60 c000000007bc4100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 00000000100094d8 00000000100094c0
GPR24: 0000000000000001 00000000100094d8 0000000000000004 0000000000000000
GPR28: c00000000130f6c0 0000000000000063 c0000000012ed7a0 c00000000130fa80
[ 290.511676] NIP [c0000000005d9c94] sysrq_handle_crash+0x34/0x50
[ 290.511742] LR [c0000000005dad0c] __handle_sysrq+0xec/0x280
[ 290.511793] Call Trace:
[ 290.511820] [c000001fdcd5bc50] [c000001fdcd5bcb0] 0xc000001fdcd5bcb0 (unreliable)
[ 290.511911] [c000001fdcd5bc70] [c0000000005dad0c] __handle_sysrq+0xec/0x280
[ 290.511988] [c000001fdcd5bd10] [c0000000005db4dc] write_sysrq_trigger+0x7c/0xa0
[ 290.512078] [c000001fdcd5bd40] [c00000000032e1d0] proc_reg_write+0xb0/0x110
[ 290.512155] [c000001fdcd5bd90] [c0000000002a47ec] vfs_write+0xdc/0x260
[ 290.512231] [c000001fdcd5bde0] [c0000000002a558c] SyS_write+0x6c/0x110
[ 290.512308] [c000001fdcd5be30] [c00000000000a1d8] system_call+0x38/0xd0
[ 290.512383] Instruction dump:
[ 290.512421] 3842e0a0 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d42001b 392a006c
[ 290.512546] 39400001 91490000 7c0004ac 39200000 <99490000> 38210020 e8010010 7c0803a6
[ 290.512674] ---[ end trace ba4afa55b8a163cd ]---
[ 290.512735]
[ 290.513754] Sending IPI to other CPUs
[ 290.514897] IPI complete
[ 0.000000] OPAL V3 detected !
[ 0.000000] Using PowerNV machine description
[ 0.000000] Page sizes from device-tree:
[ 0.000000] base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
[ 0.000000] base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
[ 0.000000] base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
[ 0.000000] base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
[ 0.000000] base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
[ 0.000000] base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
[ 0.000000] base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
[ 0.000000] Using 1TB segments
[ 0.000000] kvm_cma: CMA: failed to reserve 128 MiB
[ 0.000000] Found initrd at 0xc000000009730000:0xc00000000afd9bd6
[ 0.000000] bootconsole [udbg0] enabled
[ 0.000000] CPU maps initialized for 8 threads per core
-> smp_release_cpus()
spinning_secondaries = 159
<- smp_release_cpus()
[ 0.000000] Starting Linux PPC64 #89~14.04.1-Ubuntu SMP Thu Mar 17 20:50:51 UTC 2016
[ 0.000000] -----------------------------------------------------
[ 0.000000] ppc64_pft_size = 0x0
[ 0.000000] physicalMemorySize = 0x19c00000
[ 0.000000] htab_address = 0xc00000000f800000
[ 0.000000] htab_hash_mask = 0xfff
[ 0.000000] physical_start = 0x8000000
[ 0.000000] -----------------------------------------------------
<- setup_system()
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.16.0-69-generic (buildd@bos01-ppc64el-017) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #89~14.04.1-Ubuntu SMP Thu Mar 17 20:50:51 UTC 2016 (Ubuntu 3.16.0-69.89~14.04.1-generic 3.16.7-ckt25)
[ 0.000000] [boot]0012 Setup Arch
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40000000
[ 0.000000] PCI host bridge /pciex@3fffe40000000 (primary) ranges:
[ 0.000000] MEM 0x00003fe000000000..0x00003fe07ffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x800)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40100000
[ 0.000000] PCI host bridge /pciex@3fffe40100000 ranges:
[ 0.000000] MEM 0x00003fe080000000..0x00003fe0fffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x1000)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40400000
[ 0.000000] PCI host bridge /pciex@3fffe40400000 ranges:
[ 0.000000] MEM 0x00003fe200000000..0x00003fe27ffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x2800)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe40500000
[ 0.000000] PCI host bridge /pciex@3fffe40500000 ranges:
[ 0.000000] MEM 0x00003fe280000000..0x00003fe2fffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x3000)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42000000
[ 0.000000] PCI host bridge /pciex@3fffe42000000 ranges:
[ 0.000000] MEM 0x00003ff000000000..0x00003ff07ffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x20800)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42400000
[ 0.000000] PCI host bridge /pciex@3fffe42400000 ranges:
[ 0.000000] MEM 0x00003ff200000000..0x00003ff27ffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x22800)
[ 0.000000] Issue PHB reset ...
[ 0.000000] Initializing IODA2 OPAL PHB /pciex@3fffe42500000
[ 0.000000] PCI host bridge /pciex@3fffe42500000 ranges:
[ 0.000000] MEM 0x00003ff280000000..0x00003ff2fffeffff -> 0x0000000080000000
[ 0.000000] 256 (255) PE's M32: 0x80000000 [segment=0x800000]
[ 0.000000] M64: 0x10000000000 [segment=0x100000000]
[ 0.000000] Allocated bitmap for 2040 MSIs (base IRQ 0x23000)
[ 0.000000] Issue PHB reset ...
[ 0.000000] OPAL nvram setup, 1048576 bytes
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00000000-0x39bfffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00000000-0x0fffffff]
[ 0.000000] node 0: [mem 0x30000000-0x39bfffff]
[ 0.000000] Could not find start_pfn for node 1
[ 0.000000] Could not find start_pfn for node 16
[ 0.000000] Could not find start_pfn for node 17
[ 0.000000] [boot]0015 Setup Done
[ 0.000000] bootmem alloc of 41943040 bytes failed!
[ 0.000000] Kernel panic - not syncing: Out of memory
[ 0.000000] CPU: 41 PID: 0 Comm: swapper Not tainted 3.16.0-69-generic #89~14.04.1-Ubuntu
[ 0.000000] Call Trace:
[ 0.000000] [c0000000093d7b50] [c000000008017340] show_stack+0x170/0x290 (unreliable)
[ 0.000000] [c0000000093d7c30] [c0000000089eedd8] dump_stack+0xc4/0x120
[ 0.000000] [c0000000093d7c70] [c0000000089e5e1c] panic+0x104/0x2b8
[ 0.000000] [c0000000093d7d00] [c000000008d7fd58] ___alloc_bootmem_node+0x4c/0x64
[ 0.000000] [c0000000093d7d70] [c000000008d5ac60] pcpu_fc_alloc+0x50/0x64
[ 0.000000] [c0000000093d7d90] [c000000008d7db04] pcpu_embed_first_chunk+0x5e8/0x874
[ 0.000000] [c0000000093d7e80] [c000000008d5b754] setup_per_cpu_areas+0x60/0x140
[ 0.000000] [c0000000093d7f00] [c000000008d53aa0] start_kernel+0x174/0x53c
[ 0.000000] [c0000000093d7f90] [c000000008009b6c] start_here_common+0x20/0xa8
[ 0.000000] ---[ end Kernel panic - not syncing: Out of memory
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1567539/+subscriptions