kernel-packages team mailing list archive
Message #125947
Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault
On Mon, Jul 6, 2015 at 9:28 PM, Colin Ian King
<1469214@xxxxxxxxxxxxxxxxxx> wrote:
> I re-ran this today with the following script as a non-root user:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
> echo $t
> echo $t | sudo tee /dev/kmsg
> ./stress-ng --$t 0 -v -t 60
> done
>
> and hit this issue:
>
> [14098.848615] urandom
> [14111.696335] irqbalance[828]: unhandled level 2 translation fault (11) at 0x00004f64, esr 0x92000006
> [14111.696341] pgd = ffffffcfef71b000
> [14111.737149] [00004f64] *pgd=0000004fef1f3003, *pud=0000004fef1f3003, *pmd=0000000000000000
>
As I suggested, it would help to provide /proc/$(pidof irqbalance)/maps;
otherwise we cannot tell which mappings the faulting address and the PC
fall in.
I have finally found a simple way to reproduce the issue:
1) Apply the attached debug patch to stress-ng.
2) Run the following script:
  sudo cat /proc/$(pidof irqbalance)/maps
  /home/ubuntu/git/stress-ng/stress-ng --sequential 0 --seq-start 80 \
      --seq-end 84 -t 60 --syslog --metrics --times -v
The second command runs just the following four stressors, which takes
about four minutes in total:
  stress-ng: info: [1067] dispatching hogs: 8 tsearch, 8 udp, 8 udp-flood, 8 urandom
3) This triggers the fault shown below from irqbalance with roughly 3/4
probability. The faulting address is in the heap and the PC points into
libglib-2.0.so code, so it looks like a use-after-free in irqbalance or
libglib. Nothing so far indicates that the kernel is involved, and the
four stressors are simple enough that they should not cause the kernel
any trouble.
# irqbalance memory maps
00400000-0040a000 r-xp 00000000 08:02 10496929 /usr/sbin/irqbalance
00419000-0041a000 r-xp 00009000 08:02 10496929 /usr/sbin/irqbalance
0041a000-0041b000 rwxp 0000a000 08:02 10496929 /usr/sbin/irqbalance
16294000-162b5000 rwxp 00000000 00:00 0 [heap]
162b5000-162ce000 rwxp 00000000 00:00 0 [heap]
7f8fbf9000-7f8fbfb000 rwxp 00000000 00:00 0
7f8fbfb000-7f8fc11000 r-xp 00000000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc11000-7f8fc20000 ---p 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc20000-7f8fc21000 r-xp 00015000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc21000-7f8fc22000 rwxp 00016000 08:02 4722034 /lib/aarch64-linux-gnu/libpthread-2.21.so
7f8fc22000-7f8fc26000 rwxp 00000000 00:00 0
7f8fc26000-7f8fc7f000 r-xp 00000000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc7f000-7f8fc8f000 ---p 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc8f000-7f8fc90000 r-xp 00059000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc90000-7f8fc91000 rwxp 0005a000 08:02 4718668 /lib/aarch64-linux-gnu/libpcre.so.3.13.1
7f8fc91000-7f8fdc1000 r-xp 00000000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdc1000-7f8fdd0000 ---p 00130000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd0000-7f8fdd4000 r-xp 0012f000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd4000-7f8fdd6000 rwxp 00133000 08:02 4722027 /lib/aarch64-linux-gnu/libc-2.21.so
7f8fdd6000-7f8fdda000 rwxp 00000000 00:00 0
7f8fdda000-7f8fde3000 r-xp 00000000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fde3000-7f8fdf2000 ---p 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf2000-7f8fdf3000 r-xp 00008000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf3000-7f8fdf4000 rwxp 00009000 08:02 10885206 /usr/lib/aarch64-linux-gnu/libnuma.so.1.0.0
7f8fdf4000-7f8fdf8000 rwxp 00000000 00:00 0
7f8fdf8000-7f8fe89000 r-xp 00000000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe89000-7f8fe98000 ---p 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe98000-7f8fe99000 r-xp 00090000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe99000-7f8fe9a000 rwxp 00091000 08:02 4722041 /lib/aarch64-linux-gnu/libm-2.21.so
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff8c000-7f8ff9c000 ---p 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9c000-7f8ff9d000 r-xp 000f2000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9d000-7f8ff9e000 rwxp 000f3000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1
7f8ff9e000-7f8ff9f000 rwxp 00000000 00:00 0
7f8ff9f000-7f8ffa3000 r-xp 00000000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffa3000-7f8ffb2000 ---p 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb2000-7f8ffb3000 r-xp 00003000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb3000-7f8ffb4000 rwxp 00004000 08:02 10879730 /usr/lib/aarch64-linux-gnu/libcap-ng.so.0.0.0
7f8ffb4000-7f8ffd0000 r-xp 00000000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffd0000-7f8ffd3000 rwxp 00000000 00:00 0
7f8ffdc000-7f8ffde000 rwxp 00000000 00:00 0
7f8ffde000-7f8ffdf000 r--p 00000000 00:00 0 [vvar]
7f8ffdf000-7f8ffe0000 r-xp 00000000 00:00 0 [vdso]
7f8ffe0000-7f8ffe1000 r-xp 0001c000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7f8ffe1000-7f8ffe3000 rwxp 0001d000 08:02 4722030 /lib/aarch64-linux-gnu/ld-2.21.so
7fecdb1000-7fecdd2000 rw-p 00000000 00:00 0 [stack]
[ 250.276095] irqbalance[779]: unhandled level 2 translation fault (11) at 0x00162a54, esr 0x92000006
[ 250.276103] pgd = ffffffc0ff812000
[ 250.316917] [00162a54] *pgd=00000040ffa6b003, *pud=00000040ffa6b003, *pmd=0000000000000000
[ 250.416447] CPU: 5 PID: 779 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
[ 250.416450] Hardware name: HP ProLiant m400 Server Cartridge (DT)
[ 250.416452] task: ffffffcfb46cc980 ti: ffffffc0feba0000 task.ti: ffffffc0feba0000
[ 250.416464] PC is at 0x7f8ff02834
[ 250.416467] LR is at 0x7f8ff027f4
[ 250.416469] pc : [<0000007f8ff02834>] lr : [<0000007f8ff027f4>] pstate: 80000000
[ 250.416471] sp : 0000007fecdd1480
[ 250.416472] x29: 0000007fecdd1480 x28: 000000000041a000
[ 250.416476] x27: 000000000041a000 x26: 00000000004094e0
[ 250.416478] x25: 0000000000000001 x24: 0000000000000010
[ 250.416481] x23: 00000000162948a0 x22: 0000000016294880
[ 250.416484] x21: 0000000000000018 x20: 0000007f8ff9e000
[ 250.416486] x19: 0000000000000002 x18: 0000000000000000
[ 250.416489] x17: 0000007f8fc088ec x16: 0000007f8ff9d2e0
[ 250.416491] x15: 0000000000000020 x14: 0000000000000000
[ 250.416494] x13: 0000000000000000 x12: 0000000000000000
[ 250.416496] x11: 0000007fecdceff0 x10: 0000000000000010
[ 250.416499] x9 : 00000000000000a0 x8 : 0000000000000007
[ 250.416501] x7 : 0000000000000033 x6 : 0000000016294c80
[ 250.416504] x5 : 0000000000000001 x4 : 0000007f8fc212a0
[ 250.416506] x3 : 0000000016294880 x2 : 0000000000000001
[ 250.416509] x1 : 00000000000003fa x0 : 0000000000162a4c
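For anyone who wants to repeat the address check, here is a minimal
sketch of looking an address up in a maps dump. The function name
find_mapping and the two-line sample are mine, not part of stress-ng or
irqbalance, and the sketch assumes the usual one-mapping-per-line format
that /proc/<pid>/maps produces.

```bash
#!/bin/bash
# Print the maps entry whose [start, end) range contains ADDRESS.
# Usage: find_mapping ADDRESS < maps-file
# Returns 1 (and prints nothing) if no mapping contains the address.
find_mapping() {
    local addr=$(( $1 ))
    local range perms offset dev inode path start end
    while read -r range perms offset dev inode path; do
        # Skip anything that does not start with a "start-end" hex range.
        [[ $range =~ ^([0-9a-f]+)-([0-9a-f]+)$ ]] || continue
        start=$(( 16#${BASH_REMATCH[1]} ))
        end=$(( 16#${BASH_REMATCH[2]} ))
        if (( addr >= start && addr < end )); then
            printf '%s %s %s\n' "$range" "$perms" "${path:-[anonymous]}"
            return 0
        fi
    done
    return 1
}

# Two entries copied from the dump above, used here as sample input.
sample_maps='16294000-162b5000 rwxp 00000000 00:00 0 [heap]
7f8fe9a000-7f8ff8c000 r-xp 00000000 08:02 4718610 /lib/aarch64-linux-gnu/libglib-2.0.so.0.4400.1'

# Look up the PC from the fault report.
find_mapping 0x7f8ff02834 <<< "$sample_maps"
```

On a live system the same function can read the real file, for example
"sudo cat /proc/$(pidof irqbalance)/maps | find_mapping 0x7f8ff02834".
Against the dump above, the PC resolves to the r-xp mapping of
libglib-2.0.so.0.4400.1. The fault address exactly as printed
(0x00162a54) is below the lowest listed mapping, so this lookup returns
nothing for it.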
** Patch added: "0001-stress-ng-support-sequential-range.patch"
https://bugs.launchpad.net/bugs/1469214/+attachment/4425151/+files/0001-stress-ng-support-sequential-range.patch
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1469214
Title:
HP ProLiant m400 Server crashes with unhandled level 3 translation
fault
Status in linux package in Ubuntu:
Triaged
Bug description:
Running stress-ng on an HP ProLiant m400 server can cause unhandled
level 3 translation faults.
Use stress-ng from git://kernel.ubuntu.com/cking/stress-ng:
./stress-ng --seq 0 -t 60 -v
After some time this trips the following:
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
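The irqbalance faults in this thread report esr 0x92000006, while the
systemd-timesyn fault above reports 0x92000007. As a rough aid for
reading these values, the sketch below splits an ESR into the exception
class (bits 31:26), the write-not-read bit (bit 6) and the data fault
status code (bits 5:0), following the ARMv8 ESR_ELx layout for data
aborts. The function name decode_esr is my own, and only translation
faults are named; everything else is reported as "other".

```bash
#!/bin/bash
# Decode the ESR fields that matter for a user-space data abort on AArch64.
decode_esr() {
    local esr=$(( $1 ))
    local ec=$(( (esr >> 26) & 0x3f ))   # exception class, bits [31:26]
    local wnr=$(( (esr >> 6) & 0x1 ))    # write-not-read, bit 6
    local dfsc=$(( esr & 0x3f ))         # data fault status code, bits [5:0]
    local kind="other"
    # DFSC 0b0001LL encodes a translation fault at level LL.
    if (( (dfsc & 0x3c) == 0x04 )); then
        kind="translation fault, level $(( dfsc & 0x3 ))"
    fi
    local access="read"
    if (( wnr )); then
        access="write"
    fi
    printf 'EC=0x%02x DFSC=0x%02x %s (%s)\n' "$ec" "$dfsc" "$kind" "$access"
}

decode_esr 0x92000006   # value from the irqbalance reports
decode_esr 0x92000007   # value from the systemd-timesyn report
```

Both values carry exception class 0x24 (data abort from a lower
exception level) with the write bit clear, and differ only in the level
at which the page table walk failed, which matches the "level 2" and
"level 3" wording in the kernel messages.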
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469214/+subscriptions