kernel-packages team mailing list archive

Re: [Bug 1469214] Re: HP ProLiant m400 Server crashes with unhandled level 3 translation fault

 

Hi Colin,

That looks like progress, but it still takes time to reproduce, so I
will use your new approach to reproduce it.

When you do that, could you dump /proc/$(pidof irqbalance)/maps so
that we can see where the faulting address lies in the process's VM
space?
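A minimal sketch for reading such a dump: find the /proc/<pid>/maps entry whose [start, end) range contains the fault address (0x002d6da4 in the irqbalance report quoted below). The script and the name find_mapping are hypothetical. Address-space randomization can change the layout between runs, so the snapshot is most useful when it comes from the same irqbalance instance that later faults.

```python
import sys

def find_mapping(maps_text, addr):
    """Return the /proc/<pid>/maps line whose [start, end) range contains addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First field is "start-end" in hex, e.g. "00400000-0041a000".
        start_s, end_s = fields[0].split("-")
        if int(start_s, 16) <= addr < int(end_s, 16):
            return line
    return None

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 find_mapping.py <pid> <hex-address>
    pid, addr = sys.argv[1], int(sys.argv[2], 16)
    with open(f"/proc/{pid}/maps") as f:
        print(find_mapping(f.read(), addr) or "address not mapped")
```

For example: python3 find_mapping.py "$(pidof irqbalance)" 0x002d6da4. A result of "address not mapped" would itself be informative, since it means the address lies in a hole between mappings.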

thanks,


On Sat, Jul 4, 2015 at 4:10 AM, Colin Ian King
<1469214@xxxxxxxxxxxxxxxxxx> wrote:
> Running the following:
>
> #!/bin/bash
> tests="affinity aio bigheap brk bsearch cache chdir chmod clock context cpu crypt dentry dir dup epoll eventfd fstat fallocate fault fifo flock fork futex get getrandom hdd hsearch inotify io itimer kcmp kill lease link lockf longjmp lsearch malloc matrix memcpy memfd mincore mlock mmap mmapmany mremap msg mq nice null open pipe poll procfs pthread qsort readahead rename rlimit seek sem sem-sysv sendfile shm-sysv sigfd sigfpe sigq sigsegv sock splice stack str switch symlink sysinfo sysfs tee timer timerfd tsearch udp udp-flood urandom utime vecmath vfork vm vm-rw vm-splice wcs wait yield xattr zero zombie"
>
> for t in $tests
> do
>         echo $t
>         echo $t > /dev/kmsg
>         ./stress-ng --$t 0 -v -t 60
> done
>
> eventually tripped the translation fault in irqbalance.  I ran this
> after a clean reboot.
>
> [ 4901.799846] timerfd
> [ 4961.807050] tsearch
> [ 5021.884456] udp
> [ 5081.895058] udp-flood
> [ 5141.674365] irqbalance[827]: unhandled level 2 translation fault (11) at 0x002d6da4, esr 0x92000006
> [ 5141.674376] pgd = ffffffcfb51a0000
> [ 5141.715215] [002d6da4] *pgd=0000004fb677e003, *pud=0000004fb677e003, *pmd=0000000000000000
>
> [ 5141.816183] CPU: 0 PID: 827 Comm: irqbalance Not tainted 3.19.0-21-generic #21-Ubuntu
> [ 5141.816185] Hardware name: HP ProLiant m400 Server Cartridge (DT)
> [ 5141.816188] task: ffffffcfac088000 ti: ffffffcfab710000 task.ti: ffffffcfab710000
> [ 5141.816206] PC is at 0x7f88287834
> [ 5141.816208] LR is at 0x7f882877f4
> [ 5141.816210] pc : [<0000007f88287834>] lr : [<0000007f882877f4>] pstate: 80000000
> [ 5141.816212] sp : 0000007ff2e46b30
> [ 5141.816214] x29: 0000007ff2e46b30 x28: 00000000004095a0
> [ 5141.816217] x27: 0000000000409548 x26: 000000000041a000
> [ 5141.816220] x25: 0000000000000001 x24: 0000000000000010
> [ 5141.816222] x23: 000000002d6c98a0 x22: 000000002d6c9880
> [ 5141.816225] x21: 0000000000000018 x20: 0000007f88323000
> [ 5141.816228] x19: 0000000000000002 x18: 0000000000000000
> [ 5141.816230] x17: 0000007f87f8d8ec x16: 0000007f883222e0
> [ 5141.816233] x15: 0000000000000020 x14: 0000000000000001
> [ 5141.816235] x13: 0000000000000000 x12: 0000000000000000
> [ 5141.816237] x11: 0000007ff2e446a0 x10: 0000000000000010
> [ 5141.816240] x9 : 00000000000000a0 x8 : 0000000000000007
> [ 5141.816242] x7 : 0000000000000033 x6 : 000000002d6c9c80
> [ 5141.816245] x5 : 0000000000000001 x4 : 0000007f87fa62a0
> [ 5141.816247] x3 : 000000002d6c9880 x2 : 0000000000000001
> [ 5141.816250] x1 : 00000000000003fa x0 : 00000000002d6d9c
>
> [ 5141.907792] urandom
> [ 5201.928712] utime
> [ 5261.934534] vecmath
> [ 5321.940302] vfork
> [ 5381.947904] vm
> [ 5441.991784] vm-rw
> [ 5502.017614] vm-splice
> [ 5562.023334] wcs
> [ 5622.037054] wait
> [ 5682.043302] yield
> [ 5742.056595] xattr
> [ 5802.075772] zero
> [ 5862.087396] zombie
>
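Because the loop writes each stressor name to /dev/kmsg before starting it, the kernel log can be scanned to tag every "unhandled ... fault" report with the stressor that was active when it appeared (udp-flood for the irqbalance fault above). A minimal sketch, assuming plain dmesg output in the format shown above and that a marker line is a timestamp followed by a single bare stressor name. The function name faults_by_stressor is hypothetical, and the tag only records which stressor was running, not that it caused the fault.

```python
import re
import sys

# Marker written by the loop, e.g. "[ 5081.895058] udp-flood".
# Assumption: a marker is a timestamp followed by one bare stressor name.
MARKER_RE = re.compile(r"^\[\s*[\d.]+\]\s+([a-z0-9-]+)\s*$")
# arm64 user-fault report, e.g. "irqbalance[827]: unhandled level 2 translation
# fault (11) at 0x002d6da4, esr 0x92000006"
FAULT_RE = re.compile(
    r"(\S+)\[(\d+)\]: unhandled (.+?) \((\d+)\) at (0x[0-9a-f]+), esr (0x[0-9a-f]+)"
)

def faults_by_stressor(lines):
    """Tag each fault report with the most recent stressor marker seen before it."""
    current = None
    faults = []
    for line in lines:
        line = line.strip()
        marker = MARKER_RE.match(line)
        if marker:
            current = marker.group(1)
            continue
        fault = FAULT_RE.search(line)
        if fault:
            comm, pid, kind, sig, addr, esr = fault.groups()
            faults.append({
                "stressor": current,
                "comm": comm,
                "pid": int(pid),
                "kind": kind,
                "sig": int(sig),  # number in parentheses; assumed to be the signal sent
                "addr": int(addr, 16),
                "esr": int(esr, 16),
            })
    return faults

if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: dmesg > dmesg.txt && python3 fault_markers.py dmesg.txt
    with open(sys.argv[1]) as f:
        for fault in faults_by_stressor(f):
            print(f"{fault['stressor']}: {fault['comm']}[{fault['pid']}] "
                  f"{fault['kind']} at {fault['addr']:#x}, esr {fault['esr']:#x}")
```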
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1469214
>
> Title:
>   HP ProLiant m400 Server crashes with unhandled level 3 translation
>   fault
>
> Status in linux package in Ubuntu:
>   Triaged
>
> Bug description:
>   Running stress-ng on an HP ProLiant m400 server can cause unhandled
>   level 3 translation faults:
>
>   use stress-ng from git://kernel.ubuntu.com/cking/stress-ng
>
>   ./stress-ng --seq 0 -t 60 -v
>
>   and after some time this trips the following:
>
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922560] systemd-timesyn[481]: unhandled level 3 translation fault (7) at 0x7fa8ea6008, esr 0x92000007
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922561] pgd = ffffffcfb563f000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922563] [7fa8ea6008] *pgd=0000004fb4f28003, *pud=0000004fb4f28003, *pmd=0000004fb4f38003, *pte=000000001d151c00
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922566]
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922569] CPU: 6 PID: 481 Comm: systemd-timesyn Not tainted 3.19.0-21-generic #21-Ubuntu
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922571] Hardware name: HP ProLiant m400 Server Cartridge (DT)
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922573] task: ffffffcfb4e3b100 ti: ffffffcfb4d2c000 task.ti: ffffffcfb4d2c000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922588] PC is at 0x7fa8d81824
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922589] LR is at 0x7fa8e3b3e4
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922590] pc : [<0000007fa8d81824>] lr : [<0000007fa8e3b3e4>] pstate: 80000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922591] sp : 0000007ff120d660
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922592] x29: 0000007ff120d660 x28: 0000007fa8f1c000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922594] x27: 0000007fa8f32084 x26: 0000007fa8f32000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922595] x25: 0000007fa8f1d788 x24: 0000007fa8f1d888
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922597] x23: 0000000000000001 x22: 0000007fa8f1faa0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922599] x21: 0000007ff120d7f0 x20: 0000007ff120d7d0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922600] x19: 0000007fa8f31000 x18: 0000007fa8f1e000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922602] x17: 0000007fa8e3b3b8 x16: 0000007fa8ea6000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922603] x15: 003b9aca00000000 x14: 00219bbdd0000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922605] x13: ffffffffaa751223 x12: 0000000000000000
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922607] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922609] x9 : 37333c43484f5e46 x8 : 0000007ff120d818
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922610] x7 : 0000007ff120d8f0 x6 : 0000007ff120d828
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922612] x5 : ffffff80ffffffd0 x4 : 0000007ff120d8c0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922613] x3 : 0000007ff120d7d0 x2 : 0000007fa8f1faa0
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922615] x1 : 0000000000000001 x0 : 0000000000000064
>   Jun 26 14:01:54 ms10-34-proliant kernel: [150297.922616]
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1469214/+subscriptions
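Both reports can be cross-checked against the ARMv8-A syndrome and the printed table walk. The sketch below splits ESR_EL1 into EC (bits [31:26]), IL (bit 25) and ISS (bits [24:0]); for a data abort, DFSC is ISS[5:0] and 0b0001LL means a translation fault at level LL. It then reports the first descriptor in the walk whose valid bit (bit 0) is clear. It assumes a 4 KB granule with a 39-bit VA, i.e. three lookup levels with pud folded into pgd (consistent with *pgd == *pud in both dumps), so pgd is level 1, pmd level 2 and pte level 3. The function names are illustrative. Under those assumptions the irqbalance walk stops at a zero pmd (level 2) and the systemd-timesyncd pte 0x1d151c00 has bit 0 clear (level 3), matching the DFSC in each esr.

```python
# ESR_EL1 exception classes for data aborts (ARMv8-A).
DATA_ABORT_ECS = {0x24: "data abort from a lower EL", 0x25: "data abort from the same EL"}

def decode_esr(esr):
    """Split an ESR_EL1 value into EC, IL and ISS, and decode DFSC for data aborts."""
    ec = (esr >> 26) & 0x3F     # exception class, bits [31:26]
    il = (esr >> 25) & 0x1      # instruction length, bit 25
    iss = esr & 0x1FFFFFF       # instruction-specific syndrome, bits [24:0]
    info = {"ec": ec, "il": il, "iss": iss}
    if ec in DATA_ABORT_ECS:
        dfsc = iss & 0x3F       # data fault status code, ISS[5:0]
        info["class"] = DATA_ABORT_ECS[ec]
        info["wnr"] = (iss >> 6) & 0x1  # WnR: 1 = write, 0 = read
        info["dfsc"] = dfsc
        if (dfsc & 0x3C) == 0x04:       # 0b0001LL: translation fault at level LL
            info["fault"] = f"translation fault, level {dfsc & 0x3}"
    return info

def descriptor_kind(desc, level):
    """Classify a 4 KB-granule translation-table descriptor at lookup level 1..3."""
    low = desc & 0x3
    if (low & 0x1) == 0:        # bit 0 clear: invalid, the walk faults here
        return "invalid"
    if level == 3:
        return "page" if low == 0x3 else "reserved"
    return "table" if low == 0x3 else "block"

def first_invalid_level(walk):
    """Return the level of the first invalid descriptor in a list of (level, desc)."""
    for level, desc in walk:
        if descriptor_kind(desc, level) == "invalid":
            return level
    return None

if __name__ == "__main__":
    # Values copied from the two reports; levels assume pgd=1, pmd=2, pte=3.
    print(decode_esr(0x92000006), first_invalid_level([(1, 0x0000004fb677e003), (2, 0x0)]))
    print(decode_esr(0x92000007), first_invalid_level(
        [(1, 0x0000004fb4f28003), (2, 0x0000004fb4f38003), (3, 0x000000001d151c00)]))
```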

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1469214
