kernel-packages team mailing list archive
Message #93004
[Bug 1398497] Re: HP Proliant Serverrs - DL360 and DL380 Gen8 - Precise Kernel Panic - General Protection Fault
Analyzing logs...
We have TONS of stack traces similar to this:
Nov 27 19:06:49 sgsxeris001 kernel: [522969.113150] general protection fault: 0000 [#474] SMP
Nov 27 19:06:49 sgsxeris001 kernel: [522969.113341] CPU 35
Nov 27 19:06:49 sgsxeris001 kernel: [522969.115290]
Nov 27 19:06:49 sgsxeris001 kernel: [522969.115361] Pid: 63574, comm: make Tainted: G D 3.2.0-67-generic #101-Ubuntu HP ProLiant DL380p Gen8
Nov 27 19:06:49 sgsxeris001 kernel: [522969.115567] RIP: 0010:[<ffffffff8116616e>] [<ffffffff8116616e>] kmem_cache_alloc_trace+0x5e/0x140
...
Nov 27 19:06:49 sgsxeris001 kernel: [522969.116824] Stack:
...
This means that ALL processes that were scheduled on CPU 35 and executed
either:
RIP = kmem_cache_alloc_trace+0x5e/0x140 OR
RIP = __kmalloc+0x7b/0x190
(RIP = Instruction Pointer)
caused the CPU to raise a General Protection Fault. Protection faults can
HANG the system when double or triple faults occur (a second/third fault
raised while the CPU is still delivering the first one to the Linux
exception handler).
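The double/triple fault escalation mentioned above follows fixed CPU rules, which apply when a second exception is detected while the CPU is still trying to deliver the first one. Below is a minimal bash sketch of those rules. It assumes our reading of Intel SDM Vol. 3A, Table 6-5, and models only #DE, #TS, #NP, #SS, #GP, #PF and #DF, treating everything else as benign. The names exc_class and second_exception are hypothetical helpers, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not an existing tool): model what the CPU does when a
# second exception is detected while it is still delivering a first one.
# Rules as we read Intel SDM Vol. 3A, Table 6-5; only the exception types
# discussed in this thread are named, everything else is treated as benign.

# Map an exception mnemonic to its SDM class.
exc_class() {
  case $1 in
    DE|TS|NP|SS|GP) echo contributory ;;
    PF)             echo pagefault ;;
    DF)             echo doublefault ;;
    *)              echo benign ;;
  esac
}

# second_exception FIRST SECOND -> serial | double-fault | triple-fault
second_exception() {
  local first second
  first=$(exc_class "$1")
  second=$(exc_class "$2")
  case $first/$second in
    contributory/contributory|pagefault/contributory|pagefault/pagefault)
      echo double-fault ;;   # the CPU invokes the #DF handler instead
    doublefault/contributory|doublefault/pagefault)
      echo triple-fault ;;   # the CPU enters shutdown mode
    *)
      echo serial ;;         # the two exceptions are handled one after the other
  esac
}

second_exception GP GP   # prints: double-fault
second_exception DF GP   # prints: triple-fault
```

Under these rules, a #GP raised during delivery of another #GP escalates to #DF. A #PF raised during delivery of a #GP is handled serially.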
inaddy@workstation:~/.../var/log$ cat syslog | egrep "RIP:" | wc -l
2632
2632 is the number of Protection Faults logged. Every one of them was
caused by a process while it was scheduled on CPU 35.
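The egrep | wc -l count only counts RIP lines. It does not show which CPU each report came from. The awk-based sketch below pairs each RIP line with the preceding "CPU <n>" line of the same report. It assumes the 3.2-era oops layout quoted above. The name tally_rip_by_cpu is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count oops reports per (CPU, faulting symbol+offset).
# Assumes the 3.2-era oops layout quoted above, where each report prints a
# "CPU <n>" line before its "RIP:" line.
tally_rip_by_cpu() {
  awk '
    # "... kernel: [522969.113341] CPU 35": remember the CPU of this report.
    /] CPU [0-9]+ *$/ { cpu = $NF }
    # "... RIP: 0010:[<...>] [<...>] sym+0xOFF/0xLEN": last field is sym+off.
    /RIP: / { print "CPU " cpu " " $NF }
  ' "$@" | sort | uniq -c | sort -rn
}

# Example: tally_rip_by_cpu syslog
```

The function prints one line per (CPU, RIP) pair with its count, highest first. If the output shows only a single CPU value, that supports the CPU 35 observation. If several CPUs oopsed at the same moment, their syslog lines can interleave, and this simple pairing then becomes unreliable.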
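Following these 2 instruction pointers (in kmem_cache_alloc_trace AND
__kmalloc), both land in the same piece of code and on the same
instruction. Offset 0x5e is <+94> and offset 0x7b is <+123> in the
listings below, and in both functions that is mov 0x0(%r13,%rax,1),%rbx: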
kmem_cache_alloc_trace:
2325 if (unlikely(!irqsafe_cpu_cmpxchg_double(
0xffffffff81166576 <+86>: mov (%r12),%rsi
0xffffffff8116657e <+94>: mov 0x0(%r13,%rax,1),%rbx
0xffffffff81166583 <+99>: mov %r13,%rax
0xffffffff81166586 <+102>: callq 0xffffffff8131cb20
0xffffffff8116658b <+107>: data32 xchg %ax,%ax
0xffffffff8116658e <+110>: test %al,%al
0xffffffff81166590 <+112>: je 0xffffffff81166554 <kmem_cache_alloc_trace+52>
__kmalloc:
2325 if (unlikely(!irqsafe_cpu_cmpxchg_double(
0xffffffff81166113 <+115>: mov (%r12),%rsi
0xffffffff8116611b <+123>: mov 0x0(%r13,%rax,1),%rbx
0xffffffff81166120 <+128>: mov %r13,%rax
0xffffffff81166123 <+131>: callq 0xffffffff8131cb20
0xffffffff81166128 <+136>: data32 xchg %ax,%ax
0xffffffff8116612b <+139>: test %al,%al
0xffffffff8116612d <+141>: je 0xffffffff811660f1 <__kmalloc+81>
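Kernel oopses print the faulting RIP as a symbol plus a hex offset. gdb listings like the ones above label each instruction with a decimal offset instead. The bash sketch below does that conversion and picks the matching instruction out of a listing read from stdin. The name insn_at_rip is a hypothetical helper, and the sketch assumes the listing uses gdb's <+N>: labels as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given an oops RIP in "symbol+0xOFF/0xLEN" form, print
# the matching line of a gdb disassembly listing read from stdin.
# gdb labels each instruction with a decimal offset such as "<+94>:".
insn_at_rip() {
  local spec=$1 rest off
  rest=${spec#*+}          # "kmem_cache_alloc_trace+0x5e/0x140" -> "0x5e/0x140"
  off=$(( ${rest%%/*} ))   # "0x5e" -> 94 (bash arithmetic accepts 0x literals)
  grep -F "<+${off}>:"     # print the instruction at that offset
}

# Example (vmlinux with debug info; the path is a placeholder):
#   gdb -batch -ex 'disassemble /m kmem_cache_alloc_trace' vmlinux |
#     insn_at_rip kmem_cache_alloc_trace+0x5e/0x140
```

The function matches on the symbol offset rather than the absolute address. That is the safer key here, because the listing's addresses (for example 0xffffffff8116657e) differ from the RIP printed in the oops (ffffffff8116616e).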
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1398497
Title:
HP Proliant Serverrs - DL360 and DL380 Gen8 - Precise Kernel Panic -
General Protection Fault
Status in linux package in Ubuntu:
Incomplete
Status in linux source package in Precise:
Incomplete
Bug description:
The following situation was brought to my attention:
"""
We massively upgraded our Ubuntu 12.04 servers (most of them are HP
DL360p Gen8 or DL380 Gen8) to the 3.2.0-67 kernel, and in the last 2-3
days we have already had to reboot 5 of them because they hung completely.
Some of them had the following messages in syslog:
kernel: [384707.675479] general protection fault: 0000 [#5666] SMP
Others had:
kernel: [950725.612724] BUG: unable to handle kernel paging request
All of them also have this:
your BIOS is broken and requested that x2apic be disabled
"""
Comments below