kernel-packages team mailing list archive
-
kernel-packages team
-
Mailing list archive
-
Message #18915
[Bug 1233175] Re: Kernel panic : mempolicy potential use-after-free on server running mongodb
Here is an analysis of the kernel core dump captured for this issue :
crash> sys
KERNEL: vmlinux-3.2.0-38-generic
DUMPFILE: VmCore
CPUS: 24
DATE: Wed Sep 18 22:34:35 2013
UPTIME: 1 days, 11:33:14
LOAD AVERAGE: 2.04, 2.09, 2.16
TASKS: 6656
NODENAME: ddb-mongo41
RELEASE: 3.2.0-38-generic
VERSION: #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013
MACHINE: x86_64 (2533 Mhz)
MEMORY: 47.9 GB
PANIC: "[127932.907100] kernel BUG at /build/buildd/linux-3.2.0/mm/mempolicy.c:1638!"
PID: 21767
COMMAND: "mongod"
TASK: ffff8800874bdc00 [THREAD_INFO: ffff880544d8a000]
CPU: 12
STATE: EXIT_DEAD (PANIC)
Analysis
========
This is the backtrace of the panic task :
crash> bt
PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod"
#0 [ffff880657cc3820] machine_kexec at ffffffff810393da
#1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
#2 [ffff880657cc3960] oops_end at ffffffff8165e528
#3 [ffff880657cc3990] die at ffffffff810178d8
#4 [ffff880657cc39c0] do_trap at ffffffff8165de94
#5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
#6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
[exception RIP: slab_node+0x2e]
RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
#8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
#9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
#11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
#12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
#13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
#14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
#15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
#16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
#17 [ffff880657cc3f50] do_softirq at ffffffff81016305
#18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
#19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
--- <IRQ stack> ---
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
[exception RIP: __slab_free+0x2e1]
RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202
RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9
RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200
RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000
R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33
R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8
ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018
#21 [ffff880544d8be30] __change_pid at ffffffff81087dca
#22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
#23 [ffff880544d8bef0] __mpol_put at ffffffff81159937
#24 [ffff880544d8bf00] do_exit at ffffffff8106c75c
#25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7
#26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982
RIP: 00007f6f476b8f37 RSP: 00007f68cbcfdbb0 RFLAGS: 00000202
RAX: 000000000000003c RBX: ffffffff81665982 RCX: ffffffffffffffff
RDX: 00007f68cbcfe700 RSI: 00007f6f478c9250 RDI: 0000000000000000
RBP: 0000000000000000 R8: 00007f68cbcfe700 R9: 00007f68e82a0370
R10: 000000007fffffff R11: 0000000000000246 R12: ffffffff8106caf7
R13: ffff880544d8bf78 R14: 0000000000000003 R15: 00007f68f8744a10
ORIG_RAX: 000000000000003c CS: 0033 SS: 002b
And the register information :
[127932.906804] ------------[ cut here ]------------
[127932.907100]
[127932.907502] invalid opcode: 0000 [#1] SMP
[127932.907774] CPU 12
[127932.907901] Modules linked in: vesafb joydev iomemory_vsl(P) i7core_edac ioatdma edac_core dca tpm_tis acpi_power_meter mac_hid lp parport
usbhid hid usb_storage mptsas enic mptscsih mptbase scsi_transport_sas
[127932.909246]
[127932.909340] Pid: 21767, comm: mongod Tainted: P O 3.2.0-38-generic #61-Ubuntu Cisco Systems Inc UCSC-BSE-SFF-C200/UCSC-BSE-SFF-C
200
[127932.910150] RIP: 0010:[<ffffffff8115a66e>] [<ffffffff8115a66e>] slab_node+0x2e/0xa0
[127932.910614] RSP: 0018:ffff880657cc3b70 EFLAGS: 00010097
[127932.910921] RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
[127932.911344] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
[127932.911763] RBP: ffff880657cc3b80 R08: 0000000000000001 R09: 0000000100100010
[127932.912184] R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
[127932.912625] R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
[127932.913126] FS: 0000000000000000(0000) GS:ffff880657cc0000(0000) knlGS:0000000000000000
[127932.913621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[127932.913986] CR2: 00007fb8e6d01218 CR3: 00000006468d2000 CR4: 00000000000006e0
[127932.936937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[127932.960466] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[127932.984295] Process mongod (pid: 21767, threadinfo ffff880544d8a000, task ffff8800874bdc00)
[127933.008400] Stack:
[127933.019992] ffffffff8188ee40 ffff880657cd6720 ffff880657cc3c10 ffffffff816496a0
[127933.043734] ffff880657801e80 ffff8800874bdc00 ffff8800874bdc00 0000000077f9e84e
[127933.068114] ffff880657cd6700 0000000000000008 ffff880657cc3bf0 ffff8800874bdc00
[127933.092225] Call Trace:
[127933.103955] <IRQ>
The current task is actually in the process of exiting :
sys_exit -> do_exit
The execution of __mpol_put() is in the process of destructing the mempolicy structure of the current task
:
#ifdef CONFIG_NUMA
task_lock(tsk);
mpol_put(tsk->mempolicy);
tsk->mempolicy = NULL;
task_unlock(tsk);
#endif
crash> task | grep mempolicy
mempolicy = 0xffff880abf18a288,
crash> * mempolicy 0xffff880abf18a288
struct mempolicy {
refcnt = {
counter = 0x0
},
mode = 0x0,
flags = 0x0,
v = {
preferred_node = 0x3,
nodes = {
bits = {0x3}
}
},
w = {
cpuset_mems_allowed = {
bits = {0x3}
},
user_nodemask = {
bits = {0x3}
}
}
}
/* Slow path of a mpol destructor. */
void __mpol_put(struct mempolicy *p)
{
if (!atomic_dec_and_test(&p->refcnt))
return;
kmem_cache_free(policy_cache, p);
}
The p->refcnt has already been decreased and kmem_cache_free is being
performed to release the mempolicy element :
The IRQ is triggered while __slab_free() is being executed :
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
[exception RIP: __slab_free+0x2e1]
The slab where the eement is being freed is :
RDI: ffff880657806200
crash> kmem -s 0xffff880abf18a288
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff880657806200 numa_policy 24 41373 43180 254 4k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea002afc6280 ffff880abf18a000 1 169 169 0
FREE / [ALLOCATED]
[ffff880abf18a288]
which is indeed the slab where the mempolicy is
__slab_free+0x2e1 is at the very end of __slab_free() :
Looking at the RIP where __slab_free should return, we can see that
__slab_free was about to return when the IRQ came in :
crash> dis -l __slab_free
....
0xffffffff81649457 <__slab_free+0x2d1>: call 0xffffffff8165d590 <_raw_spin_unlock_irqrestore>
/build/buildd/linux-3.2.0/mm/slub.c: 2517
0xffffffff8164945c <__slab_free+0x2d6>: mov rsi,r12
0xffffffff8164945f <__slab_free+0x2d9>: mov rdi,r13
0xffffffff81649462 <__slab_free+0x2dc>: call 0xffffffff81162f90 <discard_slab>
/build/buildd/linux-3.2.0/mm/slub.c: 2518
0xffffffff81649467 <__slab_free+0x2e1>: add rsp,0x88
spin_unlock_irqrestore(&n->list_lock, flags);
stat(s, FREE_SLAB);
discard_slab(s, page);
}
So most probably the mempolicy page has now been freed.
The IRQ processing now happens and goes through many functions until it
needs to allocate a new slab element for an skb (network buffer) :
[exception RIP: slab_node+0x2e]
RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
#8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
#9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
__slab_alloc (from slab_alloc) gets called with a kmem_cache structure
retrieved from get_slab()
crash> bt -F
#7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
ffff880657cc3b90: [kmem_cache_node] [task_struct]
ffff880657cc3ba0: [task_struct] 0000000077f9e84e
ffff880657cc3bb0: ffff880657cd6700 0000000000000008
ffff880657cc3bc0: ffff880657cc3bf0 [task_struct]
ffff880657cc3bd0: [kmalloc-2048] [kmalloc-8192]
ffff880657cc3be0: 0000000000000008 ffff880657cd6700
ffff880657cc3bf0: 0000000000000000 ffff880657802c00
ffff880657cc3c00: 00000000ffffffff 0000000000000282
ffff880657cc3c10: ffff880657cc3cc0 __slab_alloc+0x14b
#8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
crash> bt -f
#7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
ffff880657cc3b90: ffff880657801e80 ffff8800874bdc00
ffff880657cc3ba0: ffff8800874bdc00 0000000077f9e84e
ffff880657cc3bb0: ffff880657cd6700 0000000000000008
ffff880657cc3bc0: ffff880657cc3bf0 ffff8800874bdc00
ffff880657cc3bd0: ffff880877f9e84e ffff880c462e0000
ffff880657cc3be0: 0000000000000008 ffff880657cd6700
ffff880657cc3bf0: 0000000000000000 ffff880657802c00
ffff880657cc3c00: 00000000ffffffff 0000000000000282
ffff880657cc3c10: ffff880657cc3cc0 ffffffff816498cf
ffff880657cc3b90: [kmem_cache_node] = ffff880657801e80
But while executing get_any_partial which is only NUMA specific #IFDEF code,
it does the following :
zonelist = node_zonelist(slab_node(current->mempolicy), flags);
current being the current task pointer, it tries does the slab_node() on
the mempolicy that was recently freed prior to the IRQ hitting after
discard_slab
This mempolicy has now been freed, which most probably explains why the
value of mode is 0x0 :
crash> * mempolicy.mode ffff880abf18a288
mode = 0x0
The behavior for slab_node when mempolicy.mode = 0x0 MPOL_DEFAULT is :
default:
BUG();
which triggers the panic.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1233175
Title:
Kernel panic : mempolicy potential use-after-free on server running
mongodb
Status in “linux” package in Ubuntu:
Triaged
Status in “linux” source package in Precise:
Triaged
Bug description:
PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod"
#0 [ffff880657cc3820] machine_kexec at ffffffff810393da
#1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
#2 [ffff880657cc3960] oops_end at ffffffff8165e528
#3 [ffff880657cc3990] die at ffffffff810178d8
#4 [ffff880657cc39c0] do_trap at ffffffff8165de94
#5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
#6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
[exception RIP: slab_node+46]
RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
#8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
#9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
#11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
#12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
#13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
#14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
#15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
#16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
#17 [ffff880657cc3f50] do_softirq at ffffffff81016305
#18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
#19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
--- <IRQ stack> ---
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
[exception RIP: __slab_free+737]
RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202
RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9
RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200
RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000
R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33
R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8
ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018
#21 [ffff880544d8be30] __change_pid at ffffffff81087dca
#22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
#23 [ffff880544d8bef0] __mpol_put at ffffffff81159937
#24 [ffff880544d8bf00] do_exit at ffffffff8106c75c
#25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7
#26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982
RIP: 00007f6f476b8f37 RSP: 00007f68cbcfdbb0 RFLAGS: 00000202
RAX: 000000000000003c RBX: ffffffff81665982 RCX: ffffffffffffffff
RDX: 00007f68cbcfe700 RSI: 00007f6f478c9250 RDI: 0000000000000000
RBP: 0000000000000000 R8: 00007f68cbcfe700 R9: 00007f68e82a0370
R10: 000000007fffffff R11: 0000000000000246 R12: ffffffff8106caf7
R13: ffff880544d8bf78 R14: 0000000000000003 R15: 00007f68f8744a10
ORIG_RAX: 000000000000...
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1233175/+subscriptions
References