kernel-packages team mailing list archive
Message #156050
[Bug 1534413] Re: Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED
Because the debug symbols for Precise 3.2.0-79 are missing from ddebs.ubuntu.com, I had to
compile a 3.2.0-79 kernel in a PPA and hope that its symbols would be close to those of
the original build.
That led me to a wrong initial analysis, which I document here for
historical purposes:
> 178 2 18 ffff881f716bdc00 RU 0.0 0 0 [khungtaskd]
> 3680 2808 38 ffff881f71a5c500 RU 1.6 6629520 4303188 java
> 50279 49370 31 ffff883f0c7e8000 RU 0.0 4121160 111120 java
> 50757 50322 23 ffff881ef27eae00 RU 0.3 4149720 870892 java
crash> bt ffff881ef27eae00
PID: 50757 TASK: ffff881ef27eae00 CPU: 23 COMMAND: "java"
#0 [ffff881fbfba6ee0] crash_nmi_callback at ffffffff81031ac9
#1 [ffff881fbfba6ef0] default_do_nmi at ffffffff81666079
#2 [ffff881fbfba6f30] do_nmi at ffffffff816662b0
#3 [ffff881fbfba6f50] nmi at ffffffff81665620
[exception RIP: next_tgid+40]
RIP: ffffffff811df248 RSP: ffff881cd4b67da8 RFLAGS: 00000202
RAX: 0000000000000000 RBX: ffffffff81c281a0 RCX: 0000000000000000
RDX: ffff881f72830000 RSI: 0000000000000074 RDI: ffffffff81c281a0
RBP: ffff881cd4b67df8 R8: 000000000000a88d R9: 0000000000000004
R10: ffff883f70922540 R11: 0001f8f579768213 R12: 0000000000000074
R13: ffff881f00000073 R14: ffffffff81c281a0 R15: ffff881f73138000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <DOUBLEFAULT exception stack> ---
#4 [ffff881cd4b67da8] next_tgid at ffffffff811df248
#5 [ffff881cd4b67e00] proc_pid_readdir at ffffffff811e117e
#6 [ffff881cd4b67eb0] proc_root_readdir at ffffffff811dbe0a
#7 [ffff881cd4b67ee0] vfs_readdir at ffffffff8118e1d0
#8 [ffff881cd4b67f30] sys_getdents at ffffffff8118e4a9
#9 [ffff881cd4b67f80] system_call_fastpath at ffffffff8166d2c2
RIP: 00007f855e94d605 RSP: 00007f853a457bf0 RFLAGS: 00000283
RAX: 000000000000004e RBX: ffffffff8166d2c2 RCX: 0000000000000010
RDX: 0000000000008000 RSI: 00007f8548057980 RDI: 00000000000000fb
RBP: 00007f854846b140 R8: 00007f8548057980 R9: 0000000000000008
R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000016
R13: ffffffffffffffa0 R14: 00007f853a45a4d0 R15: 00007f8548057950
ORIG_RAX: 000000000000004e CS: 0033 SS: 002b
1) khungtaskd is complaining about a hung task
2) task ffff881ef27eae00 is in next_tgid(), part of procfs, reached through the VFS readdir path:
We are probably stuck here:
	rcu_read_lock();
retry:
	iter.task = NULL;
	pid = find_ge_pid(iter.tgid, ns);
	if (pid) {
		iter.tgid = pid_nr_ns(pid, ns);
		iter.task = pid_task(pid, PIDTYPE_PID);
		/* What we to know is if the pid we have find is the
		 * pid of a thread_group_leader. Testing for task
		 * being a thread_group_leader is the obvious thing
		 * todo but there is a window when it fails, due to
		 * the pid transfer logic in de_thread.
		 *
		 * So we perform the straight forward test of seeing
		 * if the pid we have found is the pid of a thread
		 * group leader, and don't worry if the task we have
		 * found doesn't happen to be a thread group leader.
		 * As we don't care in the case of readdir.
		 */
		if (!iter.task || !has_group_leader_pid(iter.task)) {
			iter.tgid += 1;
			goto retry;
		}
		get_task_struct(iter.task);
	}
	rcu_read_unlock();
The JVM process is reading the procfs root directory, and the kernel is
looking for the next thread group leader to list.
I'm still analysing the code.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1534413
Title:
Precise: lockup during fadvise syscall with POSIX_FADV_DONTNEED
Status in linux package in Ubuntu:
In Progress
Bug description:
A kernel dump (3.2.0-79) was brought to my attention, with the
following stack trace:
"""
[14277.952072] usb 3-1: USB disconnect, device number 3
[15388.602790] NOHZ: local_softirq_pending 02
[15404.593795] NOHZ: local_softirq_pending 02
[15436.575787] NOHZ: local_softirq_pending 02
[15452.566802] NOHZ: local_softirq_pending 02
[15456.564528] NOHZ: local_softirq_pending 02
[15564.503842] NOHZ: local_softirq_pending 02
[15584.492538] NOHZ: local_softirq_pending 02
[15588.490302] NOHZ: local_softirq_pending 02
[15632.465563] NOHZ: local_softirq_pending 02
[15659.014629] NOHZ: local_softirq_pending 02
[15956.371298] INFO: task jsvc:57263 blocked for more than 120 seconds.
[15956.375347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15956.383487] jsvc D ffffffff81806200 0 57263 9495 0x00000000
[15956.383493] ffff883e849c3d08 0000000000000082 ffff883e849c3ca8 000000008104f624
[15956.383502] ffff883e849c3fd8 ffff883e849c3fd8 ffff883e849c3fd8 0000000000012800
[15956.383509] ffff881f72db9700 ffff883e0c4e2e00 ffff883e849c3cf8 7fffffffffffffff
[15956.383518] Call Trace:
[15956.383534] [<ffffffff81662c0f>] schedule+0x3f/0x60
[15956.383539] [<ffffffff8166324d>] schedule_timeout+0x29d/0x310
[15956.383547] [<ffffffff810388be>] ? physflat_send_IPI_mask+0xe/0x10
[15956.383554] [<ffffffff81032068>] ? native_smp_send_reschedule+0x48/0x60
[15956.383560] [<ffffffff8103ec29>] ? default_spin_lock_flags+0x9/0x10
[15956.383564] [<ffffffff81662a4f>] wait_for_common+0xdf/0x180
[15956.383572] [<ffffffff81060ac0>] ? try_to_wake_up+0x200/0x200
[15956.383576] [<ffffffff81662bcd>] wait_for_completion+0x1d/0x20
[15956.383585] [<ffffffff8108757e>] flush_work+0x2e/0x40
[15956.383589] [<ffffffff810838b0>] ? wake_up_worker+0x30/0x30
[15956.383593] [<ffffffff81087813>] schedule_on_each_cpu+0xc3/0x110
[15956.383602] [<ffffffff81127365>] lru_add_drain_all+0x15/0x20
[15956.383607] [<ffffffff8111e189>] sys_fadvise64_64+0x189/0x270
[15956.383610] [<ffffffff8111e27e>] sys_fadvise64+0xe/0x10
[15956.383619] [<ffffffff8166d2c2>] system_call_fastpath+0x16/0x1b
[15956.383622] Kernel panic - not syncing: hung_task: blocked tasks
[15956.388083] Pid: 178, comm: khungtaskd Tainted: G W 3.2.0-79-generic #115-Ubuntu
[15956.397273] Call Trace:
[15956.401783] [<ffffffff8164c005>] panic+0x91/0x1a4
[15956.406527] [<ffffffff810d97c2>] check_hung_task+0xb2/0xc0
[15956.411393] [<ffffffff810d98eb>] check_hung_uninterruptible_tasks+0x11b/0x140
[15956.421117] [<ffffffff810d9910>] ? check_hung_uninterruptible_tasks+0x140/0x140
[15956.431847] [<ffffffff810d995f>] watchdog+0x4f/0x60
[15956.437524] [<ffffffff8108b99c>] kthread+0x8c/0xa0
[15956.443145] [<ffffffff8166f434>] kernel_thread_helper+0x4/0x10
[15956.448830] [<ffffffff8108b910>] ? flush_kthread_worker+0xa0/0xa0
[15956.454700] [<ffffffff8166f430>] ? gs_change+0x13/0x13
"""
The analysis is being done in the comments...
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1534413/+subscriptions