group.of.nepali.translators team mailing list archive
Message #15686
[Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
** Changed in: linux (Ubuntu)
Status: Triaged => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1687512
Title:
Kernel panics on Xenial when using cgroups and strict CFS limits
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Xenial:
Fix Released
Bug description:
SRU Justification
-----------------
[Impact]
Apache Mesos and Kubernetes workloads on Xenial cause a kernel panic
(NULL pointer dereference) in the Completely Fair Scheduler (CFS).
These panics are in pick_next_entity and include pick_next_task_fair
in the call stack.
[Fix]
Cherry-picking both
754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
(http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
and
094f469172e00d6ab0a3130b0e01c83b3cf3a98d
(http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
fixes the crash.
The two commits appear to be intended as a series: they were posted to
LKML at the same time.
[Testcase]
The fix has been validated by the user who reported the bug.
Bug description
---------------
We see a number of kernel panics on servers running Apache Mesos using
cgroups with small (0.1-0.2) CPU limits.
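For context, a fractional CPU limit like this is normally enforced by the
CFS bandwidth controller as a runtime quota per period. A minimal sketch of
that arithmetic, assuming the kernel default period of 100 ms
(cpu.cfs_period_us = 100000). The period actually configured on the affected
servers is not stated in this report, and the helper name cfs_quota_us is
hypothetical:

```python
# Assumption: kernel default for cpu.cfs_period_us (100 ms).
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(cpu_limit, period_us=DEFAULT_CFS_PERIOD_US):
    """Return the cpu.cfs_quota_us value matching a fractional CPU limit."""
    return int(round(cpu_limit * period_us))


# The limits mentioned in the report.
for limit in (0.1, 0.2):
    print(limit, "CPU ->", cfs_quota_us(limit), "us of runtime per period")
```

With quotas this small, a task group is throttled as soon as it has used its
runtime within a period, so throttling happens very frequently.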
These all appear as NULL pointer dereferences in and around
pick_next_entity and pick_next_task_fair, for example:
[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
[24334.512806] Oops: 0000 [#1] SMP
[24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
[24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
[24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: ffff8803ee67c000
[24334.601799] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.610490] RSP: 0018:ffff8803ee67fdd8 EFLAGS: 00010086
[24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000
[24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000
[24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000
[24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000
[24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178
[24334.652512] FS: 0000000000000000(0000) GS:ffff8803ffd00000(0000) knlGS:0000000000000000
[24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0
[24334.673851] Stack:
[24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 ffff880036529800
[24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 ffff8803ffd16e70
[24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 0000000000000000
[24334.700172] Call Trace:
[24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0
[24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980
[24334.714349] [<ffffffff81804585>] schedule+0x35/0x80
[24334.719445] [<ffffffff8180481e>] schedule_preempt_disabled+0xe/0x10
[24334.725962] [<ffffffff810bf9fa>] cpu_startup_entry+0x18a/0x350
[24334.732012] [<ffffffff8104f3d9>] start_secondary+0x149/0x170
[24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[24334.765124] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.771473] RSP <ffff8803ee67fdd8>
[24334.775077] CR2: 0000000000000050
[24334.779121] ---[ end trace 05d941efb97b7bae ]---
and
[155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[155852.036931] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
[155852.048550] Oops: 0000 [#1] SMP
[155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
[155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
[155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: ffff8800bbb10000
[155852.135347] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[155852.144120] RSP: 0018:ffff8800bbb13ce0 EFLAGS: 00010086
[155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: ffff8800bb777400
[155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 0000000000000000
[155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: ffff8803ed29aa00
[155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 0000000000000000
[155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 0000000000000001
[155852.186387] FS: 00007f387d1c9700(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000
[155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0
[155852.207967] Stack:
[155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 0000000000000000
[155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 ffff8803ffc96e70
[155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 0000000000000001
[155852.234506] Call Trace:
[155852.237156] [<ffffffff810369c9>] ? sched_clock+0x9/0x10
[155852.242673] [<ffffffff810b9b65>] pick_next_task_fair+0x325/0x4b0
[155852.248968] [<ffffffff81803cd9>] __schedule+0x109/0x980
[155852.254491] [<ffffffff81804585>] schedule+0x35/0x80
[155852.259667] [<ffffffff8180727c>] schedule_hrtimeout_range_clock+0xac/0x130
[155852.266838] [<ffffffff810e9fb0>] ? hrtimer_init+0x180/0x180
[155852.272712] [<ffffffff81807270>] ? schedule_hrtimeout_range_clock+0xa0/0x130
[155852.280052] [<ffffffff81807313>] schedule_hrtimeout_range+0x13/0x20
[155852.288558] [<ffffffff812479b9>] ep_poll+0x249/0x310
[155852.293817] [<ffffffff810a8c30>] ? wake_up_q+0x80/0x80
[155852.299271] [<ffffffff81248efc>] SyS_epoll_wait+0xbc/0xe0
[155852.304967] [<ffffffff81807df6>] entry_SYSCALL_64_fastpath+0x16/0x75
[155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[155852.338852] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[155852.345270] RSP <ffff8800bbb13ce0>
[155852.348958] CR2: 0000000000000050
[155852.353086] ---[ end trace 8ce693b2314611c4 ]---
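Both traces fault at the same instruction and the same address. A small
sketch of how the relevant fields can be pulled out of such an oops when
comparing many crash logs; the regular expressions and the summarize helper
are illustrative assumptions, not part of any kernel tooling:

```python
import re

# Two lines copied from the first trace above.
OOPS_LINES = [
    "[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050",
    "[24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160",
]

ADDR_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def summarize(lines):
    """Collect fault address, faulting function and offset from oops lines."""
    summary = {}
    for line in lines:
        m = ADDR_RE.search(line)
        if m:
            summary["fault_addr"] = int(m.group(1), 16)
        m = IP_RE.search(line)
        if m:
            summary["function"] = m.group(2)
            summary["offset"] = int(m.group(3), 16)
            summary["size"] = int(m.group(4), 16)
    return summary


SUMMARY = summarize(OOPS_LINES)
print(SUMMARY)
```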
Similar issues have been reported in the community for kernels based
on 4.4: https://github.com/kubernetes/kops/issues/874
These panics occur in the CFS code when a next buddy is set on an
entity that is not on a run-queue. pick_next_entity then ends up with
curr == left == NULL while a next buddy is still set, so it calls
wakeup_preempt_entity() with a valid next buddy and a NULL left.
wakeup_preempt_entity() dereferences left, which causes the panic (the
faulting address 0x50 in the traces above is a small offset from NULL).
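The failure mode can be illustrated with a simplified Python model. This is
a sketch and not the kernel code: it omits the skip and last buddies, uses a
fixed WAKEUP_GRANULARITY constant as a stand-in for the kernel's granularity
calculation, and the class names Entity and RunQueue are hypothetical.
Accessing an attribute of None plays the role of the NULL pointer
dereference:

```python
WAKEUP_GRANULARITY = 1_000_000  # assumed fixed value, for illustration only


class Entity:
    def __init__(self, name, vruntime):
        self.name = name
        self.vruntime = vruntime


class RunQueue:
    def __init__(self, queued=(), next_buddy=None):
        # Entities on the queue, leftmost (smallest vruntime) first.
        self.queued = sorted(queued, key=lambda e: e.vruntime)
        self.next_buddy = next_buddy


def wakeup_preempt_entity(curr, se):
    # Reads se.vruntime without checking se: fails if se is None.
    vdiff = curr.vruntime - se.vruntime
    if vdiff <= 0:
        return -1
    return 1 if vdiff > WAKEUP_GRANULARITY else 0


def pick_next_entity(rq, curr):
    left = rq.queued[0] if rq.queued else None
    if left is None or (curr is not None and curr.vruntime < left.vruntime):
        left = curr
    se = left
    # Prefer the next buddy if running it is not too unfair to left.
    if rq.next_buddy is not None and wakeup_preempt_entity(rq.next_buddy, left) < 1:
        se = rq.next_buddy
    return se


# Healthy case: the next buddy is itself queued, so left is never None.
a, b = Entity("a", 50), Entity("b", 60)
picked = pick_next_entity(RunQueue(queued=[a, b], next_buddy=b), curr=None)

# Broken case: empty queue, no curr, but a stale next buddy is still set.
stale = Entity("stale-buddy", 100)
try:
    pick_next_entity(RunQueue(queued=[], next_buddy=stale), curr=None)
    crashed = False
except AttributeError:
    crashed = True

print(picked.name, crashed)
```

In the broken case nothing is queued and curr is None, so left stays None
and the comparison against the stale next buddy fails, mirroring the panic
described above.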
This was confirmed by placing a WARN_ON_ONCE in set_next_buddy to
catch when a sched_entity in the hierarchy was not on_rq, as per
https://marc.info/?l=linux-kernel&m=146651668921468&w=2
The stack-trace for the WARN is quite involved:
Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here ]------------
Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at /build/linux-PwPelj/linux-4.4.0/kernel/sched/fair.c:5170 set_next_buddy+0x55/0x70()
Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs dm_crypt lockd grace sunrpc fscache ppdev input_leds serio_raw parport_pc 8250_fintek parport pvpanic mac_hid i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor Not tainted 4.4.0-72-generic #93+hf135461v20170420b2-Ubuntu
Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Apr 25 14:14:48 (none) kernel: [ 5339.764647] 0000000000000086 00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3
Apr 25 14:14:48 (none) kernel: [ 5339.764650] 0000000000000000 ffffffff81cbae20 ffff8803ed947640 ffffffff81081302
Apr 25 14:14:48 (none) kernel: [ 5339.764652] ffff8800bb5fc800 ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400
Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace:
Apr 25 14:14:48 (none) kernel: [ 5339.764665] [<ffffffff813f83c3>] dump_stack+0x63/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764669] [<ffffffff81081302>] warn_slowpath_common+0x82/0xc0
Apr 25 14:14:48 (none) kernel: [ 5339.764672] [<ffffffff8108144a>] warn_slowpath_null+0x1a/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764674] [<ffffffff810b52b5>] set_next_buddy+0x55/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764676] [<ffffffff810b59a4>] check_preempt_wakeup+0x244/0x250
Apr 25 14:14:48 (none) kernel: [ 5339.764679] [<ffffffff810ab580>] check_preempt_curr+0x80/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764682] [<ffffffff810b42eb>] attach_task+0x4b/0x60
Apr 25 14:14:48 (none) kernel: [ 5339.764685] [<ffffffff810be067>] load_balance+0x5b7/0x980
Apr 25 14:14:48 (none) kernel: [ 5339.764688] [<ffffffff810be6e1>] pick_next_task_fair+0x2b1/0x4f0
Apr 25 14:14:48 (none) kernel: [ 5339.764692] [<ffffffff81837c5f>] __schedule+0x15f/0xa30
Apr 25 14:14:48 (none) kernel: [ 5339.764694] [<ffffffff81838565>] schedule+0x35/0x80
Apr 25 14:14:48 (none) kernel: [ 5339.764697] [<ffffffff8183ba85>] schedule_hrtimeout_range_clock+0xc5/0x1b0
Apr 25 14:14:48 (none) kernel: [ 5339.764700] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764703] [<ffffffff8183ba79>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0
Apr 25 14:14:48 (none) kernel: [ 5339.764705] [<ffffffff8183bb83>] schedule_hrtimeout_range+0x13/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764709] [<ffffffff81223914>] poll_schedule_timeout+0x44/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764711] [<ffffffff81224407>] do_select+0x727/0x810
Apr 25 14:14:48 (none) kernel: [ 5339.764715] [<ffffffff811fb932>] ? page_counter_uncharge+0x22/0x40
Apr 25 14:14:48 (none) kernel: [ 5339.764718] [<ffffffff811fdb1c>] ? drain_stock.isra.33+0x6c/0xa0
Apr 25 14:14:48 (none) kernel: [ 5339.764720] [<ffffffff810b5349>] ? update_curr+0x79/0x160
Apr 25 14:14:48 (none) kernel: [ 5339.764722] [<ffffffff810b550c>] ? update_cfs_shares+0xbc/0x100
Apr 25 14:14:48 (none) kernel: [ 5339.764724] [<ffffffff810b742b>] ? dequeue_entity+0x41b/0xa80
Apr 25 14:14:48 (none) kernel: [ 5339.764729] [<ffffffff810719f7>] ? gup_pud_range+0x127/0x220
Apr 25 14:14:48 (none) kernel: [ 5339.764731] [<ffffffff810baa9c>] ? set_next_entity+0x9c/0xb0
Apr 25 14:14:48 (none) kernel: [ 5339.764736] [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0
Apr 25 14:14:48 (none) kernel: [ 5339.764740] [<ffffffff81401304>] ? timerqueue_del+0x24/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764742] [<ffffffff810efa3c>] ? __remove_hrtimer+0x3c/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764744] [<ffffffff810efb61>] ? hrtimer_try_to_cancel+0xd1/0x130
Apr 25 14:14:48 (none) kernel: [ 5339.764746] [<ffffffff810efbd9>] ? hrtimer_cancel+0x19/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764751] [<ffffffff81101166>] ? futex_wait+0x206/0x280
Apr 25 14:14:48 (none) kernel: [ 5339.764753] [<ffffffff810ab5a9>] ? ttwu_do_wakeup+0x19/0xe0
Apr 25 14:14:48 (none) kernel: [ 5339.764756] [<ffffffff812246bf>] core_sys_select+0x1cf/0x2f0
Apr 25 14:14:48 (none) kernel: [ 5339.764758] [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764762] [<ffffffff81128447>] ? audit_filter_rules+0x217/0xe30
Apr 25 14:14:48 (none) kernel: [ 5339.764764] [<ffffffff81103860>] ? do_futex+0x120/0x540
Apr 25 14:14:48 (none) kernel: [ 5339.764768] [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764772] [<ffffffff810f53aa>] ? ktime_get_ts64+0x4a/0xf0
Apr 25 14:14:48 (none) kernel: [ 5339.764774] [<ffffffff8122489a>] SyS_select+0xba/0x110
Apr 25 14:14:48 (none) kernel: [ 5339.764777] [<ffffffff8183c672>] entry_SYSCALL_64_fastpath+0x16/0x71
Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 ]---
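The condition that the WARN_ON_ONCE described above was looking for can be
sketched as a walk up the entity hierarchy. This is only a model of the
idea, not the debugging patch itself (see the marc.info link for that); the
SchedEntity class, its parent field and the first_unqueued helper are
hypothetical names:

```python
class SchedEntity:
    def __init__(self, name, on_rq, parent=None):
        self.name = name
        self.on_rq = on_rq    # True if the entity is queued on its run-queue
        self.parent = parent  # enclosing group entity, None at the top level


def first_unqueued(se):
    """Return the first entity from se up to the root that is not on_rq."""
    while se is not None:
        if not se.on_rq:
            return se
        se = se.parent
    return None


# Hypothetical hierarchy: the task is queued, but its group entity is not.
group = SchedEntity("group", on_rq=False)
task = SchedEntity("task", on_rq=True, parent=group)

offender = first_unqueued(task)
if offender is not None:
    print("would warn: next buddy set below unqueued entity", offender.name)
```

Setting a next buddy anywhere in a hierarchy where first_unqueued returns an
entity is the situation the warning was added to catch.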
Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
(http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d
(http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
fixes the crash. They appear to be intended as a series.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1687512/+subscriptions