group.of.nepali.translators team mailing list archive

[Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

To: group.of.nepali.translators@xxxxxxxxxxxxxxxxxxx
From: Daniel Axtens <daniel.axtens@xxxxxxxxxxxxx>
Date: Mon, 21 Aug 2017 01:01:35 -0000
Reply-to: Bug 1687512 <1687512@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  SRU Justification
  -----------------

  [Impact]
  Apache Mesos and Kubernetes workloads on Xenial cause a panic
  (NULL pointer dereference) in the completely fair scheduler.

  These panics are in pick_next_entity and include pick_next_task_fair
  in the call stack.

  [Fix]
  Cherry-picking both
  754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
  (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
  and
  094f469172e00d6ab0a3130b0e01c83b3cf3a98d
  (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
  fixes the crash.
  The two patches appear to be intended as a series: they were posted
  to LKML at the same time.

  [Testcase]
  The fix has been validated by the user who reported the bug.

  Bug description
  ---------------

  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small CPU limits (0.1-0.2 CPU).
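
  For a sense of scale, the sketch below converts a fractional CPU
  limit into a CFS bandwidth quota. It assumes the kernel's default
  period of 100 ms (cpu.cfs_period_us = 100000); the period actually
  configured on the affected hosts is not stated in this report, and
  the helper name cfs_quota_us is made up for illustration.

```python
# Illustrative only: converts a fractional CPU limit into a CFS quota.
# The 100 ms period is the kernel default for cpu.cfs_period_us; the
# value used on the affected Mesos hosts is an assumption.

CFS_PERIOD_US = 100_000  # assumed default period (100 ms)


def cfs_quota_us(cpu_limit: float, period_us: int = CFS_PERIOD_US) -> int:
    """Return the cpu.cfs_quota_us value for a fractional CPU limit."""
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    return round(cpu_limit * period_us)


for limit in (0.1, 0.2):
    quota = cfs_quota_us(limit)
    print(f"{limit} CPU -> {quota} us of runtime per {CFS_PERIOD_US} us period")
```

  Under that assumption a 0.1 CPU limit leaves a group only 10 ms of
  runtime per period, so such groups would be throttled frequently.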

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  [24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops: 0000 [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: ffff8803ee67c000
  [24334.601799] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:ffff8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000
  [24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000
  [24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000
  [24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000
  [24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178
  [24334.652512] FS: 0000000000000000(0000) GS:ffff8803ffd00000(0000) knlGS:0000000000000000
  [24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0
  [24334.673851] Stack:
  [24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 ffff880036529800
  [24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 ffff8803ffd16e70
  [24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 0000000000000000
  [24334.700172] Call Trace:
  [24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980
  [24334.714349] [<ffffffff81804585>] schedule+0x35/0x80
  [24334.719445] [<ffffffff8180481e>] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [<ffffffff810bf9fa>] cpu_startup_entry+0x18a/0x350
  [24334.732012] [<ffffffff8104f3d9>] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [24334.771473] RSP <ffff8803ee67fdd8>
  [24334.775077] CR2: 0000000000000050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  [155852.036931] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops: 0000 [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: ffff8800bbb10000
  [155852.135347] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [155852.144120] RSP: 0018:ffff8800bbb13ce0 EFLAGS: 00010086
  [155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: ffff8800bb777400
  [155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 0000000000000000
  [155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: ffff8803ed29aa00
  [155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 0000000000000000
  [155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 0000000000000001
  [155852.186387] FS: 00007f387d1c9700(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000
  [155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 00000000001406e0
  [155852.207967] Stack:
  [155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 0000000000000000
  [155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 ffff8803ffc96e70
  [155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 0000000000000001
  [155852.234506] Call Trace:
  [155852.237156] [<ffffffff810369c9>] ? sched_clock+0x9/0x10
  [155852.242673] [<ffffffff810b9b65>] pick_next_task_fair+0x325/0x4b0
  [155852.248968] [<ffffffff81803cd9>] __schedule+0x109/0x980
  [155852.254491] [<ffffffff81804585>] schedule+0x35/0x80
  [155852.259667] [<ffffffff8180727c>] schedule_hrtimeout_range_clock+0xac/0x130
  [155852.266838] [<ffffffff810e9fb0>] ? hrtimer_init+0x180/0x180
  [155852.272712] [<ffffffff81807270>] ? schedule_hrtimeout_range_clock+0xa0/0x130
  [155852.280052] [<ffffffff81807313>] schedule_hrtimeout_range+0x13/0x20
  [155852.288558] [<ffffffff812479b9>] ep_poll+0x249/0x310
  [155852.293817] [<ffffffff810a8c30>] ? wake_up_q+0x80/0x80
  [155852.299271] [<ffffffff81248efc>] SyS_epoll_wait+0xbc/0xe0
  [155852.304967] [<ffffffff81807df6>] entry_SYSCALL_64_fastpath+0x16/0x75
  [155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [155852.338852] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
  [155852.345270] RSP <ffff8800bbb13ce0>
  [155852.348958] CR2: 0000000000000050
  [155852.353086] ---[ end trace 8ce693b2314611c4 ]---
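
  To check whether another captured oops has the same signature
  (fault in pick_next_entity, with pick_next_task_fair in the call
  trace), something like the following sketch could be used. The
  function names parse_oops and matches_bug_signature are hypothetical,
  and the regular expression only targets the frame format seen in the
  logs above; other kernel versions may print frames differently.

```python
import re

# Matches frames such as "[<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160".
# Group 1 captures the "? " marker for unreliable frames, group 2 the symbol.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_oops(text):
    """Return (faulting_symbol, reliable_call_trace_symbols)."""
    fault, trace, in_trace = None, [], False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        m = FRAME_RE.search(line)
        if m is None:
            in_trace = False  # any non-frame line ends the call trace
            continue
        unreliable, symbol = m.group(1), m.group(2)
        if fault is None and " IP: " in line:
            fault = symbol
        elif in_trace and not unreliable:
            trace.append(symbol)
    return fault, trace


def matches_bug_signature(text):
    """True if the oops faults in pick_next_entity via pick_next_task_fair."""
    fault, trace = parse_oops(text)
    return fault == "pick_next_entity" and "pick_next_task_fair" in trace


# Excerpt of the first oops in this report.
SAMPLE = """\
[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.700172] Call Trace:
[24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0
[24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980
[24334.737895] Code: 8b 70 50
"""

print(matches_bug_signature(SAMPLE))
```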

  Similar issues have been reported in the community for kernels based
  on 4.4: https://github.com/kubernetes/kops/issues/874

  These panics occur in the CFS code when a next buddy is set on an
  entity that is not on a run-queue.  pick_next_entity then ends up
  with curr == left == NULL and calls wakeup_preempt_entity() with a
  valid next buddy and a NULL left.  wakeup_preempt_entity()
  dereferences left, which causes the panic.  The faulting address of
  0x50 in both traces is consistent with reading a structure field
  through a NULL pointer.
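
  The failure mode described above can be modelled in a few lines.
  This is a deliberately simplified user-space sketch, not the kernel
  code: the Entity class, the fixed granularity, and the list-based
  run-queue are all assumptions made for illustration, and Python's
  AttributeError stands in for the NULL pointer dereference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    vruntime: int
    on_rq: bool = True


def wakeup_preempt_entity(curr: Entity, se: Entity, gran: int = 1) -> int:
    # Reads se.vruntime unconditionally; se is the "left" candidate.
    vdiff = curr.vruntime - se.vruntime
    if vdiff <= 0:
        return -1
    return 1 if vdiff > gran else 0


def pick_next_entity(queued: List[Entity], curr: Optional[Entity],
                     next_buddy: Optional[Entity]) -> Optional[Entity]:
    # "left" is the queued entity with the smallest vruntime, if any.
    left = min(queued, key=lambda e: e.vruntime) if queued else None
    if left is None or (curr is not None and curr.vruntime < left.vruntime):
        left = curr  # still None when nothing is runnable
    se = left
    # Prefer the next buddy unless that would be too unfair to "left".
    if next_buddy is not None and wakeup_preempt_entity(next_buddy, left) < 1:
        se = next_buddy
    return se


# A stale buddy: set on an entity that is not on the run-queue.
stale = Entity("throttled-group", vruntime=100, on_rq=False)

try:
    pick_next_entity(queued=[], curr=None, next_buddy=stale)
    outcome = "picked"
except AttributeError:
    # Python's analogue of dereferencing the NULL "left".
    outcome = "crashed"
print(outcome)
```

  In this model the crash needs both conditions at once: an empty
  queue with no current entity, and a next buddy that was left set.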

  This was confirmed by placing a WARN_ON_ONCE in set_next_buddy to
  catch when a sched_entity in the hierarchy was not on_rq, as per
  https://marc.info/?l=linux-kernel&m=146651668921468&w=2
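
  The idea behind that check can be sketched as follows. This is a
  user-space model under stated assumptions, not the debugging patch
  itself: the Entity class, the parent links, and the function name
  set_next_buddy_checked are hypothetical, and the group names are
  invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    on_rq: bool
    parent: Optional["Entity"] = None


def set_next_buddy_checked(se: Optional[Entity]) -> List[str]:
    """Walk from a task entity up to the root and report every entity
    in the hierarchy that is not on a run-queue (where the WARN would fire)."""
    offenders = []
    while se is not None:
        if not se.on_rq:
            offenders.append(se.name)
        se = se.parent
    return offenders


# Hypothetical hierarchy: a runnable task inside a group that is
# currently off the run-queue (for example because it is throttled).
root_group = Entity("root-group", on_rq=True)
throttled = Entity("container-group", on_rq=False, parent=root_group)
task = Entity("task", on_rq=True, parent=throttled)

print(set_next_buddy_checked(task))
```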

  The stack-trace for the WARN is quite involved:

  Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here ]------------
  Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at /build/linux-PwPelj/linux-4.4.0/kernel/sched/fair.c:5170 set_next_buddy+0x55/0x70()
  Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs dm_crypt lockd grace sunrpc fscache ppdev input_leds serio_raw parport_pc 8250_fintek parport pvpanic mac_hid i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor Not tainted 4.4.0-72-generic #93+hf135461v20170420b2-Ubuntu
  Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Apr 25 14:14:48 (none) kernel: [ 5339.764647]  0000000000000086 00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3
  Apr 25 14:14:48 (none) kernel: [ 5339.764650]  0000000000000000 ffffffff81cbae20 ffff8803ed947640 ffffffff81081302
  Apr 25 14:14:48 (none) kernel: [ 5339.764652]  ffff8800bb5fc800 ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400
  Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace:
  Apr 25 14:14:48 (none) kernel: [ 5339.764665]  [<ffffffff813f83c3>] dump_stack+0x63/0x90
  Apr 25 14:14:48 (none) kernel: [ 5339.764669]  [<ffffffff81081302>] warn_slowpath_common+0x82/0xc0
  Apr 25 14:14:48 (none) kernel: [ 5339.764672]  [<ffffffff8108144a>] warn_slowpath_null+0x1a/0x20
  Apr 25 14:14:48 (none) kernel: [ 5339.764674]  [<ffffffff810b52b5>] set_next_buddy+0x55/0x70
  Apr 25 14:14:48 (none) kernel: [ 5339.764676]  [<ffffffff810b59a4>] check_preempt_wakeup+0x244/0x250
  Apr 25 14:14:48 (none) kernel: [ 5339.764679]  [<ffffffff810ab580>] check_preempt_curr+0x80/0x90
  Apr 25 14:14:48 (none) kernel: [ 5339.764682]  [<ffffffff810b42eb>] attach_task+0x4b/0x60
  Apr 25 14:14:48 (none) kernel: [ 5339.764685]  [<ffffffff810be067>] load_balance+0x5b7/0x980
  Apr 25 14:14:48 (none) kernel: [ 5339.764688]  [<ffffffff810be6e1>] pick_next_task_fair+0x2b1/0x4f0
  Apr 25 14:14:48 (none) kernel: [ 5339.764692]  [<ffffffff81837c5f>] __schedule+0x15f/0xa30
  Apr 25 14:14:48 (none) kernel: [ 5339.764694]  [<ffffffff81838565>] schedule+0x35/0x80
  Apr 25 14:14:48 (none) kernel: [ 5339.764697]  [<ffffffff8183ba85>] schedule_hrtimeout_range_clock+0xc5/0x1b0
  Apr 25 14:14:48 (none) kernel: [ 5339.764700]  [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90
  Apr 25 14:14:48 (none) kernel: [ 5339.764703]  [<ffffffff8183ba79>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0
  Apr 25 14:14:48 (none) kernel: [ 5339.764705]  [<ffffffff8183bb83>] schedule_hrtimeout_range+0x13/0x20
  Apr 25 14:14:48 (none) kernel: [ 5339.764709]  [<ffffffff81223914>] poll_schedule_timeout+0x44/0x70
  Apr 25 14:14:48 (none) kernel: [ 5339.764711]  [<ffffffff81224407>] do_select+0x727/0x810
  Apr 25 14:14:48 (none) kernel: [ 5339.764715]  [<ffffffff811fb932>] ? page_counter_uncharge+0x22/0x40
  Apr 25 14:14:48 (none) kernel: [ 5339.764718]  [<ffffffff811fdb1c>] ? drain_stock.isra.33+0x6c/0xa0
  Apr 25 14:14:48 (none) kernel: [ 5339.764720]  [<ffffffff810b5349>] ? update_curr+0x79/0x160
  Apr 25 14:14:48 (none) kernel: [ 5339.764722]  [<ffffffff810b550c>] ? update_cfs_shares+0xbc/0x100
  Apr 25 14:14:48 (none) kernel: [ 5339.764724]  [<ffffffff810b742b>] ? dequeue_entity+0x41b/0xa80
  Apr 25 14:14:48 (none) kernel: [ 5339.764729]  [<ffffffff810719f7>] ? gup_pud_range+0x127/0x220
  Apr 25 14:14:48 (none) kernel: [ 5339.764731]  [<ffffffff810baa9c>] ? set_next_entity+0x9c/0xb0
  Apr 25 14:14:48 (none) kernel: [ 5339.764736]  [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0
  Apr 25 14:14:48 (none) kernel: [ 5339.764740]  [<ffffffff81401304>] ? timerqueue_del+0x24/0x70
  Apr 25 14:14:48 (none) kernel: [ 5339.764742]  [<ffffffff810efa3c>] ? __remove_hrtimer+0x3c/0x90
  Apr 25 14:14:48 (none) kernel: [ 5339.764744]  [<ffffffff810efb61>] ? hrtimer_try_to_cancel+0xd1/0x130
  Apr 25 14:14:48 (none) kernel: [ 5339.764746]  [<ffffffff810efbd9>] ? hrtimer_cancel+0x19/0x20
  Apr 25 14:14:48 (none) kernel: [ 5339.764751]  [<ffffffff81101166>] ? futex_wait+0x206/0x280
  Apr 25 14:14:48 (none) kernel: [ 5339.764753]  [<ffffffff810ab5a9>] ? ttwu_do_wakeup+0x19/0xe0
  Apr 25 14:14:48 (none) kernel: [ 5339.764756]  [<ffffffff812246bf>] core_sys_select+0x1cf/0x2f0
  Apr 25 14:14:48 (none) kernel: [ 5339.764758]  [<ffffffff810ef880>] ? __hrtimer_init+0x90/0x90
  Apr 25 14:14:48 (none) kernel: [ 5339.764762]  [<ffffffff81128447>] ? audit_filter_rules+0x217/0xe30
  Apr 25 14:14:48 (none) kernel: [ 5339.764764]  [<ffffffff81103860>] ? do_futex+0x120/0x540
  Apr 25 14:14:48 (none) kernel: [ 5339.764768]  [<ffffffff8106428e>] ? kvm_clock_get_cycles+0x1e/0x20
  Apr 25 14:14:48 (none) kernel: [ 5339.764772]  [<ffffffff810f53aa>] ? ktime_get_ts64+0x4a/0xf0
  Apr 25 14:14:48 (none) kernel: [ 5339.764774]  [<ffffffff8122489a>] SyS_select+0xba/0x110
  Apr 25 14:14:48 (none) kernel: [ 5339.764777]  [<ffffffff8183c672>] entry_SYSCALL_64_fastpath+0x16/0x71
  Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 ]---

  Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
  (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
  and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d
  (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
  fixes the crash. They appear to be intended as a series.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1687512/+subscriptions