
group.of.nepali.translators team mailing list archive

[Bug 1747896] Re: OOM and High CPU utilization in update_blocked_averages because of too many cfs_rqs in rq->leaf_cfs_rq_list

 

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1747896

Title:
  OOM and High CPU utilization in update_blocked_averages because of too
  many cfs_rqs in rq->leaf_cfs_rq_list

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Xenial:
  New

Bug description:
  [Impact]
  CPU utilization stays high, and the flamegraph[1] shows that the CPU is
  busy updating load averages in the for loop inside the
  update_blocked_averages() function. OOM also occurs because the
  decayed cfs_rqs are never released.

  [Fix]
  commit a9e7f6544b9cebdae54d29f87a7ba2a83c0471b5
  Author: Tejun Heo <tj@xxxxxxxxxx>
  Date:   Tue Apr 25 17:43:50 2017 -0700

  sched/fair: Fix O(nr_cgroups) in load balance path
      
  Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
  live cfs_rqs which have ever been active on the CPU; unfortunately,
  this makes update_blocked_averages() O(# total cgroups) which isn't
  scalable at all.
      
  This shows up as a small CPU consumption and scheduling latency
  increase in the load balancing path in systems with CPU controller
  enabled across most cgroups.  In an edge case where temporary cgroups
  were leaking, this caused the kernel to consume good several tens of
  percents of CPU cycles running update_blocked_averages(), each run
  taking multiple millisecs.
      
  This patch fixes the issue by taking empty and fully decayed cfs_rqs
  off the rq->leaf_cfs_rq_list. 

  [Test]
  1). Run the script
  #!/bin/bash

  for i in $(seq 1 10); do
          ( for j in $(seq 1 3000); do ssh -S none u@localhost date;done; echo "done $i" ) &
  done

  2). Observe the cfs_rqs
  $ watch -n1 "grep cfs_rq /proc/sched_debug| wc -l"

  3). Observe the CPU utilization rate
  $ sudo htop

  With the patched kernel[2], the CPU utilization rate is normal, the
  number of cfs_rqs decreases periodically, and memory usage stays
  bounded.
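
  To compare an unpatched and a patched kernel over a whole test run, the
  count from step 2 can be logged instead of watched. The following is
  only a sketch: the helper names count_cfs_rqs and sample_cfs_rqs are
  hypothetical and not part of the original report, and the defaults of
  60 samples at 1 second intervals are arbitrary assumptions. The file
  path is a parameter that defaults to /proc/sched_debug as used above.

```shell
#!/bin/bash
# Hypothetical helpers (not from the bug report) for logging the cfs_rq count.

# Print the number of lines mentioning cfs_rq in the given file.
# Equivalent to "grep cfs_rq FILE | wc -l" from step 2.
count_cfs_rqs() {
    local src
    src="${1:-/proc/sched_debug}"
    # grep -c exits non-zero when the count is 0; ignore that status.
    grep -c 'cfs_rq' "$src" || true
}

# Take SAMPLES readings, INTERVAL seconds apart, printing one
# "epoch_seconds count" pair per line.
sample_cfs_rqs() {
    local src samples interval i
    src="${1:-/proc/sched_debug}"
    samples="${2:-60}"      # assumed default, adjust as needed
    interval="${3:-1}"      # assumed default, adjust as needed
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$(date +%s)" "$(count_cfs_rqs "$src")"
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

  For example, after sourcing the functions, running
  "sample_cfs_rqs /proc/sched_debug 300 1 > cfs_rq.log" while the script
  from step 1 is active records five minutes of samples. On a kernel with
  the fix, the second column would be expected to drop back periodically
  rather than grow without bound.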

  [Reference]
  [1]. http://kernel.ubuntu.com/~gavinguo/168887/2018-01-31_07-38-45.perf.data.svg
  [2]. https://launchpad.net/~mimi0213kimo/+archive/ubuntu/cfs-rq-clean

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1747896/+subscriptions