group.of.nepali.translators team mailing list archive
Message #21983
[Bug 1747896] Re: OOM and High CPU utilization in update_blocked_averages because of too many cfs_rqs in rq->leaf_cfs_rq_list
** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1747896
Title:
OOM and High CPU utilization in update_blocked_averages because of too
many cfs_rqs in rq->leaf_cfs_rq_list
Status in linux package in Ubuntu:
Incomplete
Status in linux source package in Xenial:
New
Bug description:
[Impact]
CPU utilization stays high, and the flamegraph[1] shows that the CPU
is busy updating load averages in the for loop inside the
update_blocked_averages() function. In addition, OOM occurs because
the decayed cfs_rqs are not released.
[Fix]
commit a9e7f6544b9cebdae54d29f87a7ba2a83c0471b5
Author: Tejun Heo <tj@xxxxxxxxxx>
Date: Tue Apr 25 17:43:50 2017 -0700
sched/fair: Fix O(nr_cgroups) in load balance path
Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_blocked_averages() O(# total cgroups) which isn't
scalable at all.
This shows up as a small CPU consumption and scheduling latency
increase in the load balancing path in systems with CPU controller
enabled across most cgroups. In an edge case where temporary cgroups
were leaking, this caused the kernel to consume good several tens of
percents of CPU cycles running update_blocked_averages(), each run
taking multiple millisecs.
This patch fixes the issue by taking empty and fully decayed cfs_rqs
off the rq->leaf_cfs_rq_list.
[Test]
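1) Run the script:
#!/bin/bash
# Start 10 parallel loops, each opening 3000 ssh sessions to localhost.
# "-S none" disables ssh connection sharing, so every iteration opens a new connection.
for i in $(seq 1 10); do
( for j in $(seq 1 3000); do ssh -S none u@localhost date; done; echo "done $i" ) &
done
wait
2) Watch the number of cfs_rqs:
$ watch -n1 "grep cfs_rq /proc/sched_debug | wc -l"
3) Observe the CPU utilization:
$ sudo htop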
With the patched kernel[2], CPU utilization stays normal, the number of
cfs_rqs decreases periodically, and memory usage stays bounded.
[Reference]
[1]. http://kernel.ubuntu.com/~gavinguo/168887/2018-01-31_07-38-45.perf.data.svg
[2]. https://launchpad.net/~mimi0213kimo/+archive/ubuntu/cfs-rq-clean
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1747896/+subscriptions