kernel-packages team mailing list archive
Message #174700
[Bug 1575407] Re: Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter
I have traced all the execution paths that can plausibly increment "perf_count_sw_cpu_migrations"
(or p->se.nr_migrations++ in the task's task_struct). An increment there would show us that the task
migrated from one CPU to another.
____________________________________________________________
(Execution Path 1 - Kernel swaps 2 tasks between 2 different CPUs)
[execution interrupted by a page fault]
handle_pte_fault ->
numa_migrate_preferred ->
task_numa_migrate ->
migrate_swap ->
stop_two_cpus ... (scheduling task migration state machine)
<asynchronous handler on cpu IPIs>
[kernel migration thread handles the task switch]
[you have 1 migration thread per cpu running]
[new cpu comes from numa balance/smp logic]
migrate_swap_stop ->
migrate_swap_task -> (kernel swaps 2 tasks from different cpus)
set_task_cpu ->
**** update to perf counter ****
[process scheduled in a new cpu]
____________________________________________________________
(Execution Path 2 - Similar to 1 but, instead of swapping, it sends the
task)
[execution interrupted by a page fault]
handle_pte_fault ->
migrate_task_to ->
stop_one_cpu ...(scheduling task submission to another cpu)
<asynchronous handler on cpu IPIs>
[kernel migration thread handles the task switch]
[you have 1 migration thread per cpu running]
[new cpu comes from numa balance/smp logic]
migration_cpu_stop ->
migrate_task -> (move task from one cpu to another)
move_queued_task ->
set_task_cpu ->
**** update to perf counter ****
[process scheduled in a new cpu]
____________________________________________________________
(Execution Path 3 - New executions)
[fork / exec]
sched_exec ->
stop_one_cpu... (scheduling task submission to another cpu)
<asynchronous handler on cpu IPIs>
[kernel migration thread handles the task switch]
[you have 1 migration thread per cpu running]
[new cpu comes from scheduler_class - fair/deadline/rt - select_task_rq logic]
[new cpu can also come from select_fallback_rq] --> fallback might not take cpumask into consideration
migration_cpu_stop ->
migrate_task -> (move task from one cpu to another)
move_queued_task ->
set_task_cpu ->
**** update to perf counter ****
[process scheduled in a new cpu]
____________________________________________________________
(Execution Path 4 - Regular Scheduling)
[wake up process]
[wake up state]
try_to_wake_up ->
[new cpu comes from scheduler_class - fair/deadline/rt - select_task_rq logic]
[new cpu can also come from select_fallback_rq] --> fallback might not take cpumask into consideration
select_task_rq ->
set_task_cpu
**** update to perf counter ****
[process scheduled in a new cpu]
****** note for execution paths 3 & 4 ********
-> select_fallback_rq is responsible for the messages:
[255688.556945] process 1 (init) no longer affine to cpu1
[266710.938490] process 1 (init) no longer affine to cpu1
[275071.280189] process 1 (init) no longer affine to cpu1
[286088.372647] process 1 (init) no longer affine to cpu1
[355886.470777] process 1 (init) no longer affine to cpu1
[358415.046246] process 1 (init) no longer affine to cpu1
in dmesg.
They show us that the fallback mechanism for picking the CPU run queue was used.
The fallback mechanism might be doing something wrong.
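
To see how often the fallback path fired, and for which task and CPU, the messages above can be tallied. The following is only a minimal sketch: the function name count_fallbacks and the regular expression are my own, and it assumes the log lines have exactly the format quoted above.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [255688.556945] process 1 (init) no longer affine to cpu1
# The pattern is an assumption based on the lines quoted in this message.
AFFINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+process (?P<pid>\d+) \((?P<comm>[^)]*)\) "
    r"no longer affine to cpu(?P<cpu>\d+)"
)


def count_fallbacks(dmesg_text):
    """Return a Counter keyed by (pid, comm, cpu), one hit per matching line."""
    hits = Counter()
    for line in dmesg_text.splitlines():
        m = AFFINE_RE.search(line)
        if m:
            key = (int(m.group("pid")), m.group("comm"), int(m.group("cpu")))
            hits[key] += 1
    return hits


# Three of the lines from the dmesg output quoted above.
SAMPLE = """\
[255688.556945] process 1 (init) no longer affine to cpu1
[266710.938490] process 1 (init) no longer affine to cpu1
[275071.280189] process 1 (init) no longer affine to cpu1
"""

if __name__ == "__main__":
    for (pid, comm, cpu), n in count_fallbacks(SAMPLE).items():
        print(f"pid {pid} ({comm}) lost affinity to cpu{cpu}: {n} time(s)")
```

Comparing these counts with the cpu-migrations number from perf would hint at whether the fallback path alone can explain the counter.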
______________________________
PS: There are a few other paths coming from the deadline & realtime
schedulers not shown here.
My idea is to collect user & kernel stack traces from a probe on "set_task_cpu". This will tell us whether
it is being called, by which function, and whether all calls are coming from the same execution path (for
example, from select_fallback_rq rather than from the p->sched_class->select_task_rq() functions of the
fair scheduler).
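
Once the stack traces are collected, they can be grouped by the frames sitting directly above set_task_cpu. The following is a rough sketch of that grouping, not a finished tool: it assumes each trace has already been reduced to a list of symbol names, innermost frame first, and the helper name callers_of and the sample stacks are hypothetical (the stacks are built from the paths listed in this message, not captured from a real system).

```python
from collections import Counter


def callers_of(stacks, target="set_task_cpu", depth=2):
    """Count the `depth` frames directly above `target` in each stack.

    Each stack is a list of symbol names, innermost frame first.
    Stacks that do not contain `target` are ignored.
    """
    counts = Counter()
    for stack in stacks:
        if target not in stack:
            continue
        i = stack.index(target)
        counts[tuple(stack[i + 1 : i + 1 + depth])] += 1
    return counts


# Hypothetical stacks modelled on execution paths 2 and 4 above.
SAMPLE_STACKS = [
    ["set_task_cpu", "try_to_wake_up"],
    ["set_task_cpu", "try_to_wake_up"],
    ["set_task_cpu", "move_queued_task", "migrate_task", "migration_cpu_stop"],
]

if __name__ == "__main__":
    for chain, n in callers_of(SAMPLE_STACKS).most_common():
        print(f"{n:6d}  set_task_cpu <- " + " <- ".join(chain))
```

If nearly all hits share one caller chain, that points at a single execution path as the source of the counter updates.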
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1575407
Title:
Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter
Status in linux package in Ubuntu:
Confirmed
Bug description:
It was brought to my attention that:
On a PowerPC-based server, perf seems to report cpu-migrations when
only a single CPU is activated.
## perf
Performance counter stats for 'CPU(s) 0':
15027.888988 task-clock (msec) # 1.000 CPUs utilized [100.00%]
25,206 context-switches # 0.002 M/sec [100.00%]
3,518 cpu-migrations # 0.234 K/sec [100.00%]
639 page-faults # 0.043 K/sec
41,545,780,384 cycles # 2.765 GHz [66.68%]
2,868,753,319 stalled-cycles-frontend # 6.91% frontend cycles idle [50.01%]
30,162,193,535 stalled-cycles-backend # 72.60% backend cycles idle [50.01%]
11,161,722,533 instructions # 0.27 insns per cycle
# 2.70 stalled cycles per insn [66.68%]
1,544,072,679 branches # 102.747 M/sec [49.99%]
52,536,867 branch-misses # 3.40% of all branches [49.99%]
15.027768835 seconds time elapsed
## lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0
Off-line CPU(s) list: 1-127
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: 8335-GCA
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0
NUMA node8 CPU(s):
So either tasks are being migrated to offline CPUs or perf is
accounting for the migrations incorrectly.
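
The inconsistency can be checked mechanically from the two outputs above. This is only an illustrative sketch: the helper names (parse_cpu_list, online_cpus, cpu_migrations, looks_inconsistent) are hypothetical, and the parsing assumes the lscpu and perf text layouts shown in this report.

```python
import re


def parse_cpu_list(spec):
    """Expand a CPU list such as '0' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(lscpu_text):
    """Return the CPUs named on the 'On-line CPU(s) list' line of lscpu."""
    m = re.search(r"^\s*On-line CPU\(s\) list:\s*(\S+)", lscpu_text, re.M)
    return parse_cpu_list(m.group(1)) if m else []


def cpu_migrations(perf_text):
    """Return the cpu-migrations count from perf stat output, or None."""
    m = re.search(r"^\s*([\d,]+)\s+cpu-migrations\b", perf_text, re.M)
    return int(m.group(1).replace(",", "")) if m else None


def looks_inconsistent(lscpu_text, perf_text):
    """True when exactly one CPU is online yet perf reports migrations."""
    n = cpu_migrations(perf_text)
    return len(online_cpus(lscpu_text)) == 1 and n is not None and n > 0


# Lines copied from the outputs quoted in this report.
LSCPU = "On-line CPU(s) list: 0\nOff-line CPU(s) list: 1-127\n"
PERF = "3,518 cpu-migrations # 0.234 K/sec [100.00%]\n"

if __name__ == "__main__":
    print("online CPUs:", online_cpus(LSCPU))
    print("cpu-migrations:", cpu_migrations(PERF))
    print("inconsistent:", looks_inconsistent(LSCPU, PERF))
```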
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1575407/+subscriptions