kernel-packages team mailing list archive

[Bug 1575407] Re: Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Rafael David Tinoco <rafael.tinoco@xxxxxxxxxxxxx>
Date: Tue, 26 Apr 2016 23:03:32 -0000
Reply-to: Bug 1575407 <1575407@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
I have traced all plausible execution paths that can increment the PERF_COUNT_SW_CPU_MIGRATIONS counter
(or ((struct task_struct *)p)->se.nr_migrations++), i.e. the paths that would show us that the task migrated
between different CPUs.
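All of the paths below end in set_task_cpu(). Here is a minimal user-space sketch of the accounting it performs. It assumes the mainline logic of this era, where p->se.nr_migrations and the PERF_COUNT_SW_CPU_MIGRATIONS software event are incremented only when the new CPU differs from the task's current one. struct task and perf_cpu_migrations are hypothetical stand-ins for struct task_struct and the perf event. This is not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-in for struct task_struct (only the fields we need). */
struct task {
    int cpu;                     /* CPU the task is currently assigned to */
    unsigned long nr_migrations; /* mirrors p->se.nr_migrations */
};

/* Hypothetical stand-in for the PERF_COUNT_SW_CPU_MIGRATIONS event count. */
static unsigned long perf_cpu_migrations;

/* Model of set_task_cpu(): counters move only when the CPU really changes. */
static void set_task_cpu(struct task *p, int new_cpu)
{
    assert(new_cpu >= 0);
    if (p->cpu != new_cpu) {
        p->nr_migrations++;
        perf_cpu_migrations++;
    }
    p->cpu = new_cpu;
}
```

If this model holds, then on a box where only CPU 0 is online, every increment implies a set_task_cpu() call where the task's old CPU or its new CPU was something other than 0.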
____________________________________________________________ 

(Execution Path 1 - Kernel swaps 2 tasks between 2 different CPUs)

[execution interrupted by a page fault] 
handle_pte_fault -> 
numa_migrate_preferred -> 
task_numa_migrate -> 
migrate_swap -> 
stop_two_cpus ... (scheduling task migration state machine) 

<asynchronous handler on cpu IPIs> 
[kernel migration thread handles the task switch] 
[there is 1 migration thread running per cpu] 
[new cpu is chosen by the numa balancing/smp logic] 

migrate_swap_stop -> 
migrate_swap_task -> (kernel swaps 2 tasks between different cpus) 
set_task_cpu -> 
**** update to perf counter **** 
[process scheduled on the new cpu] 
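A swap moves both tasks, so one completed swap should add two to the counter, one per set_task_cpu() call. The hedged sketch below uses the same hypothetical model as the set_task_cpu() sketch. migrate_swap_model() is an illustrative name. It stands in for migrate_swap_stop() calling migrate_swap_task() once for each task.

```c
#include <assert.h>

/* Hypothetical task model, as in the set_task_cpu() sketch. */
struct task {
    int cpu;
    unsigned long nr_migrations;
};

static unsigned long perf_cpu_migrations;

/* Counters move only when the CPU really changes. */
static void set_task_cpu(struct task *p, int new_cpu)
{
    if (p->cpu != new_cpu) {
        p->nr_migrations++;
        perf_cpu_migrations++;
    }
    p->cpu = new_cpu;
}

/* Hypothetical swap: each task is handed to the other's CPU, one
 * set_task_cpu() call per task, so a real swap counts twice. */
static void migrate_swap_model(struct task *a, struct task *b)
{
    int a_cpu = a->cpu, b_cpu = b->cpu;

    assert(a != b);
    set_task_cpu(a, b_cpu);
    set_task_cpu(b, a_cpu);
}
```

A swap that counts needs the two tasks to sit on different CPUs. With only CPU 0 online, any such swap would therefore involve an offline CPU.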

____________________________________________________________

(Execution Path 2 - Similar to 1, but instead of swapping, it moves a single task to another cpu)

[execution interrupted by a page fault] 
handle_pte_fault -> 
migrate_task_to -> 
stop_one_cpu ... (scheduling task submission to another cpu) 

<asynchronous handler on cpu IPIs> 
[kernel migration thread handles the task switch] 
[there is 1 migration thread running per cpu] 
[new cpu is chosen by the numa balancing/smp logic] 

migration_cpu_stop -> 
migrate_task -> (move task from one cpu to another) 
move_queued_task -> 
set_task_cpu -> 
**** update to perf counter **** 
[process scheduled on the new cpu] 
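The move in path 2 can be sketched as three steps: dequeue from the source run queue, set_task_cpu(), then enqueue on the destination. The sketch also guards the destination with an online check. That guard assumes the stopper callback refuses a destination CPU that is not active, which is how I read mainline code of this era. The online mask, the run-queue array and all *_model names are hypothetical. In this model, with only CPU 0 online as in the report, the path can never increment the counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 128 /* the 128 possible CPUs reported by lscpu */

/* Hypothetical online mask: only CPU 0 is up, as in the bug report. */
static bool cpu_online_model[NR_CPUS_MODEL] = { [0] = true };
static unsigned int rq_nr_running[NR_CPUS_MODEL]; /* per-CPU run-queue length */
static unsigned long perf_cpu_migrations;

struct task { int cpu; };

/* Model of move_queued_task(): dequeue, change CPU, enqueue. */
static void move_queued_task_model(struct task *p, int dest_cpu)
{
    rq_nr_running[p->cpu]--;
    if (p->cpu != dest_cpu)
        perf_cpu_migrations++; /* what set_task_cpu() would account */
    p->cpu = dest_cpu;
    rq_nr_running[dest_cpu]++;
}

/* Model of the migration_cpu_stop() path: refuse offline destinations
 * and no-op moves. Returns true if the task was moved. */
static bool migrate_task_model(struct task *p, int dest_cpu)
{
    assert(dest_cpu >= 0 && dest_cpu < NR_CPUS_MODEL);
    if (!cpu_online_model[dest_cpu] || p->cpu == dest_cpu)
        return false;
    move_queued_task_model(p, dest_cpu);
    return true;
}
```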
____________________________________________________________ 

(Execution Path 3 - New executions)

[fork / exec] 
sched_exec -> 
stop_one_cpu ... (scheduling task submission to another cpu) 

<asynchronous handler on cpu IPIs> 
[kernel migration thread handles the task switch] 
[there is 1 migration thread running per cpu] 
[new cpu comes from the sched_class (fair/deadline/rt) select_task_rq logic] 
[new cpu can also come from select_fallback_rq] --> fallback might not take the cpumask into consideration 

migration_cpu_stop -> 
migrate_task -> (move task from one cpu to another) 
move_queued_task -> 
set_task_cpu -> 
**** update to perf counter **** 
[process scheduled on the new cpu] 

____________________________________________________________

(Execution Path 4 - Regular Scheduling)

[wake up process] 
[wake up state] 
try_to_wake_up -> 
[new cpu comes from the sched_class (fair/deadline/rt) select_task_rq logic] 
[new cpu can also come from select_fallback_rq] --> fallback might not take the cpumask into consideration 
select_task_rq -> 
set_task_cpu -> 
**** update to perf counter **** 
[process scheduled on the new cpu] 
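Paths 3 and 4 both go through the generic select_task_rq() helper in kernel/sched/core.c. As I read it for kernels of this era, that helper accepts the scheduling class's pick only if the CPU is in the task's allowed mask and is online. Otherwise it calls select_fallback_rq(). The sketch below is simplified. Its fallback is a deliberately minimal stand-in, and all *_model names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 8

/* Hypothetical task model: only the allowed-CPU mask matters here. */
struct task {
    bool allowed[NR_CPUS_MODEL];
};

/* Hypothetical online mask: only CPU 0 is up. */
static bool cpu_online_model[NR_CPUS_MODEL] = { [0] = true };

/* Minimal fallback: first allowed+online CPU, else first online CPU. */
static int select_fallback_rq_model(const struct task *p)
{
    for (int c = 0; c < NR_CPUS_MODEL; c++)
        if (p->allowed[c] && cpu_online_model[c])
            return c;
    for (int c = 0; c < NR_CPUS_MODEL; c++)
        if (cpu_online_model[c])
            return c;
    assert(!"no online CPU");
    return -1;
}

/* Trust the class's choice only if it is allowed and online. */
static int select_task_rq_model(const struct task *p, int class_choice)
{
    assert(class_choice >= 0 && class_choice < NR_CPUS_MODEL);
    if (!p->allowed[class_choice] || !cpu_online_model[class_choice])
        return select_fallback_rq_model(p);
    return class_choice;
}
```

Under this reading, the CPU handed to set_task_cpu() on these paths should always be online. That makes it worth confirming with the probe proposed below.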

****** note for execution paths 3 & 4 ****** 
-> select_fallback_rq is responsible for the messages: 

[255688.556945] process 1 (init) no longer affine to cpu1 
[266710.938490] process 1 (init) no longer affine to cpu1 
[275071.280189] process 1 (init) no longer affine to cpu1 
[286088.372647] process 1 (init) no longer affine to cpu1 
[355886.470777] process 1 (init) no longer affine to cpu1 
[358415.046246] process 1 (init) no longer affine to cpu1 

from dmesg. 
They show that the fallback mechanism for picking the cpu run queue was used, 
so the fallback mechanism might be doing something wrong. 
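The sketch below is a rough model of when select_fallback_rq() prints that message. It first looks for a CPU that is both allowed and online. Only if none exists does it widen the task's allowed mask (cpuset fallback, then cpu_possible_mask) and report the task as no longer affine. The real function also prefers CPUs on the same NUMA node, skips the message for tasks without an mm, and rate-limits it. None of that is modeled. affinity_warnings and the *_model names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL 8

struct task {
    int pid;
    const char *comm;
    bool allowed[NR_CPUS_MODEL];
};

/* Hypothetical online mask: only CPU 0 is up. */
static bool cpu_online_model[NR_CPUS_MODEL] = { [0] = true };
static unsigned int affinity_warnings; /* how often the message was "printed" */

/* prev_cpu is the CPU passed in by the caller; it is what the message names. */
static int select_fallback_rq_model(int prev_cpu, struct task *p)
{
    assert(prev_cpu >= 0 && prev_cpu < NR_CPUS_MODEL);
    for (int pass = 0; pass < 2; pass++) {
        for (int c = 0; c < NR_CPUS_MODEL; c++) {
            if (p->allowed[c] && cpu_online_model[c]) {
                if (pass > 0) { /* only after the mask had to be widened */
                    affinity_warnings++;
                    printf("process %d (%s) no longer affine to cpu%d\n",
                           p->pid, p->comm, prev_cpu);
                }
                return c;
            }
        }
        /* Nothing allowed is online: widen the mask (stands in for the
         * cpuset fallback / cpu_possible_mask steps). */
        for (int c = 0; c < NR_CPUS_MODEL; c++)
            p->allowed[c] = true;
    }
    return -1; /* the kernel BUG()s here instead */
}
```

If the model is faithful, the cpuN in the message is the CPU passed in by the caller, which is task_cpu(p) for calls coming from select_task_rq(). The lines above would then mean init was still associated with cpu1 while cpu1 was offline.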

______________________________

PS: There are a few other paths coming from the deadline & realtime
schedulers that are not shown here.

My idea is to capture user & kernel stack traces from a probe on set_task_cpu. This will tell us whether it is being 
called, by which function, and whether all calls come from the same execution path (e.g. coming from 
select_fallback_rq instead of the fair scheduler's p->sched_class->select_task_rq() function).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1575407

Title:
  Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  It was brought to my attention that:

  On a PowerPC-based server, perf seems to report cpu-migrations even
  though only a single cpu is online.

  ## perf

  Performance counter stats for 'CPU(s) 0':

      15027.888988      task-clock (msec)         #    1.000 CPUs utilized           [100.00%]
            25,206      context-switches          #    0.002 M/sec                   [100.00%]
             3,518      cpu-migrations            #    0.234 K/sec                   [100.00%]
               639      page-faults               #    0.043 K/sec
    41,545,780,384      cycles                    #    2.765 GHz                     [66.68%]
     2,868,753,319      stalled-cycles-frontend   #    6.91% frontend cycles idle    [50.01%]
    30,162,193,535      stalled-cycles-backend    #   72.60% backend  cycles idle    [50.01%]
    11,161,722,533      instructions              #    0.27  insns per cycle
                                                  #    2.70  stalled cycles per insn [66.68%]
     1,544,072,679      branches                  #  102.747 M/sec                   [49.99%]
        52,536,867      branch-misses             #    3.40% of all branches         [49.99%]

  15.027768835 seconds time elapsed

  ## lscpu

  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                128
  On-line CPU(s) list:   0
  Off-line CPU(s) list:  1-127
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             1
  NUMA node(s):          2
  Model:                 8335-GCA
  L1d cache:             64K
  L1i cache:             32K
  L2 cache:              512K
  L3 cache:              8192K
  NUMA node0 CPU(s):     0
  NUMA node8 CPU(s):     

  So either tasks are being migrated to offline cpus or perf is
  accounting the migrations wrongly.
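As a sanity check on the perf output above, the per-second annotations appear to be the counts divided by task-clock, which here matches the roughly 15.03 s elapsed. The small sketch below redoes that arithmetic with the reported values. per_second() is a hypothetical helper. The result is roughly 234 migrations per second, or about one migration for every seven context switches, on a system with a single online cpu.

```c
#include <assert.h>

/* Values copied from the perf stat output in the bug description. */
static const double task_clock_ms = 15027.888988;
static const double cpu_migrations = 3518.0;
static const double context_switches = 25206.0;

/* Events per second of task-clock time. */
static double per_second(double count, double clock_ms)
{
    assert(clock_ms > 0.0);
    return count / (clock_ms / 1000.0);
}
```

per_second(cpu_migrations, task_clock_ms) comes out at about 234.1, which matches the 0.234 K/sec that perf printed. cpu_migrations / context_switches is about 0.14.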
