
kernel-packages team mailing list archive

[Bug 1575407] Re: Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter

 

## VIVID

from kernel/sched/core.c:

    #ifdef CONFIG_SMP
    void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
    {
    ...
        if (task_cpu(p) != new_cpu) {
            struct task_migration_notifier tmn;

            if (p->sched_class->migrate_task_rq)
                p->sched_class->migrate_task_rq(p, new_cpu);
            p->se.nr_migrations++;
            perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, NULL, 0);

            tmn.task = p;
            tmn.from_cpu = task_cpu(p);
            tmn.to_cpu = new_cpu;

            atomic_notifier_call_chain(&task_migration_notifier, 0, &tmn);
        }

        __set_task_cpu(p, new_cpu);
    }

## WILY

from include/linux/perf_event.h:

    static inline void perf_event_task_sched_in(struct task_struct *prev,
                                                struct task_struct *task)
    {
        if (static_key_false(&perf_sched_events.key))
            __perf_event_task_sched_in(prev, task);

        if (perf_sw_migrate_enabled() && task->sched_migrated) {
            struct pt_regs *regs = this_cpu_ptr(&__perf_regs[0]);

            perf_fetch_caller_regs(regs);
            ___perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, regs, 0);
            task->sched_migrated = 0;
        }
    }

----

While checking how recent kernels increment PERF_COUNT_SW_CPU_MIGRATIONS,
I found a difference from Vivid. In Vivid, PERF_COUNT_SW_CPU_MIGRATIONS
is incremented directly from set_task_cpu() (which is why we asked for
tracing of this function). A later commit changed that behavior because
software migrate events were being accounted incorrectly.

Instead of incrementing the perf software counter inside set_task_cpu(),
the newer code marks the task as "migrated" (the sched_migrated flag in
task_struct). Later, when context_switch() calls finish_task_switch()
and the task is scheduled in, the perf software counter is incremented
if the task was marked as "migrated", and the flag is cleared.

This change fixes two issues: 1) the counter was incremented while the
task was only being migrated and not yet scheduled, so the task's own
counters could not observe the event; 2) migrations triggered from
softirq context were accounted to the interrupted process, which may
never have migrated itself.
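The difference between the two accounting schemes can be sketched with a small user-space model. This is only an illustration under stated assumptions: struct model_task, migrate_immediate(), migrate_deferred() and sched_in() are hypothetical names, not kernel APIs, and the model ignores everything except who gets charged for a migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_task {
    int cpu;                  /* CPU the task is currently assigned to */
    bool sched_migrated;      /* set on migration, cleared on sched-in */
    unsigned long migrations; /* per-task software migration count */
};

/*
 * Vivid-style accounting: the event fires at migration time, so it is
 * charged to whichever task happens to be running (running_task), not
 * necessarily to the task being moved (p).
 */
static void migrate_immediate(struct model_task *running_task,
                              struct model_task *p, int new_cpu)
{
    assert(running_task != NULL && p != NULL);
    if (p->cpu != new_cpu) {
        running_task->migrations++;
        p->cpu = new_cpu;
    }
}

/* Wily-style accounting, step 1: only mark the migrated task. */
static void migrate_deferred(struct model_task *p, int new_cpu)
{
    assert(p != NULL);
    if (p->cpu != new_cpu) {
        p->sched_migrated = true;
        p->cpu = new_cpu;
    }
}

/* Wily-style accounting, step 2: charge the task when it is scheduled in. */
static void sched_in(struct model_task *p)
{
    assert(p != NULL);
    if (p->sched_migrated) {
        p->migrations++;
        p->sched_migrated = false;
    }
}
```

In this model, calling migrate_immediate(&ls, &worker, 1) increments the count of ls even though worker is the task that moved, while migrate_deferred(&worker, 1) followed by sched_in(&worker) charges worker only.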

Commit:

commit ff303e66c240ba6269e31817a386995440a18c99 
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx> 
Date: Fri Apr 17 20:05:30 2015 +0200 

perf: Fix software migrate events

Stephane asked about PERF_COUNT_SW_CPU_MIGRATIONS and I realized it 
was borken: 

> The problem is that the task isn't actually scheduled while its being 
> migrated (obviously), and if its not scheduled, the counters aren't 
> scheduled either, so there's no observing of the fact. 
> 
> A further problem with migrations is that many migrations happen from 
> softirq context, which is nested inside the 'random' task context of 
> whoemever happens to run at that time, similarly for the wakeup 
> migrations triggered from (soft)irq context. All those end up being 
> accounted in the task that's currently running, eg. your 'ls'. 

The below cures this by marking a task as migrated and accounting it 
on the subsequent sched_in(). 

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>

It first appeared in v4.2-rc1.

For now, installing the following packages will probably mitigate the
issue:

linux-image-4.2.0-36-generic
linux-image-extra-4.2.0-36-generic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1575407

Title:
  Trusty + 3.19 (lts-vivid) PERF wrong cpu-migration counter

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  It was brought to my attention that:

  On a PowerPC-based server, perf seems to report cpu-migrations when
  only a single CPU is online.

  ## perf

  Performance counter stats for 'CPU(s) 0':

    15027.888988  task-clock (msec)        #    1.000 CPUs utilized           [100.00%]
          25,206  context-switches         #    0.002 M/sec                   [100.00%]
           3,518  cpu-migrations           #    0.234 K/sec                   [100.00%]
             639  page-faults              #    0.043 K/sec
  41,545,780,384  cycles                   #    2.765 GHz                     [66.68%]
   2,868,753,319  stalled-cycles-frontend  #    6.91% frontend cycles idle    [50.01%]
  30,162,193,535  stalled-cycles-backend   #   72.60% backend  cycles idle    [50.01%]
  11,161,722,533  instructions             #    0.27  insns per cycle
                                           #    2.70  stalled cycles per insn [66.68%]
   1,544,072,679  branches                 #  102.747 M/sec                   [49.99%]
      52,536,867  branch-misses            #    3.40% of all branches         [49.99%]

  15.027768835 seconds time elapsed

  ## lscpu

  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                128
  On-line CPU(s) list:   0
  Off-line CPU(s) list:  1-127
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             1
  NUMA node(s):          2
  Model:                 8335-GCA
  L1d cache:             64K
  L1i cache:             32K
  L2 cache:              512K
  L3 cache:              8192K
  NUMA node0 CPU(s):     0
  NUMA node8 CPU(s):     

  So either tasks are being migrated to offline CPUs or perf is
  accounting migrations incorrectly.
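  As a quick sanity check on the first hypothesis, the online CPU list can be counted programmatically. The sketch below assumes the usual CPU list format ("0", "1-127", "0-3,8") that lscpu prints and that /sys/devices/system/cpu/online contains; count_cpus_in_list() is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a list such as "0", "1-127" or "0-3,8".
 * Returns -1 on malformed input. Hypothetical helper for illustration.
 */
static int count_cpus_in_list(const char *list)
{
    const char *p = list;
    int count = 0;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p || first < 0)
            return -1;          /* no digits, or a negative number */
        p = end;

        if (*p == '-') {        /* range "first-last" */
            p++;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
            p = end;
        }
        count += (int)(last - first + 1);

        if (*p == ',')
            p++;                /* move on to the next entry */
        else if (*p != '\0' && *p != '\n')
            return -1;          /* unexpected character */
    }
    return count;
}
```

  For the lscpu output above, the on-line list "0" yields 1 and the off-line list "1-127" yields 127, so with a single online CPU any non-zero cpu-migrations count is suspicious.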


