← Back to team overview

kernel-packages team mailing list archive

[Bug 1481357] Re: [LTCTest][Kernel][OP810]Ubuntu14.04.3: kernel call trace while continuously switching between smt on/off and changing subcores

 

** Changed in: linux (Ubuntu Vivid)
       Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1481357

Title:
  [LTCTest][Kernel][OP810]Ubuntu14.04.3: kernel call trace while
  continuously switching between smt on/off and changing subcores

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  == Comment: #0 - Satheesh Rajendran <satheera@xxxxxxxxxx> - 2015-07-14 02:32:22 ==
  While running the below script, and nothing else was running in the system, host got the kernel call traces after around 15~20 mins

  #!/bin/bash -x
  while true;
  do for i in 1 2 4;
  do ppc64_cpu  --subcores-per-core=$i;
  ppc64_cpu --info
  ppc64_cpu --smt=off;
  ppc64_cpu --info
  sleep 1;
  lscpu
  ppc64_cpu --smt=on;
  ppc64_cpu --info
  sleep 1;
  lscpu
  done;
  sleep 1;
  done

  
  Environment:
  ------------------
  $ cat /etc/issue
  Ubuntu 14.04.2 LTS \n \l

  $ uname -a
  Linux tul8fp 3.19.0-22-generic #22~14.04.1-Ubuntu SMP Wed Jun 17 10:03:39 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

  $ lscpu
  Architecture:          ppc64le
  Byte Order:            Little Endian
  CPU(s):                160
  On-line CPU(s) list:   0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116,120,124,128,132,136,138-159
  Off-line CPU(s) list:  1-3,5-7,9-11,13-15,17-19,21-23,25-27,29-31,33-35,37-39,41-43,45-47,49-51,53-55,57-59,61-63,65-67,69-71,73-75,77-79,81-83,85-87,89-91,93-95,97-99,101-103,105-107,109-111,113-115,117-119,121-123,125-127,129-131,133-135,137
  Thread(s) per core:    2
  Core(s) per socket:    10
  Socket(s):             2
  NUMA node(s):          2
  Model:                 8335-GTA
  L1d cache:             64K
  L1i cache:             32K
  L2 cache:              512K
  L3 cache:              8192K
  NUMA node0 CPU(s):     0,4,88,92,96,100,104,108,112,116,120,124,128,132,136,138-159
  NUMA node8 CPU(s):     8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84

  $ tail -10 /proc/cpuinfo 
  processor	: 159
  cpu		: POWER8 (raw), altivec supported
  clock		: 2693.000000MHz
  revision	: 2.0 (pvr 004d 0200)

  timebase	: 512000000
  platform	: PowerNV
  model		: 8335-GTA        
  machine		: PowerNV 8335-GTA        
  firmware	: OPAL v3

  Logs:-
  ---------

  
  # dmesg
  [79081.074088] INFO: task irqbalance:1897 blocked for more than 120 seconds.
  [79081.074333]       Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
  [79081.074388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [79081.074449] irqbalance      D 00003fff9ac39d3c     0  1897      1 0x00040000
  [79081.074454] Call Trace:
  [79081.074470] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
  [79081.074477] [c0000007e8d037c0] [c000000000015934] __switch_to+0x204/0x350
  [79081.074482] [c0000007e8d03820] [c000000000a11508] __schedule+0x368/0x8d0
  [79081.074486] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
  [79081.074486]     LR = 0x10026a6c1d0
  [79081.074490] [c0000007e8d03a40] [c000000000a120a0] schedule_preempt_disabled+0x20/0x30 (unreliable)
  [79081.074494] [c0000007e8d03a60] [c000000000a1420c] __mutex_lock_slowpath+0xfc/0x1f0
  [79081.074497] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+0x6c/0x70
  [79081.074501] [c0000007e8d03b10] [c00000000064488c] online_show+0x3c/0x90
  [79081.074503] [c0000007e8d03b90] [c000000000644fec] dev_attr_show+0x5c/0xc0
  [79081.074508] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_seq_show+0x114/0x200
  [79081.074511] [c0000007e8d03c20] [c0000000003591e4] kernfs_seq_show+0x54/0x70
  [79081.074515] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
  [79081.074519] [c0000007e8d03cf0] [c00000000035a474] kernfs_fop_read+0x184/0x220
  [79081.074523] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+0x6c/0x120
  [79081.074527] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
  [79081.074530] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
  [79081.074534] [c0000007e8d03e30] [c000000000009258] system_call+0x38/0xd0
  [79081.074546] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.
  [79081.074597]       Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
  [79081.074802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [79081.074865] ppc64_cpu       D 00003fff82892d08     0 111047 109995 0x00040000
  [79081.074868] Call Trace:
  [79081.074871] [c000000f1c67f680] [c000000f1c67f710] 0xc000000f1c67f710 (unreliable)
  [79081.074875] [c000000f1c67f850] [c000000000015934] __switch_to+0x204/0x350
  [79081.074878] [c000000f1c67f8b0] [c000000000a11508] __schedule+0x368/0x8d0
  [79081.074882] [c000000f1c67fad0] [c0000000000aff10] cpu_hotplug_begin+0xa0/0xb0
  [79081.074887] [c000000f1c67fb10] [c00000000020b0a0] _cpu_down+0xb0/0x3a0
  [79081.074890] [c000000f1c67fbd0] [c00000000020b3e8] cpu_down+0x58/0xa0
  [79081.074894] [c000000f1c67fc00] [c00000000064fa0c] cpu_subsys_offline+0x2c/0x50
  [79081.074897] [c000000f1c67fc30] [c000000000647aa4] device_offline+0x104/0x140
  [79081.074900] [c000000f1c67fc70] [c000000000647c6c] online_store+0x6c/0xc0
  [79081.074904] [c000000f1c67fcc0] [c000000000643ab8] dev_attr_store+0x68/0xa0
  [79081.074907] [c000000f1c67fd00] [c00000000035b350] sysfs_kf_write+0x80/0xb0
  [79081.074910] [c000000f1c67fd40] [c00000000035a28c] kernfs_fop_write+0x18c/0x1f0
  [79081.074914] [c000000f1c67fd90] [c0000000002b474c] vfs_write+0xdc/0x260
  [79081.074917] [c000000f1c67fde0] [c0000000002b55fc] SyS_write+0x6c/0x110
  [79081.074921] [c000000f1c67fe30] [c000000000009258] system_call+0x38/0xd0
  [79201.074065] INFO: task irqbalance:1897 blocked for more than 120 seconds.
  [79201.074312]       Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
  [79201.074361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [79201.074421] irqbalance      D 00003fff9ac39d3c     0  1897      1 0x00040000
  [79201.074424] Call Trace:
  [79201.074428] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
  [79201.074432] [c0000007e8d037c0] [c000000000015934] __switch_to+0x204/0x350
  [79201.074436] [c0000007e8d03820] [c000000000a11508] __schedule+0x368/0x8d0
  [79201.074439] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
  [79201.074439]     LR = 0x10026a6c1d0
  [79201.074443] [c0000007e8d03a40] [c000000000a120a0] schedule_preempt_disabled+0x20/0x30 (unreliable)
  [79201.074447] [c0000007e8d03a60] [c000000000a1420c] __mutex_lock_slowpath+0xfc/0x1f0
  [79201.074450] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+0x6c/0x70
  [79201.074453] [c0000007e8d03b10] [c00000000064488c] online_show+0x3c/0x90
  [79201.074456] [c0000007e8d03b90] [c000000000644fec] dev_attr_show+0x5c/0xc0
  [79201.074459] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_seq_show+0x114/0x200
  [79201.074462] [c0000007e8d03c20] [c0000000003591e4] kernfs_seq_show+0x54/0x70
  [79201.074465] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
  [79201.074468] [c0000007e8d03cf0] [c00000000035a474] kernfs_fop_read+0x184/0x220
  [79201.074472] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+0x6c/0x120
  [79201.074475] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
  [79201.074479] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
  [79201.074482] [c0000007e8d03e30] [c000000000009258] system_call+0x38/0xd0
  [79201.074499] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.
  [79201.074638]       Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
  [79201.074726] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [79201.074786] ppc64_cpu       D 00003fff82892d08     0 111047 109995 0x00040000
  [79201.074789] Call Trace:
  [79201.074792] [c000000f1c67f680] [c000000f1c67f710] 0xc000000f1c67f710 (unreliable)
  [79201.074795] [c000000f1c67f850] [c000000000015934] __switch_to+0x204/0x350
  [79201.074798] [c000000f1c67f8b0] [c000000000a11508] __schedule+0x368/0x8d0
  [79201.074802] [c000000f1c67fad0] [c0000000000aff10] cpu_hotplug_begin+0xa0/0xb0
  [79201.074805] [c000000f1c67fb10] [c00000000020b0a0] _cpu_down+0xb0/0x3a0
  [79201.074808] [c000000f1c67fbd0] [c00000000020b3e8] cpu_down+0x58/0xa0
  [79201.074811] [c000000f1c67fc00] [c00000000064fa0c] cpu_subsys_offline+0x2c/0x50
  [79201.074814] [c000000f1c67fc30] [c000000000647aa4] device_offline+0x104/0x140
  [79201.074817] [c000000f1c67fc70] [c000000000647c6c] online_store+0x6c/0xc0
  [79201.074821] [c000000f1c67fcc0] [c000000000643ab8] dev_attr_store+0x68/0xa0
  [79201.074824] [c000000f1c67fd00] [c00000000035b350] sysfs_kf_write+0x80/0xb0
  [79201.074827] [c000000f1c67fd40] [c00000000035a28c] kernfs_fop_write+0x18c/0x1f0
  [79201.074831] [c000000f1c67fd90] [c0000000002b474c] vfs_write+0xdc/0x260
  [79201.074834] [c000000f1c67fde0] [c0000000002b55fc] SyS_write+0x6c/0x110
  [79201.074837] [c000000f1c67fe30] [c000000000009258] system_call+0x38/0xd0
  [79321.074029] INFO: task irqbalance:1897 blocked for more than 120 seconds.
  [79321.074269]       Not tainted 3.19.0-22-generic #22~14.04.1-Ubuntu
  [79321.074319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [79321.074378] irqbalance      D 00003fff9ac39d3c     0  1897      1 0x00040000
  [79321.074382] Call Trace:
  [79321.074386] [c0000007e8d035f0] [00003fff9af50000] 0x3fff9af50000 (unreliable)
  [79321.074390] [c0000007e8d037c0] [c000000000015934] __switch_to+0x204/0x350
  [79321.074393] [c0000007e8d03820] [c000000000a11508] __schedule+0x368/0x8d0
  [79321.074396] --- interrupt: 1000ba30 at 0xc0000007e8d039f0
  [79321.074396]     LR = 0x10026a6c1d0
  [79321.074400] [c0000007e8d03a40] [c000000000a120a0] schedule_preempt_disabled+0x20/0x30 (unreliable)
  [79321.074404] [c0000007e8d03a60] [c000000000a1420c] __mutex_lock_slowpath+0xfc/0x1f0
  [79321.074407] [c0000007e8d03ae0] [c000000000a1436c] mutex_lock+0x6c/0x70
  [79321.074410] [c0000007e8d03b10] [c00000000064488c] online_show+0x3c/0x90
  [79321.074412] [c0000007e8d03b90] [c000000000644fec] dev_attr_show+0x5c/0xc0
  [79321.074416] [c0000007e8d03bd0] [c00000000035b7f4] sysfs_kf_seq_show+0x114/0x200
  [79321.074419] [c0000007e8d03c20] [c0000000003591e4] kernfs_seq_show+0x54/0x70
  [79321.074422] [c0000007e8d03c50] [c0000000002e74fc] seq_read+0xec/0x500
  [79321.074425] [c0000007e8d03cf0] [c00000000035a474] kernfs_fop_read+0x184/0x220
  [79321.074429] [c0000007e8d03d50] [c0000000002b520c] __vfs_read+0x6c/0x120
  [79321.074432] [c0000007e8d03d90] [c0000000002b5364] vfs_read+0xa4/0x1c0
  [79321.074436] [c0000007e8d03de0] [c0000000002b54ec] SyS_read+0x6c/0x110
  [79321.074439] [c0000007e8d03e30] [c000000000009258] system_call+0x38/0xd0
  [79321.074453] INFO: task ppc64_cpu:111047 blocked for more than 120 seconds.

  == Comment: #1 - Satheesh Rajendran <satheera@xxxxxxxxxx> - 2015-07-14 02:34:55 ==
  Tried to get sosreport but the same got stuck during info gathering at processor.

  # sosreport 
  ...
  Please enter your first initial and last name [tul8fp]: satheesh
  Please enter the case number that you are generating this report for: trace

   Running plugins. Please wait ...

    Running 51/64: processor...

  == Comment: #3 - Ranjal G. Shenoy <ranjshen@xxxxxxxxxx> - 2015-07-14 05:18:17 ==
  Hi Sandhya,

  From the call trace it looks like ppc64_task which is trying to
  offline a cpu is blocked inside cpu_hotplug_begin(). This happens when
  someone else is trying to prevent a cpu-hotplug using the
  get_online_cpus()/put_online_cpus() call. It is not clear from the
  logs what is that entity that's preventing this.

  That said,  I don't think the patch that you are referring to has
  anything to do with it. The problem lies elsewhere. If you see that
  the ppc64_cpu task is not making any progress at all, then it might be
  prudent to enable lockdep which will at least list out all the locks
  that are held at any point.

  == Comment: #4 - Ranjal G. Shenoy <ranjshen@xxxxxxxxxx> - 2015-07-14 05:48:39 ==
  (In reply to comment #3)
  > Hi Sandhya,
  > 
  > From the call trace it looks like ppc64_task which is trying to offline a
  > cpu is blocked inside cpu_hotplug_begin(). This happens when someone else is
  > trying to prevent a cpu-hotplug using the
  > get_online_cpus()/put_online_cpus() call. It is not clear from the logs what
  > is that entity that's preventing this. 
  > 
  > That said,  I don't think the patch that you are referring to has anything
  > to do with it. The problem lies elsewhere. If you see that the ppc64_cpu
  > task is not making any progress at all, then it might be prudent to enable
  > lockdep which will at least list out all the locks that are held at any
  > point.

  To add to that, I recall seeing a similar issue where cpu-offline
  operation was blocking inside cpu_hotplug_begin. The fix for that is
  present in the mainline with commit id : 87af9e7ff9. You can try
  backporting that one.

  == Comment: #13 - Satheesh Rajendran <satheera@xxxxxxxxxx> - 2015-07-27 12:06:19 ==
  Ran script mentioned in bug description continuously for 6 hours and not have seen the mentioned call traces.

  Once this fix is available as part of ubuntu14.04.3 released kernel,
  this bug can be closed.

  Thanks in advance.

  Regards,
  -Satheesh.

  == Comment: #14 - Brahadambal Srinivasan <brsriniv@xxxxxxxxxx> - 2015-08-04 09:31:21 ==
  Cherry picked bug backported to the trusty kernel to resolve the issue reported in this bug

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1481357/+subscriptions