
kernel-packages team mailing list archive

[Bug 1401181] Re: Kernel Soft Lockup - CPU stuck for XXs!

 

I found the following messages on the same host:

"""
Nov 11 20:10:09  kernel: [7283531.696243] megasas: [ 0]waiting for 8 commands to complete
Nov 11 20:10:14  kernel: [7283536.711353] megasas: [ 5]waiting for 8 commands to complete
Nov 11 20:10:19  kernel: [7283541.726428] megasas: [10]waiting for 8 commands to complete
Nov 11 20:10:24  kernel: [7283546.741520] megasas: [15]waiting for 8 commands to complete
Nov 11 20:10:29  kernel: [7283551.756571] megasas: [20]waiting for 8 commands to complete
Nov 11 20:10:34  kernel: [7283556.771650] megasas: [25]waiting for 8 commands to complete
Nov 11 20:10:39  kernel: [7283561.786726] megasas: [30]waiting for 8 commands to complete
Nov 11 20:10:44  kernel: [7283566.801797] megasas: [35]waiting for 8 commands to complete
Nov 11 20:10:49  kernel: [7283571.816818] megasas: [40]waiting for 8 commands to complete
...
Nov 11 20:13:04  kernel: [7283707.223892] megasas: [175]waiting for 8 commands to complete
Nov 11 20:13:09  kernel: [7283712.238959] megaraid_sas: pending commands remain after waiting, will reset adapter.
Nov 11 20:13:09  kernel: [7283712.238961] megaraid_sas: resetting fusion adapter.
Nov 11 20:13:14  kernel: [7283717.317977] megasas: Waiting for FW to come to ready state
Nov 11 20:13:36  kernel: [7283739.312378] megasas: FW now in Ready state
Nov 11 20:13:37  kernel: [7283739.576116] megasas:IOC Init cmd success
Nov 11 20:13:37  kernel: [7283739.668032] megaraid_sas: Reset successful.
"""
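
For a rough idea of how long I/O was stalled, the kernel timestamps in the
messages above can be subtracted. This is only a sketch: the regular
expression and the helper name stall_seconds are my own assumptions, and it
expects the syslog format shown above.

```python
import re

# Captures the kernel timestamp (seconds since boot) and the message after it.
LINE_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] (.*)$")


def stall_seconds(lines):
    """Seconds from the first 'waiting for N commands' line to 'Reset successful'.

    Returns None if either message is missing.
    """
    start = None
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if start is None and "waiting for" in message and "commands to complete" in message:
            start = timestamp
        elif start is not None and "Reset successful" in message:
            return timestamp - start
    return None


# Three of the lines quoted above, copied verbatim.
sample = [
    "Nov 11 20:10:09  kernel: [7283531.696243] megasas: [ 0]waiting for 8 commands to complete",
    "Nov 11 20:13:09  kernel: [7283712.238959] megaraid_sas: pending commands remain after waiting, will reset adapter.",
    "Nov 11 20:13:37  kernel: [7283739.668032] megaraid_sas: Reset successful.",
]
print(stall_seconds(sample))
```

For the excerpt above this comes to roughly 208 seconds between the first
"waiting" message and the successful reset.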

I also asked for information about the LSI 9271-8i MegaRAID SAS HBA
controller:

"""
That can be obtained with the "megacli" command-line tool (from LSI):

http://www.lsi.com/support/pages/download-search.aspx

With the following commands:

# information about raid adapter

$ MegaCli64 -AdpAllInfo -aAll

# information about battery backup unit state

$ MegaCli64 -AdpBbuCmd -aAll

# information about virtual disks

$ MegaCli64 -LDInfo -Lall -aALL

# information about physical drives

$ MegaCli64 -PDList -aALL 
"""
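
To gather all four outputs in one pass, something like the following could
be used. It is a sketch under assumptions: MegaCli64 is taken to be on PATH
(it usually needs root), the flags are copied verbatim from the commands
above, and the names QUERIES and collect are hypothetical.

```python
import subprocess

# The four queries from the message above; flags are copied unchanged.
QUERIES = {
    "adapter": ["-AdpAllInfo", "-aAll"],
    "bbu": ["-AdpBbuCmd", "-aAll"],
    "virtual_disks": ["-LDInfo", "-Lall", "-aALL"],
    "physical_drives": ["-PDList", "-aALL"],
}


def collect(binary="MegaCli64", run=subprocess.run):
    """Run each query and return a dict mapping query name to its stdout text.

    `run` is injectable so the function can be exercised without the tool.
    """
    results = {}
    for name, args in QUERIES.items():
        proc = run([binary, *args], capture_output=True, text=True)
        results[name] = proc.stdout
    return results


# Example use on the affected host (not executed here):
#   for name, output in collect().items():
#       with open(f"megacli-{name}.txt", "w") as fh:
#           fh.write(output)
```

Writing each output to its own file makes it easier to attach the results to
the bug.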

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1401181

Title:
  Kernel Soft Lockup - CPU stuck for XXs!

Status in linux package in Ubuntu:
  Invalid

Bug description:
  The following situation was brought to my attention:

  """
  At Nov 12 02:15:39 Juju node reports kernel soft lock up.

  Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323788.024017] BUG: soft lockup - CPU#0 stuck for 22s! [khungtaskd:35]
  Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323840.040003] BUG: soft lockup - CPU#1 stuck for 22s! [mongod:1575]

  ( jujuvm-var-log-2014-11-18-case-xxxxxxxxx.tar.bz2 )

  machine-0: 2014-11-12 02:15:39 ERROR juju.state.apiserver.common
  resource.go:102 error stopping *apiserver.pingTimeout resource: ping
  timeout ( all-machines-juju-00075278-2014-11-18.log.bz2 )

  juju failed and then restarted, causing openstack components to restart.
  """

  After digging a bit I found the following stack traces:

  """
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.770725] INFO: task jbd2/sda1-8:322 blocked for more than 120 seconds.
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.785348] Not tainted 3.13.0-32-generic #57-Ubuntu
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.801256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838388] jbd2/sda1-8 D ffff88103f2d4440 0 322 2 0x00000000
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838396] ffff881022a3bbc8 0000000000000002 ffff88102296dfc0 ffff881022a3bfd8
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838403] 0000000000014440 0000000000014440 ffff88102296dfc0 ffff88103f2d4cd8
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838408] ffff88107ffba550 0000000000000002 ffffffff811ee000 ffff881022a3bc40
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838423] Call Trace:
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838438] [<ffffffff811ee000>] ? generic_block_bmap+0x50/0x50
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838447] [<ffffffff817203fd>] io_schedule+0x9d/0x140
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838468] [<ffffffff811ee00e>] sleep_on_buffer+0xe/0x20
  ...

  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838518] INFO: task qemu-system-x86:26225 blocked for more than 120 seconds.
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.884250] Not tainted 3.13.0-32-generic #57-Ubuntu
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.910724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968688] qemu-system-x86 D ffff88103f514440 0 26225 1 0x00000000
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968692] ffff8806707cbd28 0000000000000002 ffff88100e63dfc0 ffff8806707cbfd8
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968694] 0000000000014440 0000000000014440 ffff88100e63dfc0 ffff88103f514cd8
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968696] ffff88107ffba0e8 0000000000000002 ffffffff8114e190 ffff8806707cbda0
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968699] Call Trace:
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968704] [<ffffffff8114e190>] ? wait_on_page_read+0x60/0x60
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968707] [<ffffffff817203fd>] io_schedule+0x9d/0x140
  Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968715] [<ffffffff8114e19e>] sleep_on_page+0xe/0x20 
  """

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1401181/+subscriptions

