kernel-packages team mailing list archive
Message #94940
[Bug 1401181] Re: Kernel Soft Lockup - CPU stuck for XXs!
It turns out that the following controllers:
MegaRAID SAS 9265-8i
MegaRAID SAS 9266-4i
MegaRAID SAS 9266-8i
MegaRAID SAS 9285-8e
MegaRAID SAS 9285CV-8e
MegaRAID SAS 9270-8i
MegaRAID SAS 9271-4i
MegaRAID SAS 9271-8i
MegaRAID SAS 9271-8iCC
MegaRAID SAS 9286-8e
MegaRAID SAS 9286CV-8e
MegaRAID SAS 9286CV-8eCC
need the latest firmware to avoid a bug that causes the adapter to reset
after running for several hours with ASPM (Active State Power Management)
enabled:
http://www.lsi.com/downloads/Public/RAID%20Controllers/RAID%20Controllers%20Common%20Files/23.32.0-0009_SAS_FW_IMAGE_APP_3.440.05-3712.txt
This is probably what caused this adapter to reset (due to ASPM being
enabled).
So the fix here is probably to upgrade the firmware:
FROM (contents of the file provided by the user):
BIOS Version : 5.46.02.0_4.16.08.00_0x06060900
WebBIOS Version : 6.1-71-e_71-Rel
Preboot CLI Version: 05.07-00:#%00011
FW Version : 3.400.05-3175
NVDATA Version : 2.1403.03-0128
Boot Block Version : 2.05.00.00-0010
BOOT Version : 07.26.26.219
TO the latest firmware (contents of the firmware README file):
Version Numbers:
===============
Current Package Details:
Firmware Package: 23.32.0-0009 (MR 5.12)
WebBIOS 6.1-73-e_73-Rel
Firmware 3.440.05-3712
ROMENV 1.08
PCLI 05.07.00
BootBlock 2.05.00.00-0010
NVDATA 2.1409.0-0137
BootBlockCommon 07.26.26.219
UEFI_Driver 0x06090700 (SIGNED)
Hii v03.10.09.00 (SIGNED)
FCODE 4.16.08.00
BIOS 5.48.04.0
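To check whether a controller is still on the older firmware, the two "FW Version" / "Firmware" strings above can be compared field by field. This is only a sketch: it assumes the dotted and dashed fields order numerically from left to right, which the README does not state, and `parse_fw_version` is a hypothetical helper, not part of any LSI tool.

```python
import re


def parse_fw_version(version: str) -> tuple:
    """Split a version such as '3.400.05-3175' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", version.strip()))


installed = "3.400.05-3175"  # FW Version from the file provided by the user
required = "3.440.05-3712"   # Firmware version listed in package 23.32.0-0009

# Tuples compare element by element, so this is a left-to-right numeric compare.
needs_upgrade = parse_fw_version(installed) < parse_fw_version(required)
print(needs_upgrade)
```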
A workaround is to boot the servers with kernel boot parameters like these:
... "pcie_aspm=off disable_msi=1"
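After a reboot it may be worth confirming that the parameter actually reached the running kernel. A minimal sketch, assuming a Linux host where `/proc/cmdline` is readable; `has_boot_param` and `read_cmdline` are hypothetical helper names, and the sample command line below is made up for illustration.

```python
def has_boot_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Illustrative command line only; the image and root values are not from the bug report.
sample = "BOOT_IMAGE=/vmlinuz-3.13.0-32-generic root=/dev/sda1 ro pcie_aspm=off disable_msi=1"
print(has_boot_param(sample, "pcie_aspm=off"))
```

On a live system the check would be `has_boot_param(read_cmdline(), "pcie_aspm=off")`.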
** Changed in: linux (Ubuntu)
Status: In Progress => Invalid
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1401181
Title:
Kernel Soft Lockup - CPU stuck for XXs!
Status in linux package in Ubuntu:
Invalid
Bug description:
The following situation was brought to my attention:
"""
At Nov 12 02:15:39 Juju node reports kernel soft lock up.
Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323788.024017] BUG: soft lockup - CPU#0 stuck for 22s! [khungtaskd:35]
Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323840.040003] BUG: soft lockup - CPU#1 stuck for 22s! [mongod:1575]
( jujuvm-var-log-2014-11-18-case-xxxxxxxxx.tar.bz2 )
machine-0: 2014-11-12 02:15:39 ERROR juju.state.apiserver.common
resource.go:102 error stopping *apiserver.pingTimeout resource: ping
timeout ( all-machines-juju-00075278-2014-11-18.log.bz2 )
juju failed and then restarted, causing openstack components to restart.
"""
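When going through a large syslog archive, soft lockup reports like the two quoted above can be pulled out with a regular expression. This is a sketch that assumes the message format shown in those two lines; `parse_soft_lockups` is a hypothetical helper, not an existing tool.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#0 stuck for 22s! [khungtaskd:35]"
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return (cpu, seconds, task name, pid) for every soft lockup line."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append((int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])))
    return events


log = [
    "Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323788.024017] BUG: soft lockup - CPU#0 stuck for 22s! [khungtaskd:35]",
    "Nov 12 02:15:39 l1-bootjujuvm-1a-de kernel: [6323840.040003] BUG: soft lockup - CPU#1 stuck for 22s! [mongod:1575]",
]
events = parse_soft_lockups(log)
print(events)
```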
After digging a bit I found the following stack traces:
"""
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.770725] INFO: task jbd2/sda1-8:322 blocked for more than 120 seconds.
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.785348] Not tainted 3.13.0-32-generic #57-Ubuntu
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.801256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838388] jbd2/sda1-8 D ffff88103f2d4440 0 322 2 0x00000000
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838396] ffff881022a3bbc8 0000000000000002 ffff88102296dfc0 ffff881022a3bfd8
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838403] 0000000000014440 0000000000014440 ffff88102296dfc0 ffff88103f2d4cd8
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838408] ffff88107ffba550 0000000000000002 ffffffff811ee000 ffff881022a3bc40
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838423] Call Trace:
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838438] [<ffffffff811ee000>] ? generic_block_bmap+0x50/0x50
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838447] [<ffffffff817203fd>] io_schedule+0x9d/0x140
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838468] [<ffffffff811ee00e>] sleep_on_buffer+0xe/0x20
...
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838518] INFO: task qemu-system-x86:26225 blocked for more than 120 seconds.
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.884250] Not tainted 3.13.0-32-generic #57-Ubuntu
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.910724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968688] qemu-system-x86 D ffff88103f514440 0 26225 1 0x00000000
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968692] ffff8806707cbd28 0000000000000002 ffff88100e63dfc0 ffff8806707cbfd8
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968694] 0000000000014440 0000000000014440 ffff88100e63dfc0 ffff88103f514cd8
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968696] ffff88107ffba0e8 0000000000000002 ffffffff8114e190 ffff8806707cbda0
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968699] Call Trace:
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968704] [<ffffffff8114e190>] ? wait_on_page_read+0x60/0x60
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968707] [<ffffffff817203fd>] io_schedule+0x9d/0x140
Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.968715] [<ffffffff8114e19e>] sleep_on_page+0xe/0x20
"""
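Both traces above pass through io_schedule, which suggests the tasks were waiting on block I/O. A rough sketch for extracting the function names from such call trace lines follows; it assumes the "[<address>] symbol+0xoff/0xsize" layout seen above, and `trace_symbols` is a hypothetical helper. Frames prefixed with "?" are addresses found on the stack that the unwinder could not verify, so they are flagged as unreliable.

```python
import re

# Matches e.g. "[<ffffffff817203fd>] io_schedule+0x9d/0x140", with an optional "? " marker.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(lines):
    """Return (symbol, reliable) pairs for every call trace frame found."""
    frames = []
    for line in lines:
        m = FRAME.search(line)
        if m:
            frames.append((m["sym"], m["guess"] is None))
    return frames


trace = [
    "Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838438] [<ffffffff811ee000>] ? generic_block_bmap+0x50/0x50",
    "Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838447] [<ffffffff817203fd>] io_schedule+0x9d/0x140",
    "Nov 11 20:08:45 l1-jshost-1a-de kernel: [7283447.838468] [<ffffffff811ee00e>] sleep_on_buffer+0xe/0x20",
]
frames = trace_symbols(trace)
print(frames)
```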
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1401181/+subscriptions