kernel-packages team mailing list archive
Message #162656
[Bug 1532914] Re: Surelock GA2 SP1: capiredp01: cxl_init_adapter fails for CAPI devices 0000:01:00.0 and 0005:01:00.0 after upgrading to 840.10 Platform firmware build fips840/b1208b_1604.840
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin1510
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1532914
Title:
Surelock GA2 SP1: capiredp01: cxl_init_adapter fails for CAPI devices
0000:01:00.0 and 0005:01:00.0 after upgrading to 840.10 Platform
firmware build fips840/b1208b_1604.840
Status in linux package in Ubuntu:
Incomplete
Bug description:
Problem Description
++++++++++++++++++++
I upgraded the Platform firmware to the 840.10 build (b1208b_1604.840) to prepare for Surelock GA2 SP1 testing. After the upgrade, I used ipmitool to power on capiredfsp.aus.stglabs.ibm.com and boot the Ubuntu 15.10 partition (capiredp01.aus.stglabs.ibm.com) in OPAL firmware mode. In petitboot, I saw the messages "cxl-pci 0000:01:00.0: cxl_init_adapter failed: -5" and "cxl-pci 0005:01:00.0: cxl_init_adapter failed: -5". After the partition started running, I did not see any AFU devices in /dev/cxl/ or /sys/class/cxl/, although the lspci command still showed the PCI devices for the hardware accelerators (0000:01:00.0 and 0005:01:00.0).
ubuntu@capiredp01:~$ ls -l /dev/cxl/
ls: cannot access /dev/cxl/: No such file or directory
ubuntu@capiredp01:~$ ls -l /sys/class/cxl/
total 0
ubuntu@capiredp01:~$ sudo lscfg | grep -i afu
ubuntu@capiredp01:~$ sudo lspci|egrep -i "04cf|0477"
0000:01:00.0 Processing accelerators: IBM Device 04cf (rev 01)
0005:01:00.0 Processing accelerators: IBM Device 04cf (rev 01)
ubuntu@capiredp01:~$ lsscsi -g
[0:0:0:0] enclosu IBM VSBPD12M1 6GSAS 03 - /dev/sg1
[0:0:1:0] cd/dvd IBM. RMBO0140512 RA65 /dev/sr0 /dev/sg2
[0:3:0:0] no dev IBM 57D7001SISIOA 0150 - /dev/sg0
[1:0:0:0] enclosu IBM VSBPD12M1 6GSAS 03 - /dev/sg4
[1:0:1:0] disk IBM HUC109030CSS600 E5C6 /dev/sda /dev/sg5
[1:0:2:0] disk IBM HUC101212CSS600 A5AA /dev/sdb /dev/sg6
[1:0:3:0] disk IBM HUC101212CSS600 A5AA /dev/sdc /dev/sg7
[1:0:4:0] disk IBM HUC101212CSS600 A5AA /dev/sdd /dev/sg8
[1:0:5:0] disk IBM ST1200MM0007 BF04 /dev/sde /dev/sg9
[1:0:6:0] disk IBM ST1200MM0007 BF04 /dev/sdf /dev/sg10
[1:3:0:0] no dev IBM 57D7001SISIOA 0150 - /dev/sg3
This is a regression: the Linux kernel has failed to synchronize the PSL timebase.
The corresponding error message is in the dmesg log attached in comment #4:
[ 1.687586] PSL: Timebase sync: giving up!
CAPI devices are not enabled, because of this failure.
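To check for this failure on another system, the log can be searched for the message quoted above. The following is a minimal sketch, not part of the cxl tooling: the function name psl_timebase_status is hypothetical, and it assumes the kernel prints the message exactly as shown in the dmesg excerpt.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel log contains the
# PSL timebase sync failure message quoted in this bug.
# Reads the log from the file(s) given as arguments, or from stdin.
psl_timebase_status() {
    # grep -q exits 0 as soon as one matching line is found
    if grep -q 'PSL: Timebase sync: giving up!' "$@"; then
        echo "timebase sync FAILED"
        return 1
    fi
    echo "no timebase sync failure logged"
    return 0
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | psl_timebase_status
```

The function returns a non-zero status when the message is present, so it can be used in a boot-time check script.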
PSL Timebase sync should not be a requirement for CAPI initialization,
nor should it make an initialized card become unavailable. Currently,
timebase is an unused function of CAPI with hopes of adoption in the
future. Support of this feature should be considered optional at this
time.
I'm not sure what the fastest way to fix this is, but it needs to be
fixed as quickly as possible. CAPI is broken in Ubuntu 15.10.
I can reproduce the bug, regardless of the skiboot level, with recent kernels.
Older kernels behave as expected, regardless of the skiboot level.
Firmware is not the cause of the regression; the kernel probably is.
I sent this out to the capi-linux distro too, but I'll comment here as well. I'm not sure what is being checked to determine that the PSL timebase sync failed. As far as I know, all PSL versions should support timebase. The only timebase error the PSL logs is when the CAPP returns a status saying that timebase has an error. If that is the issue, I'd think that timebase has not been enabled or sequenced correctly in the host CAPP. The PSL can't be enabled for timebase until the CAPP unit in the host has been enabled.
I have installed a recent mainline Linux kernel (4.4.0-rc8) on
capiredp01. I have rebooted this kernel and verified that the PSL
timebase syncs without problem.
I will now compare the source code of Ubuntu kernel 4.2.0-19 (that
hits the bug) with the source of mainline kernel 4.4.0-rc8 (that
operates as expected).
I have updated the Ubuntu kernel and modules with:
$ sudo apt-get install linux-image-4.2.0-23-generic
$ sudo apt-get install linux-image-extra-4.2.0-23-generic
I have rebooted into Ubuntu kernel linux-image-4.2.0-23-generic, and found that the cxl driver hits the bug.
I have also downloaded the source for this Ubuntu kernel (and modules) with:
$ sudo apt-get source linux-image-4.2.0-23-generic
I have recompiled and installed, and noticed that the resulting kernel
bears the version 4.2.6 (??). I have rebooted this Ubuntu kernel 4.2.6
built from the Ubuntu source for 4.2.0-23-generic, and found that the
timebase sync occurs normally.
In short, the kernels linux-4.2.6 and linux-4.4.0-rc8 (which I built
from the source provided by Ubuntu and Linus, respectively) operate
normally, whereas all kernels compiled by, and downloaded from,
Ubuntu hit the timebase sync bug.
I will try to investigate possible differences between kernel config
files or toolchain and build procedures.
I have found that the bug can be activated or prevented via the Linux kernel config file.
I have compiled the Ubuntu kernel source downloaded with
$ sudo apt-get source linux-image-4.2.0-23-generic
1. with my own config file => PSL timebase sync works fine
2. with the config file supplied by Ubuntu => PSL timebase sync fails
I will now diff the config files, and try to identify the set of
config parameters that change the kernel behavior regarding timebase
sync.
Got it. Here is the difference between config-4.2.0-23-generic (that
hits the bug) and .config (that operates normally):
$ diff config-4.2.0-23-generic .config
130,131c130,132
< CONFIG_TICK_CPU_ACCOUNTING=y
< # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
---
> CONFIG_VIRT_CPU_ACCOUNTING=y
> # CONFIG_TICK_CPU_ACCOUNTING is not set
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
For some reason, setting CONFIG_TICK_CPU_ACCOUNTING breaks PSL
Timebase sync on ppc64le. Investigating further.
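A raw diff of two config files can be noisy when options appear in a different order. As a rough sketch of how the comparison above could be narrowed down to options whose values actually differ, the helper below normalizes "is not set" comments to "=n" and sorts both files before diffing. The function names normalize_config and config_diff are hypothetical and are not part of any kernel tooling.

```shell
#!/bin/sh
# Hypothetical helpers: list CONFIG_ options whose value differs
# between two kernel config files, independent of line order.

# Print one "CONFIG_FOO=value" line per option, sorted.
# "# CONFIG_FOO is not set" is rewritten as "CONFIG_FOO=n" so that
# unset options line up with set ones.
normalize_config() {
    sed -n \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
        -e '/^CONFIG_[A-Za-z0-9_]*=/p' "$1" | sort
}

# Print "< option" for lines only in the first file and
# "> option" for lines only in the second file.
config_diff() {
    a=$(mktemp)
    b=$(mktemp)
    normalize_config "$1" > "$a"
    normalize_config "$2" > "$b"
    # diff and grep both exit non-zero in normal cases here
    # (files differ / no differing lines), so ignore the status.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}

# Usage:
#   config_diff config-4.2.0-23-generic .config
```

An option that is absent from one file (rather than explicitly unset) still shows up as a one-sided line, which matches how CONFIG_VIRT_CPU_ACCOUNTING appears in the diff above.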
Canonical, can you please replace
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
by
CONFIG_VIRT_CPU_ACCOUNTING=y
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
in the default ppc64le Linux kernel configuration file?
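To confirm which CPU accounting mode a given kernel was built with, its config file can be inspected for the options named above. This is a minimal sketch; the function name cpu_accounting_mode is hypothetical, and the usage line assumes the config is installed as /boot/config-<version>, which is conventional on Ubuntu but not stated in this bug.

```shell
#!/bin/sh
# Hypothetical helper: print which CPU accounting mode a kernel
# config file selects, based on the options discussed in this bug.
cpu_accounting_mode() {
    if grep -q '^CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y$' "$1"; then
        echo "native"   # the setting with which timebase sync worked
    elif grep -q '^CONFIG_TICK_CPU_ACCOUNTING=y$' "$1"; then
        echo "tick"     # the setting with which timebase sync failed
    else
        echo "other"    # neither of the two options is enabled
    fi
}

# Usage (assumed config location for the running kernel):
#   cpu_accounting_mode "/boot/config-$(uname -r)"
```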
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532914/+subscriptions