kernel-packages team mailing list archive

[Bug 1532914] Re: Surelock GA2 SP1: capiredp01: cxl_init_adapter fails for CAPI devices 0000:01:00.0 and 0005:01:00.0 after upgrading to 840.10 Platform firmware build fips840/b1208b_1604.840

 

This bug was fixed in the package linux - 4.4.0-9.24

---------------
linux (4.4.0-9.24) xenial; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1551319

  * AppArmor logs denial for when the device path is ENOENT (LP: #1482943)
    - SAUCE: apparmor: fix log of apparmor audit message when kern_path() fails

  * BUG: unable to handle kernel NULL pointer dereference (aa_label_merge) (LP:
    #1448912)
    - SAUCE: apparmor: Fix: insert race between label_update and label_merge
    - SAUCE: apparmor: Fix: ensure aa_get_newest will trip debugging if the
      replacedby is not setup
    - SAUCE: apparmor: Fix: label merge handling of marking unconfined and stale
    - SAUCE: apparmor: Fix: refcount race between locating in labelset and get
    - SAUCE: apparmor: Fix: ensure new labels resulting from merge have a
      replacedby
    - SAUCE: apparmor: Fix: label_vec_merge insertion
    - SAUCE: apparmor: Fix: deadlock in aa_put_label() call chain
    - SAUCE: apparmor: Fix: add required locking of __aa_update_replacedby on
      merge path
    - SAUCE: apparmor: Fix: convert replacedby update to be protected by the
      labelset lock
    - SAUCE: apparmor: Fix: update replacedby allocation to take a gfp parameter

  * apparmor kernel BUG kills firefox (LP: #1430546)
    - SAUCE: apparmor: Disallow update of cred when then subjective != the
      objective cred
    - SAUCE: apparmor: rework retrieval of the current label in the profile update
      case

  * sleep from invalid context in aa_move_mount (LP: #1539349)
    - SAUCE: apparmor: fix sleep from invalid context

  * s390x: correct restore of high gprs on signal return (LP: #1550468)
    - s390/compat: correct restore of high gprs on signal return

  * missing SMAP support (LP: #1550517)
    - x86/entry/compat: Add missing CLAC to entry_INT80_32

  * Floating-point exception handler receives empty Data-Exception Code in
    Floating Point Control register (LP: #1548414)
    - s390/fpu: signals vs. floating point control register

  * kvm fails to boot GNU Hurd kernels with 4.4 Xenial kernel (LP: #1550596)
    - KVM: x86: fix conversion of addresses to linear in 32-bit protected mode

  * Surelock GA2 SP1: capiredp01: cxl_init_adapter fails for CAPI devices
    0000:01:00.0 and 0005:01:00.0 after upgrading to 840.10 Platform firmware
    build fips840/b1208b_1604.840 (LP: #1532914)
    - cxl: Fix PSL timebase synchronization detection

  * [Feature]EDAC support for Knights Landing (LP: #1519631)
    - EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing

  * Various failures of kernel_security suite on Xenial kernel on s390x arch
    (LP: #1531327)
    - [config] s390x -- CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

  * Unable to install VirtualBox Guest Service in 15.04 (LP: #1434579)
    - [Config] Provides: virtualbox-guest-modules when appropriate

  * linux is missing provides for virtualbox-guest-modules [i386 amd64 x32] (LP:
    #1507588)
    - [Config] Provides: virtualbox-guest-modules when appropriate

  * Backport more recent driver for SKL, KBL and BXT graphics (LP: #1540390)
    - SAUCE: i915_bpo: Provide a backport driver for SKL, KBL & BXT graphics
    - SAUCE: i915_bpo: Update intel_ips.h file location
    - SAUCE: i915_bpo: Rename the backport driver to i915_bpo
    - SAUCE: i915_bpo: Add i915_bpo_*() calls for ubuntu/i915
    - drm/i915: remove an extra level of indirection in PCI ID list
    - drm/i915/kbl: Add Kabylake PCI ID
    - drm/i915/kbl: Add Kabylake GT4 PCI ID
    - mm: Export nr_swap_pages
    - async: export current_is_async()
    - drm: fix potential dangling else problems in for_each_ macros
    - dp/mst: add SDP stream support
    - drm: Implement drm_modeset_lock_all_ctx()
    - drm: Add "prefix" parameter to drm_rect_debug_print()
    - drm/i915: Set connector_state->connector using the helper.
    - drm/atomic: add connector mask to drm_crtc_state.
    - drm/i915: Report context GTT size
    - drm/i915: Add get_eld audio component
    - SAUCE: Backport I915_PARAM_HAS_EXEC_SOFTPIN and EXEC_OBJECT_PINNED
    - SAUCE: i915_bpo: Revert passing plane/encoder name
    - SAUCE: sound/hda: Load i915_bpo from the hda driver on SKL/KBL/BXT
    - SAUCE: i915_bpo: Support only SKL, KBL and BXT with the backport driver
    - drm/i915/bxt: update list of PCIIDs
    - drm/i915/skl: Add missing SKL ids
    - SAUCE: i915_bpo: Revert "drm/i915: Defer probe if gmux is present but its
      driver isn't"
    - SAUCE: uapi/drm/i915: Backport I915_EXEC_BSD_MASK
    - drm/atomic: Do not unset crtc when an encoder is stolen
    - drm/i915: Update connector_mask during readout, v2.
    - drm/atomic: Add encoder_mask to crtc_state, v3.
    - SAUCE: drm/core: Add drm_encoder_index.
    - SAUCE: i915_bpo: Revert "drm/i915: Switch DDC when reading the EDID"
    - i915_bpo: [Config] Enable CONFIG_DRM_I915_BPO=m

  * arm64: guest hangs when ntpd is running (LP: #1549494)
    - hrtimer: Add support for CLOCK_MONOTONIC_RAW
    - hrtimer: Catch illegal clockids
    - KVM: arm/arm64: timer: Switch to CLOCK_MONOTONIC_RAW

  * Miscellaneous Ubuntu changes
    - [Debian] git-ubuntu-log -- wrap long bug and commit titles
    - [Config] CONFIG_ARM_SMMU=y on arm64
    - rebase to v4.4.3
    - [Debian] git-ubuntu-log -- ensure we get the last commit
    - [Config] fix up spelling of probably again
    - [Debian] perf -- build in the context of the full generated local headers
    - SAUCE: tools: lib/bpf -- add generated headers to search path
    - SAUCE: proc: Always set super block owner to init_user_ns
    - SAUCE: fix-up: kern_mount fail path should not be doing put_buffers()
    - SAUCE: apparmor: Fix: oops do to invalid null ptr deref in label print fns
    - SAUCE: apparmor: debug: POISON label and replaceby pointer on free
    - SAUCE: apparmor: add underscores to indicate aa_label_next_not_in_set() use
      needs locking
    - SAUCE: apparmor: Fix: refcount leak in aa_label_merge
    - SAUCE: apparmor: ensure that repacedby sharing is done correctly
    - SAUCE: apparmor Fix: refcount bug in pivotroot mediation
    - SAUCE: apparmor: Fix: now that insert can force replacement use it instead
      of remove_and_insert
    - SAUCE: apparmor: Fix: refcount bug when inserting label update that
      transitions ns
    - SAUCE: apparmor: Fix: break circular refcount for label that is directly
      freed.
    - SAUCE: apparmor: Don't remove label on rcu callback if the label has already
      been removed
    - SAUCE: apparmor: Fix: query label file permission
    - SAUCE: apparmor: fix: ref count leak when profile sha1 hash is read
    - SAUCE: fixup: cleanup return handling of labels
    - SAUCE: fix: replacedby forwarding is not being properly update when ns is
      destroyed
    - SAUCE: fixup: make __share_replacedby private to get rid of build warning
    - SAUCE: fixup: 20/23 locking issue around in __label_update
    - SAUCE: fixup: get rid of unused var build warning
    - SAUCE: fixup: cast poison values to remove warnings
    - SAUCE: apparmor: fix refcount race when finding a child profile
    - SAUCE: fixup: warning about aa_label_vec_find_or_create not being static
    - SAUCE: fix: audit "no_new_privs" case for exec failure
    - SAUCE: Fixup: __label_update() still doesn't handle some cases correctly.
    - SAUCE: Move replacedby allocation into label_alloc
    - [Debian] supply zfs dkms Provides: based on do_zfs
    - [Config] supply zfs dkms Provides: based on do_zfs
    - [Config] drop linux-image-3.0 provides

  * Miscellaneous upstream changes
    - x86/mpx: Fix off-by-one comparison with nr_registers

  [ Upstream Kernel Changes ]

  * rebase to v4.4.3

 -- Tim Gardner <tim.gardner@xxxxxxxxxxxxx>  Thu, 25 Feb 2016 19:47:55 -0700

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1532914

Title:
  Surelock GA2 SP1: capiredp01: cxl_init_adapter fails for CAPI devices
  0000:01:00.0 and 0005:01:00.0 after upgrading to 840.10 Platform
  firmware build fips840/b1208b_1604.840

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Wily:
  In Progress
Status in linux source package in Xenial:
  Fix Released

Bug description:
  Problem Description
  ++++++++++++++++++++
  I upgraded the Platform firmware to the 840.10 Platform firmware build (b1208b_1604.840) to prepare for Surelock GA2 SP1 testing.  After the upgrade, I used ipmitool to power on capiredfsp.aus.stglabs.ibm.com and boot the Ubuntu 15.10 partition (capiredp01.aus.stglabs.ibm.com) in OPAL firmware mode.  In petitboot, I saw the messages "cxl-pci 0000:01:00.0: cxl_init_adapter failed: -5" and "cxl-pci 0005:01:00.0: cxl_init_adapter failed: -5".  After the partition started running, I didn't see any AFU devices in /dev/cxl/ or /sys/class/cxl/, although the lspci command still showed the PCI devices for the hardware accelerators (0000:01:00.0 and 0005:01:00.0).

  ubuntu@capiredp01:~$ ls -l /dev/cxl/
  ls: cannot access /dev/cxl/: No such file or directory
  ubuntu@capiredp01:~$ ls -l /sys/class/cxl/
  total 0
  ubuntu@capiredp01:~$ sudo lscfg | grep -i afu
  ubuntu@capiredp01:~$ sudo lspci|egrep -i "04cf|0477"
  0000:01:00.0 Processing accelerators: IBM Device 04cf (rev 01)
  0005:01:00.0 Processing accelerators: IBM Device 04cf (rev 01)
  ubuntu@capiredp01:~$ lsscsi -g
  [0:0:0:0]    enclosu IBM      VSBPD12M1 6GSAS    03  -          /dev/sg1 
  [0:0:1:0]    cd/dvd  IBM.     RMBO0140512      RA65  /dev/sr0   /dev/sg2 
  [0:3:0:0]    no dev  IBM      57D7001SISIOA    0150  -          /dev/sg0 
  [1:0:0:0]    enclosu IBM      VSBPD12M1 6GSAS    03  -          /dev/sg4 
  [1:0:1:0]    disk    IBM      HUC109030CSS600  E5C6  /dev/sda   /dev/sg5 
  [1:0:2:0]    disk    IBM      HUC101212CSS600  A5AA  /dev/sdb   /dev/sg6 
  [1:0:3:0]    disk    IBM      HUC101212CSS600  A5AA  /dev/sdc   /dev/sg7 
  [1:0:4:0]    disk    IBM      HUC101212CSS600  A5AA  /dev/sdd   /dev/sg8 
  [1:0:5:0]    disk    IBM      ST1200MM0007     BF04  /dev/sde   /dev/sg9 
  [1:0:6:0]    disk    IBM      ST1200MM0007     BF04  /dev/sdf   /dev/sg10
  [1:3:0:0]    no dev  IBM      57D7001SISIOA    0150  -          /dev/sg3 
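
The check done by hand above (listing /sys/class/cxl/ and finding it empty or missing) can be scripted. The following is a minimal sketch, not part of the original report; the function name list_cxl_entries is hypothetical, and the directory is a parameter so the same code can be pointed at a test directory.

```python
import os


def list_cxl_entries(sysfs_class_dir="/sys/class/cxl"):
    """Return the sorted entry names under the cxl class directory.

    An empty list means no cxl devices are exposed, either because the
    directory is empty or because it does not exist at all.
    """
    try:
        return sorted(os.listdir(sysfs_class_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    entries = list_cxl_entries()
    if entries:
        print("cxl entries:", ", ".join(entries))
    else:
        print("no cxl entries found")
```

On the affected partition described here, this would be expected to report that no entries were found, matching the empty "ls -l /sys/class/cxl/" output.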

  
  This is a regression: the Linux kernel has failed to synchronize the PSL timebase.
  The corresponding error message is in the dmesg log attached in comment #4:

  [    1.687586] PSL: Timebase sync: giving up!

  CAPI devices are not enabled because of this failure.
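
For triage across several machines, a saved dmesg log can be scanned for the two messages quoted in this report. This is only a sketch: the patterns are copied from the message text shown here, other kernel versions may word them differently, and the function name scan_cxl_failures is hypothetical.

```python
import re

# Patterns taken from the messages quoted in this bug report.
TIMEBASE_GIVE_UP = re.compile(r"PSL: Timebase sync: giving up!")
INIT_FAILED = re.compile(r"cxl-pci (\S+): cxl_init_adapter failed: (-?\d+)")


def scan_cxl_failures(log_text):
    """Return (timebase_failed, {pci_address: error_code}) for a dmesg-style log."""
    timebase_failed = False
    failed_adapters = {}
    for line in log_text.splitlines():
        if TIMEBASE_GIVE_UP.search(line):
            timebase_failed = True
        match = INIT_FAILED.search(line)
        if match:
            failed_adapters[match.group(1)] = int(match.group(2))
    return timebase_failed, failed_adapters


sample = """\
[    1.687586] PSL: Timebase sync: giving up!
cxl-pci 0000:01:00.0: cxl_init_adapter failed: -5
cxl-pci 0005:01:00.0: cxl_init_adapter failed: -5
"""

if __name__ == "__main__":
    print(scan_cxl_failures(sample))
```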

  PSL Timebase sync should not be a requirement for CAPI initialization,
  nor should it make an initialized card become unavailable.  Currently,
  timebase is an unused function of CAPI with hopes of adoption in the
  future.  Support of this feature should be considered optional at this
  time.

  I'm not sure what the fastest way to fix this is, but it needs to be
  fixed as quickly as possible.  CAPI is broken in Ubuntu 15.10.

  I can reproduce the bug, regardless of the skiboot level, with recent kernels.
  Older kernels behave as expected, regardless of the skiboot level.

  Firmware is not the cause of the regression; the kernel probably is.

  I sent this out to the capi-linux distro too, but I'll comment here as well.  I'm not sure what is being looked at to determine that the PSL timebase sync failed.  As far as I know, all PSL versions should support timebase.  The only timebase error the PSL logs is when the CAPP returns a status that says timebase has an error.  If that is the issue, I'd think that timebase has not been correctly enabled or sequenced in the host CAPP.  The PSL can't be enabled for timebase until the CAPP unit in the host has been enabled.

  I have installed a recent mainline Linux kernel (4.4.0-rc8) on
  capiredp01. I have rebooted this kernel and verified that the PSL
  timebase syncs without problem.

  I will now compare the source code of Ubuntu kernel 4.2.0-19 (that
  hits the bug) with the source of mainline kernel 4.4.0-rc8 (that
  operates as expected).

  I have updated the Ubuntu kernel and modules with:

  $ sudo apt-get install linux-image-4.2.0-23-generic
  $ sudo apt-get install linux-image-extra-4.2.0-23-generic

  I have rebooted Ubuntu kernel linux-image-4.2.0-23-generic, and found that the cxl driver hits the bug.
  I have also downloaded the source for this Ubuntu kernel (and modules) with:

  $ sudo apt-get source linux-image-4.2.0-23-generic

  I have recompiled and installed, and noticed that the resulting kernel
  bears the version 4.2.6 (??). I have rebooted this Ubuntu kernel 4.2.6
  built from the Ubuntu source for 4.2.0-23-generic, and found that the
  timebase sync occurs normally.

  In short, the kernels linux-4.2.6 and linux-4.4.0-rc8 (that I have
  built from the source, respectively provided by Ubuntu and Linus)
  operate normally, whereas all kernels compiled by, and downloaded
  from, Ubuntu hit the timebase sync bug.

  I will try to investigate possible differences between kernel config
  files or toolchain and build procedures.

  I have found that the bug can be activated or prevented via the Linux kernel config file.
  I have compiled the Ubuntu kernel source downloaded with

  $ sudo apt-get source linux-image-4.2.0-23-generic

  1. with my own config file => PSL timebase sync works fine
  2. with the config file supplied by Ubuntu => PSL timebase sync fails

  I will now diff the config files, and try to identify the set of
  config parameters that change the kernel behavior regarding timebase
  sync.

  Got it. Here is the difference between config-4.2.0-23-generic (that
  hits the bug) and .config (that operates normally):

  $ diff config-4.2.0-23-generic .config
  130,131c130,132
  < CONFIG_TICK_CPU_ACCOUNTING=y
  < # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
  ---
  > CONFIG_VIRT_CPU_ACCOUNTING=y
  > # CONFIG_TICK_CPU_ACCOUNTING is not set
  > CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y

  For some reason, setting CONFIG_TICK_CPU_ACCOUNTING breaks PSL
  Timebase sync on ppc64le.  Investigating further.
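
A plain diff of two kernel config files is sensitive to line order and comments. As an alternative, the comparison can be done per option. The sketch below is an illustration only, with hypothetical helper names parse_kconfig and diff_kconfig; it treats "# CONFIG_X is not set" and a missing CONFIG_X as the same state, which is an assumption that suits this comparison but may not suit every use.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map option name -> value; options marked 'is not set' map to None."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def diff_kconfig(old, new):
    """Return {option: (old_value, new_value)} for options whose state differs."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# The two fragments from the diff in this report.
ubuntu_cfg = parse_kconfig(
    "CONFIG_TICK_CPU_ACCOUNTING=y\n"
    "# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set\n"
)
own_cfg = parse_kconfig(
    "CONFIG_VIRT_CPU_ACCOUNTING=y\n"
    "# CONFIG_TICK_CPU_ACCOUNTING is not set\n"
    "CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y\n"
)

if __name__ == "__main__":
    for name, (before, after) in diff_kconfig(ubuntu_cfg, own_cfg).items():
        print(f"{name}: {before} -> {after}")
```

Run on full config files (read with open(path).read()), this lists only the options whose effective state changed, which makes a three-option difference like the one above easy to spot.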

  Canonical, can you please replace

  CONFIG_TICK_CPU_ACCOUNTING=y
  # CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set

  by
   
  CONFIG_VIRT_CPU_ACCOUNTING=y
  # CONFIG_TICK_CPU_ACCOUNTING is not set
  CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y

  in the default ppc64le Linux kernel configuration file?

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532914/+subscriptions