
group.of.nepali.translators team mailing list archive

[Bug 1587316] Re: STC840.20:Alpine:alp7fp1:Ubuntu 16.04, BlueFin (SAN) EEH 6 times during boot then disabled SRC BA188002:b0314a_1612.840

 

This bug was fixed in the package linux - 4.4.0-31.50

---------------
linux (4.4.0-31.50) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1602449

  * nouveau: boot hangs at blank screen with unsupported graphics cards
    (LP: #1602340)
    - SAUCE: drm: check for supported chipset before booting fbdev off the hw

linux (4.4.0-30.49) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597897

  * FCP devices are not detected correctly nor deterministically (LP: #1567602)
    - scsi_dh_alua: Disable ALUA handling for non-disk devices
    - scsi_dh_alua: Use vpd_pg83 information
    - scsi_dh_alua: improved logging
    - scsi_dh_alua: sanitze sense code handling
    - scsi_dh_alua: use standard logging functions
    - scsi_dh_alua: return standard SCSI return codes in submit_rtpg
    - scsi_dh_alua: fixup description of stpg_endio()
    - scsi_dh_alua: use flag for RTPG extended header
    - scsi_dh_alua: use unaligned access macros
    - scsi_dh_alua: rework alua_check_tpgs() to return the tpgs mode
    - scsi_dh_alua: simplify sense code handling
    - scsi: Add scsi_vpd_lun_id()
    - scsi: Add scsi_vpd_tpg_id()
    - scsi_dh_alua: use scsi_vpd_tpg_id()
    - scsi_dh_alua: Remove stale variables
    - scsi_dh_alua: Pass buffer as function argument
    - scsi_dh_alua: separate out alua_stpg()
    - scsi_dh_alua: Make stpg synchronous
    - scsi_dh_alua: call alua_rtpg() if stpg fails
    - scsi_dh_alua: switch to scsi_execute_req_flags()
    - scsi_dh_alua: allocate RTPG buffer separately
    - scsi_dh_alua: Use separate alua_port_group structure
    - scsi_dh_alua: use unique device id
    - scsi_dh_alua: simplify alua_initialize()
    - revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach should
      succeed while TPG is transitioning")
    - scsi_dh_alua: move optimize_stpg evaluation
    - scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
    - scsi_dh_alua: Use workqueue for RTPG
    - scsi_dh_alua: Allow workqueue to run synchronously
    - scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
    - scsi_dh_alua: Recheck state on unit attention
    - scsi_dh_alua: update all port states
    - scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
    - scsi_dh_alua: do not fail for unknown VPD identification

linux (4.4.0-29.48) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1597015

  * Wireless hotkey fails on Dell XPS 15 9550 (LP: #1589886)
    - intel-hid: new hid event driver for hotkeys
    - intel-hid: fix incorrect entries in intel_hid_keymap
    - intel-hid: allocate correct amount of memory for private struct
    - intel-hid: add a workaround to ignore an event after waking up from S4.
    - [Config] CONFIG_INTEL_HID_EVENT=m

  * cgroupfs mounts can hang (LP: #1588056)
    - Revert "UBUNTU: SAUCE: (namespace) mqueue: Super blocks must be owned by the
      user ns which owns the ipc ns"
    - Revert "UBUNTU: SAUCE: kernfs: Do not match superblock in another user
      namespace when mounting"
    - Revert "UBUNTU: SAUCE: cgroup: Use a new super block when mounting in a
      cgroup namespace"
    - (namespace) bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
    - (namespace) bpf, inode: disallow userns mounts
    - (namespace) ipc: Initialize ipc_namespace->user_ns early.
    - (namespace) vfs: Pass data, ns, and ns->userns to mount_ns
    - SAUCE: (namespace) Sync with upstream s_user_ns patches
    - (namespace) kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC
    - (namespace) ipc/mqueue: The mqueue filesystem should never contain
      executables

  * KVM system crashes after starting guest (LP: #1596635)
    - xhci: Cleanup only when releasing primary hcd

  * Upstream patch "crypto: vmx - IV size failing on skcipher API" for Ubuntu
    16.04 (LP: #1596557)
    - crypto: vmx - IV size failing on skcipher API

  * [Bug]tpm initialization fails on x86 (LP: #1596469)
    - tpm_crb: drop struct resource res from struct crb_priv
    - tpm_crb: fix mapping of the buffers

  * Device shutdown notification for CAPI Flash cards (LP: #1592114)
    - cxlflash: Fix regression issue with re-ordering patch
    - cxlflash: Fix to drain operations from previous reset
    - cxlflash: Add device dependent flags
    - cxlflash: Shutdown notify support for CXL Flash cards

  * scsi-modules udeb should include pm80xx (LP: #1595628)
    - [Config] Add pm80xx scsi driver to d-i

  * Sync up latest relevant upstream bug fixes (LP: #1594871)
    - SAUCE: (noup) Update zfs to 0.6.5.6-0ubuntu10

  * Cannot compile module tda10071 (LP: #1592531)
    - [media] tda10071: Fix dependency to REGMAP_I2C

  * lsvpd doesn't show correct location code for devices attached to a CAPI card
    (LP: #1594847)
    - cxl: Make vPHB device node match adapter's

  * enable CRC32 and AES ARM64 by default or as module (LP: #1594455)
    - [Config] Enable arm64 AES and CRC32 crypto

  * VMX kernel crypto module exhibits poor performance in Ubuntu 16.04
    (LP: #1592481)
    - crypto: vmx - comply with ABIs that specify vrsave as reserved.
    - crypto: vmx - Fix ABI detection
    - crypto: vmx - Increase priority of aes-cbc cipher

  * build squashfs into xenial kernels by default (LP: #1593134)
    - [Config] CONFIG_SQUASHFS=y

  * Restore irqfd fast path for PPC (LP: #1592809)
    - KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts

  * Unable to start guests with memballoon default. (LP: #1592042)
    - virtio_balloon: fix PFN format for virtio-1

  * Key 5 automatically pressed on some Logitech wireless keyboards
    (LP: #1579190)
    - HID: core: prevent out-of-bound readings

  * ZFS: Running ztest repeatedly for long periods of time eventually results in
    "zdb: can't open 'ztest': No such file or directory" (LP: #1587686)
    - Fix ztest truncated cache file

  * STC840.20:Alpine:alp7fp1:Ubuntu 16.04, BlueFin (SAN) EEH 6 times during boot
    then disabled SRC BA188002:b0314a_1612.840 (LP: #1587316)
    - lpfc: Fix DMA faults observed upon plugging loopback connector

 -- Kamal Mostafa <kamal@xxxxxxxxxxxxx>  Tue, 12 Jul 2016 16:28:12 -0700

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1587316

Title:
  STC840.20:Alpine:alp7fp1:Ubuntu 16.04, BlueFin (SAN) EEH 6 times
  during boot then disabled SRC BA188002:b0314a_1612.840

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #0 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> -
  2016-03-21 15:55:09 ==

  
  == Comment: #1 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> - 2016-03-21 15:55:11 ==
  ==== State: Open by: mlfield on 21 March 2016 14:45:01 ====

  ==========================Automatic entries==========================
  Contact: LittleField, Michael *CONTRACTOR*
  Backup: Thirukumaran V T (Thirukumaran@xxxxxxxxxx), Deepti Umarani (deeptiumarani@xxxxxxxxxx), Brian M. Carpenter(carp@xxxxxxxxxx)

  ===== sys_capture v5.24 === 2016-03-21_14-25-41 ===========

  |
  |    |
  |    System Hardware Information:
  |      NODE /Sys-0/Node-0, U78C7.001.1AQH383-P2
  |         FSP  /Sys-0/Node-0/FSP-0, FSP-2 DD 1.0, U78C7.001.1AQH383-P1-C5
  |            PSI  /Sys-0/Node-0/FSP-0/PSI-0
  |            PSI  /Sys-0/Node-0/FSP-0/PSI-1
  |         MEMBUF /Sys-0/Node-0/Membuf-12, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C11
  |         MEMBUF /Sys-0/Node-0/Membuf-13, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C10
  |         MEMBUF /Sys-0/Node-0/Membuf-14, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C12
  |         MEMBUF /Sys-0/Node-0/Membuf-15, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C13
  |         MEMBUF /Sys-0/Node-0/Membuf-20, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C23
  |         MEMBUF /Sys-0/Node-0/Membuf-21, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C22
  |         MEMBUF /Sys-0/Node-0/Membuf-22, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C24
  |         MEMBUF /Sys-0/Node-0/Membuf-23, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C25
  |         MEMBUF /Sys-0/Node-0/Membuf-28, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C19
  |         MEMBUF /Sys-0/Node-0/Membuf-29, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C18
  |         MEMBUF /Sys-0/Node-0/Membuf-30, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C20
  |         MEMBUF /Sys-0/Node-0/Membuf-31, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C21
  |         MEMBUF /Sys-0/Node-0/Membuf-36, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C31
  |         MEMBUF /Sys-0/Node-0/Membuf-37, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C30
  |         MEMBUF /Sys-0/Node-0/Membuf-38, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C32
  |         MEMBUF /Sys-0/Node-0/Membuf-39, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C33
  |         MEMBUF /Sys-0/Node-0/Membuf-4, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C15
  |         MEMBUF /Sys-0/Node-0/Membuf-44, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C27
  |         MEMBUF /Sys-0/Node-0/Membuf-45, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C26
  |         MEMBUF /Sys-0/Node-0/Membuf-46, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C28
  |         MEMBUF /Sys-0/Node-0/Membuf-47, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C29
  |         MEMBUF /Sys-0/Node-0/Membuf-5, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C14
  |         MEMBUF /Sys-0/Node-0/Membuf-52, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C39
  |         MEMBUF /Sys-0/Node-0/Membuf-53, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C38
  |         MEMBUF /Sys-0/Node-0/Membuf-54, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C40
  |         MEMBUF /Sys-0/Node-0/Membuf-55, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C41
  |         MEMBUF /Sys-0/Node-0/Membuf-6, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C16
  |         MEMBUF /Sys-0/Node-0/Membuf-60, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C35
  |         MEMBUF /Sys-0/Node-0/Membuf-61, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C34
  |         MEMBUF /Sys-0/Node-0/Membuf-62, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C36
  |         MEMBUF /Sys-0/Node-0/Membuf-63, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C37
  |         MEMBUF /Sys-0/Node-0/Membuf-7, CENTAUR EC 2.0, U78C7.001.1AQH383-P2-C17
  |         PROC /Sys-0/Node-0/Proc-0, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C2
  |            CORE /Sys-0/Node-0/Proc-0/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-0/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-0/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-0/EX-4/Core-0
  |            PCI  /Sys-0/Node-0/Proc-0/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-0/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-0/PCI-2
  |            PSI  /Sys-0/Node-0/Proc-0/PSI-0
  |         PROC /Sys-0/Node-0/Proc-1, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C2
  |            CORE /Sys-0/Node-0/Proc-1/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-1/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-1/EX-4/Core-0
  |            CORE /Sys-0/Node-0/Proc-1/EX-5/Core-0
  |            PCI  /Sys-0/Node-0/Proc-1/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-1/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-1/PCI-2
  |         PROC /Sys-0/Node-0/Proc-2, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C3
  |            CORE /Sys-0/Node-0/Proc-2/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-2/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-2/EX-4/Core-0
  |            CORE /Sys-0/Node-0/Proc-2/EX-5/Core-0
  |            PCI  /Sys-0/Node-0/Proc-2/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-2/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-2/PCI-2
  |            PSI  /Sys-0/Node-0/Proc-2/PSI-0
  |         PROC /Sys-0/Node-0/Proc-3, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C3
  |            CORE /Sys-0/Node-0/Proc-3/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-3/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-3/EX-4/Core-0
  |            CORE /Sys-0/Node-0/Proc-3/EX-6/Core-0
  |            PCI  /Sys-0/Node-0/Proc-3/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-3/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-3/PCI-2
  |         PROC /Sys-0/Node-0/Proc-4, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C6
  |            CORE /Sys-0/Node-0/Proc-4/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-4/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-4/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-4/EX-6/Core-0
  |            PCI  /Sys-0/Node-0/Proc-4/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-4/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-4/PCI-2
  |         PROC /Sys-0/Node-0/Proc-5, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C6
  |            CORE /Sys-0/Node-0/Proc-5/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-5/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-5/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-5/EX-4/Core-0
  |            PCI  /Sys-0/Node-0/Proc-5/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-5/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-5/PCI-2
  |         PROC /Sys-0/Node-0/Proc-6, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C7
  |            CORE /Sys-0/Node-0/Proc-6/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-6/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-6/EX-4/Core-0
  |            CORE /Sys-0/Node-0/Proc-6/EX-5/Core-0
  |            PCI  /Sys-0/Node-0/Proc-6/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-6/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-6/PCI-2
  |         PROC /Sys-0/Node-0/Proc-7, P8/Murano DD 2.1, U78C7.001.1AQH383-P2-C7
  |            CORE /Sys-0/Node-0/Proc-7/EX-12/Core-0
  |            CORE /Sys-0/Node-0/Proc-7/EX-13/Core-0
  |            CORE /Sys-0/Node-0/Proc-7/EX-14/Core-0
  |            CORE /Sys-0/Node-0/Proc-7/EX-6/Core-0
  |            PCI  /Sys-0/Node-0/Proc-7/PCI-0
  |            PCI  /Sys-0/Node-0/Proc-7/PCI-1
  |            PCI  /Sys-0/Node-0/Proc-7/PCI-2
  |
  |    System Hardware Summary:
  |      Configured Proc Cores: 32
  |      Configured IO UNITs:   24
  |      Configured PCIe PHB:   24
  |      Installed Nodes:       1
  |
  |    Hardware InitFile Information:
  |        No tool support for FIRENZE
  |
  |    Hardware (CINI) Frequency Information:
  |        No tool support for FIRENZE
  |
  |    VPD Information:
  |      Backplane VPD:
  |        None found or VPD info is not available.
  |      VPD LID Information:
  |        VPD LID File [/opt/extucode/80e00040.lid]:
  |          VPD Keyword: [LX], Data: [3100050100300040]
  |        VPD LID File [/opt/extucode/80e00041.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300041]
  |        VPD LID File [/opt/extucode/80e00042.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300042]
  |        VPD LID File [/opt/extucode/80e00043.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300043]
  |        VPD LID File [/opt/extucode/80e00044.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300044]
  |        VPD LID File [/opt/extucode/80e00047.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300047]
  |             Format:        0x31   (1)
  |             Enclosure ID:  0x0004 (P8 HV (Tuleta))
  |             Server Type:   0x01   (i/pSeries)
  |             FRU Type:      0x00   (Backplane)
  |             VPD Pass:      0x30   (0)
  |             LID Name:      0x0047 (P8 Alpine xS4U)
  |        VPD LID File [/opt/extucode/80e00050.lid]:
  |          VPD Keyword: [LX], Data: [3100060100300050]
  |        VPD LID File [/opt/extucode/80e00051.lid]:
  |          VPD Keyword: [LX], Data: [3100060100300051]
  |        VPD LID File [/opt/extucode/80e00942.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300942]
  |        VPD LID File [/opt/extucode/80e00944.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300944]
  |        VPD LID File [/opt/extucode/80e00947.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300947]
  |             Format:        0x31   (1)
  |             Enclosure ID:  0x0004 (P8 HV (Tuleta))
  |             Server Type:   0x01   (i/pSeries)
  |             FRU Type:      0x00   (Backplane)
  |             VPD Pass:      0x30   (0)
  |             LID Name:      0x0947 (P8 Alpine Storage/Shark)
  |        VPD LID File [/opt/extucode/80e00ff0.lid]:
  |          VPD Keyword: [LX], Data: [3100040100300FF0]
  |
  |    WARNINGS:
  |      * Informational: This machine has signed firmware (ship image)
  |
  |    ERRL: Attempting to dump error logs using errl...
  |      Dumping all error logs on FSP to file...
  |      ERRL: The FSP stopped responding... skipping
  |
  |    FFDC:
  |      FNM: Attempting connection for basic health check...
  |        TimeSincePhypStarted=82:13:57.539
  |        No failed tasks found.
  |
  |      FNM: Attempting connection for PHYP FFDC...
  |          FNM PHYP FFDC data stored in /fspmount/alpine/alp7fp1/b0314a_1612.840/fsp/PHYP.FFDC.20160321142537.phyp
  |
  |      FipS MyFFDC: Was not attempted.  Reason:[Not requested]
  |
  |      Cronus: Data collection not attempted. (Unable to use Cronus via SSH Tunnel)
  |
  |----- File(s) Created During Capture ------
  |    SysCapture Primary LogFile: /fspmount/alpine/alp7fp1/b0314a_1612.840/fsp/PHYP.FFDC.20160321142537
  |    FNM PHYP FFDC stored in:    /fspmount/alpine/alp7fp1/b0314a_1612.840/fsp/PHYP.FFDC.20160321142537.phyp
  |
  ============== end of capture ==============

  ============================Manual entries===========================
  Title: STC840.20:Alpine:alp7fp1:Ubuntu 16.04, BlueFin (SAN) EEH 6 times during boot then disabled SRC BA188002:b0314a_1612.840

  Problem Description:
  Booting Ubuntu 16.04 with a BlueFin (SAN) adapter and several other adapters, the BlueFin adapter hit EEH 6 times and was then disabled, with SRC BA188002 reported. None of the other adapters had any issues.

  ===================================END===============================
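The "EEH 6 times, then disabled" pattern is consistent with how the powerpc EEH core limits recovery attempts: each PE (partitionable endpoint) keeps a freeze counter, and once it exceeds eeh_max_freezes (to my understanding the default is 5, counted within a one-hour window) the device is permanently disabled instead of being recovered again. The following is only a minimal C model of that counting rule; the struct and function names are hypothetical, the default of 5 is an assumption, and the one-hour reset is not modeled.

```c
#include <stdbool.h>

/* Assumption: mirrors the kernel's default eeh_max_freezes value of 5. */
#define ASSUMED_MAX_FREEZES 5

/* Hypothetical per-PE state; the real struct eeh_pe holds much more. */
struct pe_model {
    int freeze_count;
    bool permanently_failed;
};

/*
 * Record one detected freeze. Returns true once the PE is permanently
 * disabled, which with the assumed limit happens on the sixth freeze.
 * (The kernel also resets the count after an hour without freezes;
 * that window is not modeled here.)
 */
static bool pe_model_record_freeze(struct pe_model *pe)
{
    if (pe->permanently_failed)
        return true;
    pe->freeze_count++;
    if (pe->freeze_count > ASSUMED_MAX_FREEZES)
        pe->permanently_failed = true;
    return pe->permanently_failed;
}
```

With this rule, the first five freezes are each followed by a recovery attempt, and the sixth leaves the adapter disabled, matching the symptom reported above.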
  ==== State: Open by: mlfield on 21 March 2016 14:47:26 ====

  Attached Dmesg Log: dmesg1.txt

  mlfield (mlfield@xxxxxxxxxx) added native attachment
  /opt/IBM/WebSphere/AppServer/profiles/cqweb/temp/ausratsrv5Node01/server1/TeamEAR/cqweb.war/dmesg1.txt
  on 2016-03-21 14:47:26

  == Comment: #2 - Application Cdeadmin <cdeadmin@xxxxxxxxxx> -
  2016-03-21 15:55:16 ==

  
  == Comment: #12 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-04-04 14:09:48 ==
  Info from Mike on ST:
  The adapter in the drawer was assigned to the LPAR, and it hit the problem just like the adapter in the CEC.
  This points to a kernel/driver problem, since 14.04 didn't hit the problem.

  
  mlfield@xxxxxxxxxx - Michael Littlefield/Austin/Contr/IBM: just added both BlueFins; it happens with both, so in both the MEX drawer and the CEC.
  # Slot                   Description                                               Device(s)
  U78C7.001.1AQH383-P1-C4  PCI-E capable, Rev 3, 16x lanes with 16x lanes connected  fibre-channel
                                                                                     fibre-channel
  U78C7.001.1AQH383-P1-C6  PCI-E capable, Rev 3, 8x lanes with 8x lanes connected    0000:60:00.1
                                                                                     0000:60:00.0
  U78CD.001.FZH0132-P1-C1  PCI-E capable, Rev 3, 16x lanes with 16x lanes connected  fibre-channel
                                                                                     fibre-channel
  U78CD.001.FZH0132-P2-C1  PCI-E capable, Rev 3, 16x lanes with 16x lanes connected  0002:50:00.0
  U78CD.001.FZH0132-P2-C3  PCI-E capable, Rev 3, 8x lanes with 8x lanes connected    0003:70:00.0
  U78CD.001.FZH0132-P2-C6  PCI-E capable, Rev 3, 8x lanes with 8x lanes connected    0004:a0:00.5
                                                                                     0004:a0:00.4
                                                                                     0004:a0:00.3
                                                                                     0004:a0:00.2
                                                                                     0004:a0:00.1
                                                                                     0004:a0:00.0

  == Comment: #16 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-04-12 18:00:26 ==
  Mike provided the LPAR for debugging earlier today.

  Observations:
  1) The NUMA node configuration is unusual, likely an effect of DLPAR of memory/CPU.
  - node 0: has CPUs but no memory
  - node 2: has CPUs and memory
  - node 6: has memory but no CPUs

  (0) root @ alp7p04: /root
  # numactl -H
  available: 3 nodes (0,2,6)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
  node 0 size: 0 MB
  node 0 free: 0 MB
  node 2 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
  node 2 size: 34216 MB
  node 2 free: 33248 MB
  node 6 cpus:
  node 6 size: 6644 MB
  node 6 free: 6568 MB
  node distances:
  node   0   2   6 
    0:  10  40  40 
    2:  40  10  40 
    6:  40  40  10 
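To spot such nodes quickly, the numactl -H output can be classified per node: a node that lists CPUs but has size 0 MB is memoryless, and a node with an empty CPU list but a non-zero size is CPU-less. A minimal C sketch of that parsing, assuming exactly the line formats shown above ("node N cpus: ..." and "node N size: X MB"); the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Count the CPU ids on a "node N cpus: ..." line; -1 if not such a line. */
static int count_node_cpus(const char *line)
{
    const char *p = strstr(line, "cpus:");
    int count = 0, id, used;

    if (p == NULL)
        return -1;
    p += strlen("cpus:");
    /* %n records how many characters each %d consumed, so we can advance. */
    while (sscanf(p, "%d%n", &id, &used) == 1) {
        count++;
        p += used;
    }
    return count;
}

/* Parse a "node N size: X MB" line; returns X, or -1 on mismatch. */
static long node_size_mb(const char *line)
{
    int node;
    long mb;

    if (sscanf(line, "node %d size: %ld MB", &node, &mb) != 2)
        return -1;
    return mb;
}

/* Classify a node from its CPU count and memory size. */
static const char *classify_node(int cpus, long size_mb)
{
    if (cpus > 0 && size_mb == 0)
        return "memoryless";
    if (cpus == 0 && size_mb > 0)
        return "cpuless";
    return "normal";
}
```

Applied to the output above, node 0 (32 CPUs, 0 MB) comes out memoryless, node 2 normal, and node 6 (no CPUs, 6644 MB) CPU-less.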

  
  2) The problem does not reproduce with the 14.04 kernel (4.2, from Wily).

  Comparing the dmesg logs up to the point of failure, there are differences in the output of the NUMA setup code.
  2a) A small offset difference in the NODE_DATA starting address. For example:

  16.04: [    0.000000] numa:   NODE_DATA [mem 0x9ffe46100-0x9ffe4ffff]

  14.04: [    0.000000] numa:   NODE_DATA [mem 0x9ffe45000-0x9ffe4ffff]
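For scale: both NODE_DATA ranges end at the same address, so the whole difference is in the start, 0x9ffe46100 - 0x9ffe45000 = 0x1100 (4352 bytes). Assuming the ranges are printed inclusively (as the ...ffff endings suggest), the sizes are 0x9f00 (40704 bytes) versus 0xb000 (45056 bytes). A small C sketch of that arithmetic, parsing the "[mem START-END]" notation; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract START and END from a dmesg fragment such as
 * "NODE_DATA [mem 0x9ffe46100-0x9ffe4ffff]". Returns 0 on success.
 */
static int parse_mem_range(const char *line,
                           unsigned long long *start, unsigned long long *end)
{
    const char *p = strstr(line, "[mem ");

    if (p == NULL)
        return -1;
    if (sscanf(p, "[mem 0x%llx-0x%llx]", start, end) != 2)
        return -1;
    return 0;
}

/* Assumes an inclusive range, so the size is END - START + 1. */
static unsigned long long mem_range_size(unsigned long long start,
                                         unsigned long long end)
{
    return end - start + 1;
}
```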

  2b) A *totally* different end address in the "Initmem setup node 0"

  16.04: [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]

  14.04: [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0xffffffffffffffff]


  In progress.
  I'll go through the NUMA setup code.

  == Comment: #20 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-04-12 18:18:52 ==
  I booted the 16.04 kernel with the numa=off boot option.
  The EEH errors still happen, but much later (e.g., the 6th error/permanent failure happens only after the login prompt).

  == Comment: #22 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-04-13 10:23:33 ==
  (In reply to comment #16)
  > 2b) A *totally* different end address in the "Initmem setup node 0" 
  > 
  > 16.04: [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
  > 
  > 14.04: [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0xffffffffffffffff]

  And this is the value in the original dmesg attachment from the report
  (taken on a different NUMA node configuration, before some memory and
  CPUs were moved from this LPAR to another one):

  [Mon Mar 21 09:07:45 2016] Initmem setup node 0 [mem 0x0000000000000000-0x00000078cfffffff]

  Notice that it is non-zero, as on 14.04, so I'm not sure the NUMA
  differences are directly related to this bug.

  == Comment: #27 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-05-18 19:47:05 ==
  Assigning this bug to Guilherme per EEH debugging experience and contacts.

  From what we've discussed, this problem doesn't seem to be specific to the lpfc device driver.
  This same adapter/driver works fine on other systems (it has passed our FVT regression testing without this problem).
  So we suspect that some change in the EEH or machine/platform-dependent code is causing this, given that the 14.04 HWE kernel doesn't show this issue on this same LPAR.

  == Comment: #30 - Guilherme Guaglianoni Piccoli <gpiccoli@xxxxxxxxxx> - 2016-05-25 16:35:50 ==
  Quick update on this one: I've been investigating since Monday, and what I found is that in those cases of spontaneous EEH, the PCI BARs of the device are filled with 0xFF, indicating some kind of corruption in the adapter's memory.
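Reading all ones back from a device is also the classic signature of a frozen PE or a device that has stopped responding; on powerpc, the EEH-aware MMIO read accessors treat an all-ones value as a possible error and call eeh_check_failure(), which is the function visible in the stack trace in comment #32. A user-space C sketch of the same heuristic applied to a snapshot of BAR contents; the function names are hypothetical, and how the snapshot is obtained (for example by mapping the device's resource file under sysfs) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An MMIO read from a frozen PE or an unresponsive device returns all ones. */
static bool word_is_all_ones(uint32_t val)
{
    return val == UINT32_MAX;
}

/*
 * Return true if every 32-bit word in a BAR snapshot reads back as
 * 0xFFFFFFFF. An empty snapshot is not treated as evidence of failure.
 */
static bool bar_snapshot_all_ones(const uint32_t *words, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!word_is_all_ones(words[i]))
            return false;
    }
    return true;
}
```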

  To dump the PCI BARs, I first booted with EEH disabled (eeh=off).
  The problem reproduces on upstream kernel v4.5, but not on v4.4, so
  it seems to be a regression.

  I'm studying the commits between those revisions (bisecting, etc.)
  so we can find which commit introduced this behavior.
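Bisecting works because the known-good (v4.4) and known-bad (v4.5) ends bracket the change: testing the midpoint halves the candidate range each time, so N commits need only about log2(N) build-and-boot cycles. A C sketch of that search over a linear list of commits, where a precomputed array is a hypothetical stand-in for the real "build, boot, check for EEH" step; git bisect itself also handles merge history, which this sketch does not.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Find the first bad commit in an ordered list.
 * is_bad[i] stands in for "build commit i, boot it, and check for EEH".
 * Requires n >= 2, is_bad[0] == false (known good) and
 * is_bad[n - 1] == true (known bad). *tests_run counts the tested commits.
 */
static size_t first_bad_commit(const bool *is_bad, size_t n, size_t *tests_run)
{
    size_t good = 0, bad = n - 1;   /* invariant: good passes, bad fails */

    *tests_run = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*tests_run)++;
        if (is_bad[mid])
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

For eight commits whose first bad one is index 3, the search tests only three midpoints before converging.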

  Thanks,

  
  Guilherme

  == Comment: #31 - Guilherme Guaglianoni Piccoli <gpiccoli@xxxxxxxxxx> - 2016-05-27 18:59:09 ==
  The offending commit was found after some bisecting and analysis of the upstream kernel:

  d6de08cc462 ("lpfc: Fix the FLOGI discovery logic to comply with T11
  standards")

  When this commit was reverted on kernel 4.6, the problem disappeared.
  I do see some FLOGI failures in dmesg, but I guess this is somewhat normal (reference: https://access.redhat.com/solutions/400483).

  
  Now, the next step is to investigate what's going on with this commit; it should have been tested before it was merged, so this could be an unexpected corner case we're hitting. I guess Maurício's opinion would be really useful here, since he has a lot of expertise in Fibre Channel devices (he should be back early next week).

  
  One more thought: it's important to determine the real priority of this bug. If it is a stop-ship issue, or its impact on some release would be critical, we could ask Canonical to revert the commit until a proper fix is implemented. I guess Brian's, Mauricio's and Breno's opinions on this would be valuable.

  Thanks,

  
  Guilherme

  == Comment: #32 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-05-30 10:13:57 ==
  Guilherme,

  Thank you very much for the precise handling on this one. Reassigning
  it back to myself.

  I wouldn't have imagined this was a driver-specific problem, but given your
  pointer to this commit, it is indeed something in that direction: the
  dmesg log confirms there's some involvement of the FLOGI (fabric login)
  steps (related to the mentioned commit).

  The devices have 2 ports (e.g., PCI functions 0 and 1).
  - Function 0 is processed first: its probe finishes OK, and it starts the FLOGI steps.
  - Function 1 starts its probe during Function 0's FLOGI steps, and Function 1's probe fails with the EEH.
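The two ports appear as separate PCI functions of one device, visible in the kernel's domain:bus:device.function notation in the log below (0001:01:00.0 and 0001:01:00.1). A small C sketch that parses that notation and checks whether two functions belong to the same physical device; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* A PCI address in the kernel's "DDDD:BB:dd.f" notation (hex fields). */
struct pci_addr {
    unsigned int domain, bus, device, function;
};

/* Parse e.g. "0001:01:00.1"; returns 0 on success, -1 otherwise. */
static int parse_pci_addr(const char *s, struct pci_addr *a)
{
    int n = sscanf(s, "%x:%x:%x.%x",
                   &a->domain, &a->bus, &a->device, &a->function);
    return n == 4 ? 0 : -1;
}

/* Functions of one multi-function device differ only in the function number. */
static bool same_pci_device(const struct pci_addr *x, const struct pci_addr *y)
{
    return x->domain == y->domain && x->bus == y->bus &&
           x->device == y->device;
}
```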

  So the change in the FLOGI logic seems to be closely involved in the
  problems seen by the mailbox commands that result in the EEH.

  More on this later.

  [    1.215858] lpfc 0001:01:00.0: enabling device (0144 -> 0146)
  ...
  [    2.143487] lpfc 0001:01:00.1: enabling device (0144 -> 0146)
  ...
  [    2.636592] lpfc 0001:01:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
  [    2.638459] lpfc 0001:01:00.0: 0:(0):2858 FLOGI failure Status:x3/x103 TMO:x14 Data x1800 x0
  [    2.638464] lpfc 0001:01:00.0: 0:(0):0100 FLOGI failure Status:x3/x103 TMO:x14
  [    2.639019] EEH: Frozen PHB#1-PE#10000 detected
  ...
  [    2.639049] [c00000084f612ee0] [c000000000037a84] eeh_check_failure+0x84/0xd0
  [    2.639061] [c00000084f612f20] [d000000008ed3cc4] lpfc_sli4_wait_bmbx_ready+0x114/0x150 [lpfc]
  ...
  [    2.639086] [c00000084f6131c0] [d000000008ee7780] lpfc_cq_create+0x210/0x370 [lpfc].
  ...
  [    2.639113] [c00000084f613550] [d000000008f23a28] lpfc_pci_probe_one+0x1248/0x13d0 [lpfc]
  [    2.639117] [c00000084f6135f0] [c0000000005daefc] local_pci_probe+0x6c/0x140
  ...
  [    2.639158] lpfc 0001:01:00.1: 1:(0):2544 Mailbox command x9b (x1/xc) cannot issue Data: x200 x1
  ...
  [    2.639166] lpfc 0001:01:00.1: 1:2501 CQ_CREATE mailbox failed with status x0 add_status x0, mbx status xff
  ...

  == Comment: #33 - Guilherme Guaglianoni Piccoli <gpiccoli@xxxxxxxxxx> - 2016-05-30 12:56:21 ==
  Thanks, Maurício!

  I noticed that, compiling the kernel both with the commit and without it
  (by reverting it), the branch of the following if statement is taken in lpfc_mbox_dev_check():

  if (phba->link_state == LPFC_HBA_ERROR)

  So, in both cases link_state is LPFC_HBA_ERROR, but the commit perhaps rearranged the order of operations in a way that the driver can no longer handle this failure, maybe because of a race condition between threads.
  This conclusion came from the following snippet of the commit message:

  "Required reworking the call sequence in the discovery threads."

  
  Thanks for taking it from here.
  Cheers,

  
  Guilherme

  == Comment: #34 - Breno Henrique Leitao <brenohl@xxxxxxxxxx> - 2016-05-30 13:25:00 ==
  > we could ask Canonical to revert it until a proper fix be
  > implemented. Guess Brian, Mauricio and Breno's opinion on this are valuable.

  Well, it will not be simple to ask them to revert it. Although we
  requested the lpfc package upgrade [via bug #132388], there was
  another request to do so (LP: #1541592), so I would suggest trying to
  propose a fix rather than asking them to revert this commit.

  Does it make sense?

  == Comment: #35 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-05-30 14:17:25 ==
  It seems this commit might fix the problem. I'm working on a build with it.

  ae09c765109293b600ba9169aa3d632e1ac1a843
  lpfc: Fix DMA faults observed upon plugging loopback connector

  Driver didn't program the REG_VFI mailbox correctly, giving the adapter
  bad addresses.

  == Comment: #36 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> - 2016-05-30 17:35:30 ==
  Hi Canonical,

  Can you please apply this fix for the lpfc driver?

  This upstream commit fixes the problem:

  	ae09c765109293b600ba9169aa3d632e1ac1a843
  	lpfc: Fix DMA faults observed upon plugging loopback connector

  Original kernel (4.4.0-22.40)

  	root@alp7p04:~# uname -a
  	Linux alp7p04 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:35 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  	root@alp7p04:~# dmesg | grep -i eeh
  	[    0.051252] EEH: pSeries platform initialized
  	[    0.137050] EEH: devices created
  	[    0.167121] EEH: PCI Enhanced I/O Error Handling Enabled
  	[    3.039195] EEH: Frozen PHB#3-PE#10000 detected
  	[    3.039211] EEH: PE location: N/A, PHB location: N/A
  	[    3.039234] [c00000062fa16e40] [c0000000000379b4] eeh_dev_check_failure+0x534/0x580
  	[    3.039237] [c00000062fa16ee0] [c000000000037a84] eeh_check_failure+0x84/0xd0
  	[    3.039398] EEH: Detected PCI bus error on PHB#3-PE#10000
  	<...>

  Patched kernel (4.4.0-22.40 + patch)

  	root@alp7p04:~# uname -a
  	Linux alp7p04 4.4.0-22-generic #40+bz139414c35 SMP Mon May 30 10:54:04 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux

  	root@alp7p04:~# dmesg | grep -i eeh
  	[    0.051222] EEH: pSeries platform initialized
  	[    0.137348] EEH: devices created
  	[    0.167359] EEH: PCI Enhanced I/O Error Handling Enabled
  	root@alp7p04:~#
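The before/after comparison above comes down to whether any "EEH: Frozen PHB" message appears. A minimal C sketch that counts those messages in captured dmesg text, which could be used to compare kernels the same way; the helper name is hypothetical, and it matches only the message text shown above.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of the frozen-PE message in captured dmesg text. */
static size_t count_frozen_phb(const char *log)
{
    const char *needle = "EEH: Frozen PHB";
    size_t count = 0;
    const char *p = log;

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += strlen(needle);   /* continue searching after this match */
    }
    return count;
}
```

On the two excerpts above, the original kernel's log yields one match and the patched kernel's log yields none.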

  == Comment: #38 - Mauricio Faria De Oliveira <mauricfo@xxxxxxxxxx> -
  2016-05-30 17:42:13 ==

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1587316/+subscriptions