kernel-packages team mailing list archive

[Bug 1419938] Comment bridged from LTC Bugzilla

 

------- Comment From clsoto@xxxxxxxxxx 2015-03-11 19:14 EDT-------
This looks fixed with 3.19.0-8-generic #8-Ubuntu:
the device was able to recover from EEH.

[ 2694.622586] EEH: Notify device drivers to shutdown
[ 2694.622587] mlx4_core 0004:01:00.0: device was reset successfully
[ 2694.622589] mlx4_core 0004:01:00.0: mlx4_pci_err_detected was called
[ 2694.622594] mlx4_en 0004:01:00.0: Internal error detected, restarting device
[ 2694.622786] mlx4_en: eth14: Close port called
[ 2694.846830] mlx4_en 0004:01:00.0: removed PHC
[ 2694.874036] EEH: Collect temporary log
[ 2694.879101] EEH: of node=/pciex@3fffe42000000/pci@0/ethernet@0
[ 2694.879465] EEH: PCI device/vendor: 100715b3
[ 2694.879478] EEH: PCI cmd/status register: 00100142
[ 2694.879479] EEH: PCI-E capabilities and status follow:
[ 2694.879544] EEH: PCI-E 00: 00020010 10008e02 0020204e 0843f483
[ 2694.879597] EEH: PCI-E 10: 10830040 00000000 00000000 00000000
[ 2694.879598] EEH: PCI-E 20: 00000000
[ 2694.879599] EEH: PCI-E AER capability register set follows:
[ 2694.879666] EEH: PCI-E AER 00: 18c20001 00000000 00000000 00062010
[ 2694.879719] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[ 2694.879772] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 2694.879785] EEH: PCI-E AER 30: 00000000 00000000
[ 2694.879787] PHB3 PHB#4 Diag-data (Version: 1)
[ 2694.879789] brdgCtl:     00000002
[ 2694.879790] UtlSts:      00200000 00000000 00000000
[ 2694.879791] RootSts:     00000040 00400000 f0830048 00100147 00000000
[ 2694.879792] PhbSts:      0000001c00000000 0000001c00000000
[ 2694.879793] Lem:         0000000000100000 42498e327f502eae 0000000000000000
[ 2694.879795] InAErr:      8000000000000000 8000000000000000 0402008000000000 0000000000000000
[ 2694.879796] PE[  1] A/B: 8480002b00000000 8000000000000000
[ 2694.879797] PE[  2] A/B: 8000000000000000 8000000000000000
[ 2694.879798] PE[  3] A/B: 8000000000000000 8000000000000000
[ 2694.879799] PE[  4] A/B: 8000000000000000 8000000000000000
[ 2694.879800] PE[  5] A/B: 8000000000000000 8000000000000000
[ 2694.879801] EEH: Reset without hotplug activity
[ 2698.898176] EEH: Notify device drivers the completion of reset
[ 2698.898181] mlx4_core 0004:01:00.0: mlx4_pci_slot_reset was called
[ 2698.898218] mlx4_core 0004:01:00.0: enabling device (0140 -> 0142)
[ 2705.396286] mlx4_core 0004:01:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 2705.396288] mlx4_core 0004:01:00.0: PCIe link width is x8, device supports x8
[ 2706.143789] mlx4_en 0004:01:00.0: registered PHC clock
[ 2706.143864] mlx4_en 0004:01:00.0: Activating port:1
[ 2706.159496] mlx4_en: eth11: Using 256 TX rings
[ 2706.159504] mlx4_en: eth11: Using 8 RX rings
[ 2706.159506] mlx4_en: eth11:   frag:0 - size:1518 prefix:0 stride:1536
[ 2706.159722] mlx4_en: eth11: Initializing port
[ 2706.160022] mlx4_en 0004:01:00.0: Activating port:2
[ 2706.165214] mlx4_core 0004:01:00.0 eth14: renamed from eth11
[ 2706.188419] mlx4_en: eth11: Using 256 TX rings
[ 2706.188427] mlx4_en: eth11: Using 8 RX rings
[ 2706.188430] mlx4_en: eth11:   frag:0 - size:1518 prefix:0 stride:1536
[ 2706.188660] mlx4_en: eth11: Initializing port
[ 2706.197316] EEH: Notify device driver to resume
[ 2706.525987] mlx4_core 0004:01:00.0 eth16: renamed from eth11
[ 2707.487156] mlx4_en: eth14: Link Up
[ 2707.542052] mlx4_en: eth16: Link Up

Thanks.
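
For anyone wanting to check a similar dmesg capture mechanically, here is a minimal Python sketch. It only looks for the three "EEH: Notify ..." milestone messages that appear in the log above, in that order, and reports the time between the first and the last. The function name and the choice of milestones are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Milestone messages copied from the log above, in the order they appear.
EEH_MILESTONES = [
    "EEH: Notify device drivers to shutdown",
    "EEH: Notify device drivers the completion of reset",
    "EEH: Notify device driver to resume",
]

# Matches a dmesg line such as "[ 2694.622586] EEH: ..."
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def eeh_recovery_seconds(lines):
    """Return seconds between the first and last milestone.

    Returns None if any milestone is missing or they appear out of order.
    (Hypothetical helper, written for this example only.)
    """
    stamps = []
    remaining = iter(EEH_MILESTONES)
    current = next(remaining)
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if match is None or current is None:
            continue
        if match.group(2).startswith(current):
            stamps.append(float(match.group(1)))
            current = next(remaining, None)
    if len(stamps) != len(EEH_MILESTONES):
        return None
    return stamps[-1] - stamps[0]


sample = [
    "[ 2694.622586] EEH: Notify device drivers to shutdown",
    "[ 2698.898176] EEH: Notify device drivers the completion of reset",
    "[ 2706.197316] EEH: Notify device driver to resume",
]
print(eeh_recovery_seconds(sample))
```

With the three timestamps from the log above, the recovery window comes out at roughly 11.6 seconds. A return value of None would mean the sequence was incomplete, for example if the device never reached the resume step.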

------- Comment From clsoto@xxxxxxxxxx 2015-03-11 19:16 EDT-------
Oops, I posted that comment to the wrong Bugzilla. Please disregard comment #22.

I will verify this tomorrow.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1419938

Title:
  hang in mlx5_create_map_eq in Ubuntu 15.04 due to not getting
  interrupts (mlx5) (Mellanox)

Status in linux package in Ubuntu:
  Fix Released

Bug description:
  ---Problem Description---
  While installing Ubuntu 15.04 LE on a PowerNV system, I see the following error:

  
  [  242.141309] INFO: task systemd-udevd:623 blocked for more than 120 seconds.
  [  242.141408]       Tainted: G            E  3.18.0-12-generic #13-Ubuntu
  [  242.141463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  242.141529] systemd-udevd   D 00003fff8ad27d24     0   623    603 0x00040000
  [  242.141597] Call Trace:
  [  242.141625] [c000002fe2dbad50] [0000000500000000] 0x500000000 (unreliable)

  
  Mellanox error from the trace:

  [    2.609984] /build/buildd/linux-3.18.0/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
  [    2.611142] Freeing unused kernel memory: 5760K (c000000000d90000 - c000000001330000)
  starting version 218
  [    2.664785] scsi_transport_fc: module verification failed: signature and/or  required key missing - tainting kernel
  [    2.668055] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
  [    2.668124] mlx4_core: Initializing 0000:01:00.0
   

  ---uname output---
  Linux powerio-le21 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 17:55:08 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  Mellanox device which seems to be causing the error:
  0000:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
   

   
  Machine Type = 8286-42A PowerNV 
   
  ---Debugger---
  ---Steps to Reproduce---
  1. Start install of Ubuntu 15.04 LE
  2. The installer does not start at all; we see the error shown above.
    
  Install ISO Information: Ubuntu 15.04 - vivid-server-ppc64el.iso
   
  Install method: DVD
   
  Install disk info: # ethtool -i  eth17                
  driver: mlx4_en
  version: 2.2-1 (Feb 2014)
  firmware-version: 2.9.1326
  bus-info: 0000:01:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: no
  supports-register-dump: no
  supports-priv-flags: yes
   
  The issue is that we are not getting interrupts. 

  I force-installed Mellanox OFED in my virtual guest running Ubuntu
  15.04 and do not see the issue with that code, so tonight I will look
  for differences between that code and upstream to see if I can spot
  the issue.

  The problem is related to the following commit:

  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/drivers/net/ethernet/mellanox/mlx5/core?id=c7a08ac7ee68b9af0d5af99c7b34b574cac4d144

  That commit omits setting the UAR page size on the adapter, which is
  why it is not working. Any kernel for Power that picks up that patch
  in mlx5 will therefore see this issue.
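
To illustrate why a missing UAR page size would matter on Power in particular, here is a small Python sketch of the arithmetic. It assumes the adapter expects the page size as a log2 value relative to 4 KiB (so 0 means 4 KiB pages), and that the affected Power kernels use 64 KiB pages; neither detail is stated in this report, and the function name is hypothetical.

```python
def log_uar_page_sz(page_size_bytes, base_shift=12):
    """Encode a page size as log2(page size) minus log2(4 KiB).

    Assumption for this sketch: the adapter takes the UAR page size in
    this form, so a value of 0 stands for 4 KiB pages.
    """
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a positive power of two")
    page_shift = page_size_bytes.bit_length() - 1  # exact log2 for powers of two
    if page_shift < base_shift:
        raise ValueError("page size must be at least 4 KiB")
    return page_shift - base_shift


# 4 KiB pages (common on x86) encode as 0, which matches an unset field.
print(log_uar_page_sz(4096))

# 64 KiB pages (assumed here for the Power kernel) encode as 4.
print(log_uar_page_sz(64 * 1024))
```

Under these assumptions, a kernel with 4 KiB pages would not notice an unset field, because the value it would have written is 0 anyway. A kernel with 64 KiB pages would need to write 4, which may explain why only Power kernels with the patch are affected.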

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1419938/+subscriptions