kernel-packages team mailing list archive

[Bug 1574697] Re: WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]

 

** Also affects: linux (Ubuntu Wily)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Wily)
       Status: New => In Progress

** Changed in: linux (Ubuntu Wily)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1574697

Title:
  WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
  [travis3EN]

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Wily:
  In Progress
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  ---Problem Description---
  WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]
   
  ---uname output---
  Linux ltciofvtr-s822l2-lp3 4.4.0-4-generic #19-Ubuntu SMP Fri Feb 5 17:36:21 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = s822l 
   
  ---Steps to Reproduce---
  Triggering EEH causes warning messages in syslog.
  Note: it's just the warning messages; the card recovers after EEH.

  1. from peer: run some load
  linux-xqxs:~ # ping -f 22.22.22.22

  2. from the pKVM host, inject an EEH error for the travis3EN card
  [root@ltciofvtr-s822l2-lp1 ~]# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA; sleep 1; echo 0x0 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA

  3. in the client's syslog you can see the warning message "WARNING: at
  /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595"

  [  940.382507] EEH: Frozen PHB#0-PE#1 detected
  [  940.382594] EEH: PE location: N/A, PHB location: N/A
  [  940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
  [  940.382891] mlx4_core 0000:00:04.0: device is going to be reset
  [  940.382953] mlx4_core 0000:00:04.0: device was reset successfully
  [  940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382507] EEH: Frozen PHB#0-PE#1 detected
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382594] EEH: PE location: N/A, PHB location: N/A
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382647] CPU: 1 PID: 176 Comm: kworker/u16:2 Not tainted 4.4.0-4-generic #19-Ubuntu
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382671] Workqueue: mlx4_en mlx4_en_do_get_stats [mlx4_en]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382673] Call Trace:
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382714] [c00000000487b7c0] [c000000000ad8aa0] dump_stack+0x90/0xbc (unreliable)
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382725] [c00000000487b7f0] [c0000000000378f4] eeh_dev_check_failure+0x534/0x580
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382728] [c00000000487b890] [c0000000000379c4] eeh_check_failure+0x84/0xd0
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382743] [c00000000487b8d0] [d000000002112fc0] cmd_pending+0xb0/0xe0 [mlx4_core]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382749] [c00000000487b900] [d0000000021130b0] mlx4_cmd_post+0xc0/0x250 [mlx4_core]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382756] [c00000000487b9b0] [d00000000211592c] __mlx4_cmd+0x1dc/0x9b0 [mlx4_core]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382766] [c00000000487ba70] [d0000000024eb030] mlx4_en_DUMP_ETH_STATS+0xc0/0x830 [mlx4_en]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382770] [c00000000487bb70] [d0000000024ef150] mlx4_en_do_get_stats+0x160/0x340 [mlx4_en]
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382780] [c00000000487bc50] [c0000000000dc920] process_one_work+0x1e0/0x560
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382783] [c00000000487bce0] [c0000000000dce34] worker_thread+0x194/0x680
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382785] [c00000000487bd80] [c0000000000e58d0] kthread+0x110/0x130
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382788] [c00000000487be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382814] mlx4_core 0000:00:04.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, op_mod=0x0
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382821] EEH: Detected PCI bus error on PHB#0-PE#1
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382823] EEH: This PCI device has failed 1 times in the last hour
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382824] EEH: Notify device drivers to shutdown
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382891] mlx4_core 0000:00:04.0: device is going to be reset
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382953] mlx4_core 0000:00:04.0: device was reset successfully
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.383320] mlx4_en: enp0s4: Close port called
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Starting Cleanup of Temporary Directories...
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd-tmpfiles[2473]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Started Cleanup of Temporary Directories.
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.801690] mlx4_en 0000:00:04.0: removed PHC
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.079593] EEH: Collect temporary log
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.079631] eeh_pci_enable: Unexpected state change 2 on PHB#0-PE#1, err=-3
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081329] EEH: of node=0000:00:04:0
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081348] EEH: PCI device/vendor: 100315b3
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081582] EEH: PCI cmd/status register: 00100142
  Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081584] EEH: PCI-E capabilities and status follow:
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081725] EEH: PCI-E 00: 0002c010 11d08e02 0020202e 0843f483 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081849] EEH: PCI-E 10: 10830000 00000000 00000000 00000000 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081851] EEH: PCI-E 20: 00000000 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081886] EEH: Reset without hotplug activity
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081935] mlx4_core 0000:00:04.0: mlx4_remove_one: interface is down
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082003] mlx4_core 0000:00:04.0: disabling already-disabled device
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082046] ------------[ cut here ]------------
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082049] WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082051] Modules linked in: ib_ipoib mlx5_ib mlx5_core rdma_ucm rdma_cm iw_cm ib_umad ib_ucm ib_cm ib_sa ib_mad ib_uverbs ib_core ib_addr pseries_rng rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082257] CPU: 1 PID: 49 Comm: eehd Not tainted 4.4.0-4-generic #19-Ubuntu
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082260] task: c0000003f91e9370 ti: c0000003f9060000 task.ti: c0000003f9060000
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082263] NIP: c0000000005cdf0c LR: c0000000005cdf08 CTR: c00000000057ae00
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082265] REGS: c0000003f9063560 TRAP: 0700   Not tainted  (4.4.0-4-generic)
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082267] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28002422  XER: 20000000
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] CFAR: c000000000ad578c SOFTE: 1 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR00: c0000000005cdf08 c0000003f90637e0 c000000001593900 0000000000000039 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR04: 0000000000000001 0000000000000000 0000000000000048 0000000000000175 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR08: c000000001733900 0000000000000000 0000000000000000 0000000000000005 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR12: 0000000028002428 c00000000fb40980 c0000000000e57c8 c0000003fe165980 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d1f500 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR24: c000000000d1f4d8 0000000000000100 c0000003fe058580 0000000000000000 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR28: c0000003fe144000 c000000004fd0300 c0000003fe144758 c0000003fe144000 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082328] NIP [c0000000005cdf0c] pci_disable_device+0x11c/0x140
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082332] LR [c0000000005cdf08] pci_disable_device+0x118/0x140
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082333] Call Trace:
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082337] [c0000003f90637e0] [c0000000005cdf08] pci_disable_device+0x118/0x140 (unreliable)
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082350] [c0000003f9063850] [d00000000212b0d4] mlx4_remove_one+0xc4/0x250 [mlx4_core]
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082353] [c0000003f90638e0] [c0000000005d2fc0] pci_device_remove+0x70/0x110
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082358] [c0000003f9063920] [c0000000006be740] __device_release_driver+0xc0/0x190
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082362] [c0000003f9063950] [c0000000006be850] device_release_driver+0x40/0x70
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082365] [c0000003f9063980] [c0000000005c7e30] pci_stop_bus_device+0xf0/0x110
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082368] [c0000003f90639c0] [c0000000005c7fbc] pci_stop_and_remove_bus_device+0x2c/0x50
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082372] [c0000003f90639f0] [c00000000003c100] eeh_rmv_device+0x140/0x1a0
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082375] [c0000003f9063a70] [c00000000003a294] eeh_pe_dev_traverse+0x94/0x160
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082380] [c0000003f9063b00] [c000000000ad39d0] eeh_reset_device+0xbc/0x218
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082383] [c0000003f9063ba0] [c00000000003c454] eeh_handle_normal_event+0x2f4/0x430
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082386] [c0000003f9063c20] [c00000000003c764] eeh_handle_event+0x54/0x360
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082389] [c0000003f9063cd0] [c00000000003cb8c] eeh_event_handler+0x11c/0x1e0
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082393] [c0000003f9063d80] [c0000000000e58d0] kthread+0x110/0x130
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082397] [c0000003f9063e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082399] Instruction dump:
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082401] 409eff64 387f0098 480eab45 60000000 e8bf00e8 2fa50000 7c641b78 419e0028 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082407] 3c62ff7f 38633aa8 48507821 60000000 <0fe00000> 39200001 3d42fff8 992a1acb 
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082413] ---[ end trace 1cce98b956e06602 ]---
  Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082431] iommu: Removing device 0000:00:04.0 from group 0
  Feb 19 02:18:28 ltciofvtr-s822l2-lp3 kernel: [  945.197931] EEH: Sleep 5s ahead of partial hotplug
  Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.204919] iommu: Adding device 0000:00:04.0 to group 0
  Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.205129] mlx4_core: Initializing 0000:00:04.0
  Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.207395] mlx4_core 0000:00:04.0: Using 64-bit direct DMA at offset 800000000000000
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.254212] mlx4_core 0000:00:04.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.254215] mlx4_core 0000:00:04.0: PCIe link width is x8, device supports x8
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.353773] mlx4_en 0000:00:04.0: Activating port:1
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356795] mlx4_en: 0000:00:04.0: Port 1: Using 64 TX rings
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356800] mlx4_en: 0000:00:04.0: Port 1: Using 8 RX rings
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356803] mlx4_en: 0000:00:04.0: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.359817] mlx4_en: 0000:00:04.0: Port 1: Initializing port
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.360278] mlx4_en 0000:00:04.0: registered PHC clock
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.361113] mlx4_en 0000:00:04.0: Activating port:2
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.363940] mlx4_core 0000:00:04.0 enp0s4: renamed from eth0
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365347] mlx4_en: 0000:00:04.0: Port 2: Using 64 TX rings
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365350] mlx4_en: 0000:00:04.0: Port 2: Using 8 RX rings
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365352] mlx4_en: 0000:00:04.0: Port 2:   frag:0 - size:1522 prefix:0 stride:1536
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.380726] mlx4_en: 0000:00:04.0: Port 2: Initializing port
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.386733] EEH: Notify device drivers the completion of reset
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.386737] EEH: Notify device driver to resume
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.408991] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.410735] mlx4_core 0000:00:04.0 enp0s4d1: renamed from eth0
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.411687] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
  Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.411690] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
  Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [  957.608097] mlx4_en: enp0s4d1: Link Up
  Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [  957.662997] mlx4_en: enp0s4: Link Up
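
When checking a saved copy of a log like the one above, it can help to pull out only the kernel WARNING location and the double-disable message that precedes it. The following is a minimal sketch, not part of the original report: the function names `warning_sites` and `has_double_disable` are hypothetical, and the patterns assume the log keeps the "WARNING: at <file>:<line>" and "disabling already-disabled device" wording shown above.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not from the bug report).

# warning_sites FILE
# Print each kernel WARNING location (file:line) found in FILE,
# with duplicates collapsed.
warning_sites() {
    sed -n 's/.*WARNING: at \([^ ]*:[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}

# has_double_disable FILE
# Succeed if FILE contains the message the PCI core logs when a driver
# disables a device that is already disabled.
has_double_disable() {
    grep -q 'disabling already-disabled device' "$1"
}
```

For example, `warning_sites /var/log/syslog` on the client would list the affected source location, and `has_double_disable /var/log/syslog && echo "double disable seen"` reports whether the mlx4_core message is present. The syslog path is an assumption and may differ per system.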

  
  pKVM syslog:
  Feb 19 18:16:47 ltciofvtr-s822l2-lp1 kernel: vfio-pci 0003:0b:00.0: enabling device (0140 -> 0142)
  Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Starting Session 1302 of user root.
  Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Started Session 1302 of user root.
  Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Failed to reset devices.list on /machine.slice: Invalid argument
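
The error injection from step 2 (write a mask, wait, then write 0x0) can be wrapped in a small helper for repeated runs. This is only a sketch: the function name `inject_eeh` and its argument order are hypothetical, and the default mask and one-second hold are simply the values used in the report. On a real host it needs root and a mounted debugfs.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name, not from the bug report).

# inject_eeh FILE [MASK] [HOLD_SECONDS]
# Write MASK to FILE, wait HOLD_SECONDS, then write 0x0 to clear it.
inject_eeh() {
    err_file=$1
    mask=${2:-0x8000000000000000}
    hold=${3:-1}

    if [ ! -w "$err_file" ]; then
        echo "inject_eeh: cannot write to $err_file" >&2
        return 1
    fi

    echo "$mask" > "$err_file" || return 1
    sleep "$hold"
    echo 0x0 > "$err_file"
}
```

With the path from the report, the call would be `inject_eeh /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA`. The PHB directory name is specific to that machine.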
   

  The patches are finally upstream:
  https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=c12833acff62cff83a8b728253e7ebbc1264d75e
  From c12833acff62cff83a8b728253e7ebbc1264d75e Mon Sep 17 00:00:00 2001
  From: Daniel Jurgens <danielj@xxxxxxxxxxxx>
  Date: Wed, 20 Apr 2016 16:01:15 +0300
  Subject: net/mlx4_core: Implement pci_resume callback

  https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=4bfd2e6e53435a214888fd35e230157a38ffc6a0
  From 4bfd2e6e53435a214888fd35e230157a38ffc6a0 Mon Sep 17 00:00:00 2001
  From: Daniel Jurgens <danielj@xxxxxxxxxxxx>
  Date: Wed, 20 Apr 2016 16:01:16 +0300
  Subject: net/mlx4_core: Avoid repeated calls to pci enable/disable
