kernel-packages team mailing list archive
Message #174612
[Bug 1574697] Re: WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]
** Also affects: linux (Ubuntu Wily)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Wily)
Status: New => In Progress
** Changed in: linux (Ubuntu Wily)
Assignee: (unassigned) => Tim Gardner (timg-tpi)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1574697
Title:
WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
[travis3EN]
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Wily:
In Progress
Status in linux source package in Xenial:
In Progress
Status in linux source package in Yakkety:
Fix Released
Bug description:
---Problem Description---
WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]
---uname output---
Linux ltciofvtr-s822l2-lp3 4.4.0-4-generic #19-Ubuntu SMP Fri Feb 5 17:36:21 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = s822l
---Steps to Reproduce---
Triggering EEH causes warning messages in syslog.
Note: it is only the warning messages; the card recovers after EEH.
1. From the peer, run some load:
linux-xqxs:~ # ping -f 22.22.22.22
2. From the pKVM host, inject an EEH error for the travis3EN card:
[root@ltciofvtr-s822l2-lp1 ~]# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA; sleep 1; echo 0x0 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA
3. In the client's syslog you can see the warning messages "WARNING: at
/build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595":
[ 940.382507] EEH: Frozen PHB#0-PE#1 detected
[ 940.382594] EEH: PE location: N/A, PHB location: N/A
[ 940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
[ 940.382891] mlx4_core 0000:00:04.0: device is going to be reset
[ 940.382953] mlx4_core 0000:00:04.0: device was reset successfully
[ 940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382507] EEH: Frozen PHB#0-PE#1 detected
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382594] EEH: PE location: N/A, PHB location: N/A
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382647] CPU: 1 PID: 176 Comm: kworker/u16:2 Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382671] Workqueue: mlx4_en mlx4_en_do_get_stats [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382673] Call Trace:
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382714] [c00000000487b7c0] [c000000000ad8aa0] dump_stack+0x90/0xbc (unreliable)
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382725] [c00000000487b7f0] [c0000000000378f4] eeh_dev_check_failure+0x534/0x580
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382728] [c00000000487b890] [c0000000000379c4] eeh_check_failure+0x84/0xd0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382743] [c00000000487b8d0] [d000000002112fc0] cmd_pending+0xb0/0xe0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382749] [c00000000487b900] [d0000000021130b0] mlx4_cmd_post+0xc0/0x250 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382756] [c00000000487b9b0] [d00000000211592c] __mlx4_cmd+0x1dc/0x9b0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382766] [c00000000487ba70] [d0000000024eb030] mlx4_en_DUMP_ETH_STATS+0xc0/0x830 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382770] [c00000000487bb70] [d0000000024ef150] mlx4_en_do_get_stats+0x160/0x340 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382780] [c00000000487bc50] [c0000000000dc920] process_one_work+0x1e0/0x560
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382783] [c00000000487bce0] [c0000000000dce34] worker_thread+0x194/0x680
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382785] [c00000000487bd80] [c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382788] [c00000000487be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382814] mlx4_core 0000:00:04.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, op_mod=0x0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382821] EEH: Detected PCI bus error on PHB#0-PE#1
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382823] EEH: This PCI device has failed 1 times in the last hour
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382824] EEH: Notify device drivers to shutdown
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382891] mlx4_core 0000:00:04.0: device is going to be reset
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382953] mlx4_core 0000:00:04.0: device was reset successfully
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.383320] mlx4_en: enp0s4: Close port called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Starting Cleanup of Temporary Directories...
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd-tmpfiles[2473]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Started Cleanup of Temporary Directories.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.801690] mlx4_en 0000:00:04.0: removed PHC
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.079593] EEH: Collect temporary log
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.079631] eeh_pci_enable: Unexpected state change 2 on PHB#0-PE#1, err=-3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081329] EEH: of node=0000:00:04:0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081348] EEH: PCI device/vendor: 100315b3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081582] EEH: PCI cmd/status register: 00100142
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081584] EEH: PCI-E capabilities and status follow:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081725] EEH: PCI-E 00: 0002c010 11d08e02 0020202e 0843f483
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081849] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081851] EEH: PCI-E 20: 00000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081886] EEH: Reset without hotplug activity
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081935] mlx4_core 0000:00:04.0: mlx4_remove_one: interface is down
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082003] mlx4_core 0000:00:04.0: disabling already-disabled device
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082046] ------------[ cut here ]------------
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082049] WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082051] Modules linked in: ib_ipoib mlx5_ib mlx5_core rdma_ucm rdma_cm iw_cm ib_umad ib_ucm ib_cm ib_sa ib_mad ib_uverbs ib_core ib_addr pseries_rng rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082257] CPU: 1 PID: 49 Comm: eehd Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082260] task: c0000003f91e9370 ti: c0000003f9060000 task.ti: c0000003f9060000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082263] NIP: c0000000005cdf0c LR: c0000000005cdf08 CTR: c00000000057ae00
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082265] REGS: c0000003f9063560 TRAP: 0700 Not tainted (4.4.0-4-generic)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082267] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28002422 XER: 20000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] CFAR: c000000000ad578c SOFTE: 1
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR00: c0000000005cdf08 c0000003f90637e0 c000000001593900 0000000000000039
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR04: 0000000000000001 0000000000000000 0000000000000048 0000000000000175
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR08: c000000001733900 0000000000000000 0000000000000000 0000000000000005
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR12: 0000000028002428 c00000000fb40980 c0000000000e57c8 c0000003fe165980
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d1f500
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR24: c000000000d1f4d8 0000000000000100 c0000003fe058580 0000000000000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR28: c0000003fe144000 c000000004fd0300 c0000003fe144758 c0000003fe144000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082328] NIP [c0000000005cdf0c] pci_disable_device+0x11c/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082332] LR [c0000000005cdf08] pci_disable_device+0x118/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082333] Call Trace:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082337] [c0000003f90637e0] [c0000000005cdf08] pci_disable_device+0x118/0x140 (unreliable)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082350] [c0000003f9063850] [d00000000212b0d4] mlx4_remove_one+0xc4/0x250 [mlx4_core]
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082353] [c0000003f90638e0] [c0000000005d2fc0] pci_device_remove+0x70/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082358] [c0000003f9063920] [c0000000006be740] __device_release_driver+0xc0/0x190
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082362] [c0000003f9063950] [c0000000006be850] device_release_driver+0x40/0x70
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082365] [c0000003f9063980] [c0000000005c7e30] pci_stop_bus_device+0xf0/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082368] [c0000003f90639c0] [c0000000005c7fbc] pci_stop_and_remove_bus_device+0x2c/0x50
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082372] [c0000003f90639f0] [c00000000003c100] eeh_rmv_device+0x140/0x1a0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082375] [c0000003f9063a70] [c00000000003a294] eeh_pe_dev_traverse+0x94/0x160
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082380] [c0000003f9063b00] [c000000000ad39d0] eeh_reset_device+0xbc/0x218
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082383] [c0000003f9063ba0] [c00000000003c454] eeh_handle_normal_event+0x2f4/0x430
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082386] [c0000003f9063c20] [c00000000003c764] eeh_handle_event+0x54/0x360
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082389] [c0000003f9063cd0] [c00000000003cb8c] eeh_event_handler+0x11c/0x1e0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082393] [c0000003f9063d80] [c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082397] [c0000003f9063e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082399] Instruction dump:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082401] 409eff64 387f0098 480eab45 60000000 e8bf00e8 2fa50000 7c641b78 419e0028
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082407] 3c62ff7f 38633aa8 48507821 60000000 <0fe00000> 39200001 3d42fff8 992a1acb
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082413] ---[ end trace 1cce98b956e06602 ]---
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082431] iommu: Removing device 0000:00:04.0 from group 0
Feb 19 02:18:28 ltciofvtr-s822l2-lp3 kernel: [ 945.197931] EEH: Sleep 5s ahead of partial hotplug
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.204919] iommu: Adding device 0000:00:04.0 to group 0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.205129] mlx4_core: Initializing 0000:00:04.0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.207395] mlx4_core 0000:00:04.0: Using 64-bit direct DMA at offset 800000000000000
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.254212] mlx4_core 0000:00:04.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.254215] mlx4_core 0000:00:04.0: PCIe link width is x8, device supports x8
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.353773] mlx4_en 0000:00:04.0: Activating port:1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356795] mlx4_en: 0000:00:04.0: Port 1: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356800] mlx4_en: 0000:00:04.0: Port 1: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356803] mlx4_en: 0000:00:04.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.359817] mlx4_en: 0000:00:04.0: Port 1: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.360278] mlx4_en 0000:00:04.0: registered PHC clock
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.361113] mlx4_en 0000:00:04.0: Activating port:2
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.363940] mlx4_core 0000:00:04.0 enp0s4: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365347] mlx4_en: 0000:00:04.0: Port 2: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365350] mlx4_en: 0000:00:04.0: Port 2: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365352] mlx4_en: 0000:00:04.0: Port 2: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.380726] mlx4_en: 0000:00:04.0: Port 2: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.386733] EEH: Notify device drivers the completion of reset
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.386737] EEH: Notify device driver to resume
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.408991] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.410735] mlx4_core 0000:00:04.0 enp0s4d1: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.411687] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.411690] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [ 957.608097] mlx4_en: enp0s4d1: Link Up
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [ 957.662997] mlx4_en: enp0s4: Link Up
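To check a guest syslog for this warning after an injection, the lines between the "cut here" and "end trace" markers can be pulled out with awk. This is a minimal sketch: the function name extract_warn_blocks and the log path in the usage note are assumptions, not part of the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print every kernel
# warning block found in a log file, from the "cut here" marker through
# the matching "end trace" line.
#   $1: path to the log file to scan
extract_warn_blocks() {
    awk '
        /\[ cut here \]/ { inside = 1 }   # start of a warning block
        inside           { print }        # emit lines while inside a block
        /\[ end trace /  { inside = 0 }   # last line of the block, already printed
    ' "$1"
}
```

On a system that logs to /var/log/syslog (an assumption here), `extract_warn_blocks /var/log/syslog | grep -c 'pci.c:1595'` would count how many times the warning line appears inside such blocks.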
pKVM syslog:
Feb 19 18:16:47 ltciofvtr-s822l2-lp1 kernel: vfio-pci 0003:0b:00.0: enabling device (0140 -> 0142)
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Starting Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Started Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Failed to reset devices.list on /machine.slice: Invalid argument
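The host-side injection from step 2 can be wrapped in a small function so that the set and clear writes always happen as a pair. This is a sketch only: the function name inject_eeh and its arguments are hypothetical. The debugfs node and the two values are the ones used in the report, and writing to that node presumably needs root on a host with debugfs mounted.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the bug report) around the two writes
# used in step 2: set the injection value, wait, then clear it.
#   $1: file to write to (the report uses
#       /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA)
#   $2: seconds to wait before clearing (default 1, as in the report)
inject_eeh() {
    node=$1
    hold=${2:-1}
    if [ ! -w "$node" ]; then
        echo "inject_eeh: cannot write to $node" >&2
        return 1
    fi
    echo 0x8000000000000000 > "$node" || return 1   # value written first in the report
    sleep "$hold"
    echo 0x0 > "$node"                              # value written afterwards in the report
}
```

On the pKVM host this would be called as `inject_eeh /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA`. The PHB directory (PCI0003 here) is specific to the reporter's system.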
The patches are finally upstream:
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=c12833acff62cff83a8b728253e7ebbc1264d75e
From c12833acff62cff83a8b728253e7ebbc1264d75e Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <danielj@xxxxxxxxxxxx>
Date: Wed, 20 Apr 2016 16:01:15 +0300
Subject: net/mlx4_core: Implement pci_resume callback
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=4bfd2e6e53435a214888fd35e230157a38ffc6a0
From 4bfd2e6e53435a214888fd35e230157a38ffc6a0 Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <danielj@xxxxxxxxxxxx>
Date: Wed, 20 Apr 2016 16:01:16 +0300
Subject: net/mlx4_core: Avoid repeated calls to pci enable/disable
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574697/+subscriptions