kernel-packages team mailing list archive
Message #126786
[Bug 1404409] Re: [regression] Intel 10Gb NIC Crashes
I do see a kernel log entry that says "audit_printk_skb: 42
callbacks suppressed", but definitely no stack dumps.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1404409
Title:
[regression] Intel 10Gb NIC Crashes
Status in linux package in Ubuntu:
In Progress
Bug description:
I posted this to netdev@xxxxxxxxxxxxxxx as well:
http://www.spinics.net/lists/netdev/msg309110.html
I think the next step is to try to bisect this down to a specific commit. I'm starting to look at the instructions here:
https://wiki.ubuntu.com/Kernel/KernelBisection
-----
Previous history of this thread:
http://thread.gmane.org/gmane.linux.network/326672
On 2014-11-04 22:57:19, Tom Herbert wrote:
> Using vlan and bonding? vlan_dev_hard_start_xmit called. A possible
> cause is that bonding interface is out of sync with slave interface
> w.r.t. GSO features. Do we know if this worked in 3.14, 3.15?
I'm seeing the same sort of crash/warning (skb_warn_bad_offload). It's
happening on Intel 10 Gig NICs using the ixgbe driver. I'm using bridges
(for virtual machines) on top of VLANs on top of 802.3ad bonding. I'm
using an MTU of 9000 on the bond0 interface, but 1500 everywhere else.
I'm always bonding two ports: on one system, I'm bonding two ports on
identical one-port NICs; on another system, I'm bonding two ports on a
single two-port NIC. Both systems exhibit the same behavior.
Everything has worked fine for a couple years on Ubuntu 12.04 Precise
(Linux 3.2.0). It immediately broke when I upgraded to Ubuntu 14.04
Trusty (Linux 3.13.0). I can also reproduce this using the packaged
version of Linux 3.16.0 on Trusty.
In contrast to other reports of this bug, disabling scatter gather on
the physical interfaces (e.g. eth0) does *not* stop the crashes
(assuming I disabled it correctly).
I currently have two systems (one with Precise, one with Trusty)
available to do any testing that you'd find helpful.
Here's a first pass at getting some debugging data.
The broken system (Ubuntu 14.04 Trusty):
rlaager@BROKEN:~$ uname -a
Linux BROKEN 3.13.0-43-generic #72-Ubuntu SMP Mon Dec 8 19:35:06 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux
rlaager@BROKEN:~$ ethtool -k p6p1
Features for p6p1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: on [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
rlaager@BROKEN:~$ ethtool -k bond0
Features for bond0:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
rlaager@BROKEN:~$ ethtool -k br7
Features for br7:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [requested on]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [requested on]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off [requested on]
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
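Tom Herbert's question about the bonding interface being out of sync with its slave can be checked mechanically against the output above. The following is a minimal Python sketch, not part of the original report: the function names are made up for illustration, and the two sample strings are short excerpts copied from the bond0 and p6p1 listings. It lists features that are "on" for the upper device but not "on" for the lower one. A mismatch is only a lead, since the stack is expected to fall back to software for features the lower device lacks.

```python
def parse_ethtool_k(text):
    """Parse `ethtool -k` output into {feature: (state, flag)}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip header lines such as "Features for bond0:"
        state, _, flag = value.partition(" ")
        features[name.strip()] = (state, flag.strip("[]"))
    return features


def on_upper_only(upper, lower):
    """Features enabled on the upper device but not on the lower one."""
    return sorted(
        name
        for name, (state, _) in upper.items()
        if state == "on" and lower.get(name, ("off", ""))[0] != "on"
    )


# Excerpts copied from the BROKEN host's output above.
BOND0 = """Features for bond0:
tx-checksum-ip-generic: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-udp_tnl-segmentation: on
"""
P6P1 = """Features for p6p1:
tx-checksum-ip-generic: off [fixed]
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
"""

mismatches = on_upper_only(parse_ethtool_k(BOND0), parse_ethtool_k(P6P1))
print(mismatches)
```

To run it against a live system, the full output of `ethtool -k bond0` and `ethtool -k p6p1` can be passed in place of the excerpts.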
The working system (Ubuntu 12.04 Precise):
rlaager@WORKING:~$ uname -a
Linux WORKING 3.2.0-74-generic #109-Ubuntu SMP Tue Dec 9 16:45:49 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux
rlaager@WORKING:~$ ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
rlaager@WORKING:~$ ethtool -k bond0
Offload parameters for bond0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
rlaager@WORKING:~$ ethtool -k br7
Offload parameters for br7:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: on
ntuple-filters: off
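Since the two hosts print different sets of features (the older ethtool on Precise shows fewer lines), a by-eye comparison is error-prone. Below is a small Python sketch, offered as an illustration rather than anything from the original report, that compares only the feature names both outputs share. The helper names are hypothetical and the sample strings are excerpts of the two br7 listings above.

```python
def feature_states(text):
    """Map each feature name in `ethtool -k` output to its on/off state."""
    states = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():  # header lines have nothing after ':'
            states[name.strip()] = value.split()[0]
    return states


def changed(old, new):
    """Features present in both outputs whose state differs."""
    common = sorted(old.keys() & new.keys())
    return {k: (old[k], new[k]) for k in common if old[k] != new[k]}


# Excerpts of the br7 output from the WORKING and BROKEN hosts above.
WORKING_BR7 = """Offload parameters for br7:
rx-checksumming: on
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
tx-vlan-offload: on
"""
BROKEN_BR7 = """Features for br7:
rx-checksumming: off [fixed]
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
tx-vlan-offload: on
"""

diff = changed(feature_states(WORKING_BR7), feature_states(BROKEN_BR7))
for name, (before, after) in diff.items():
    print(f"{name}: {before} -> {after}")
```

On these excerpts the bridge differs in scatter-gather, TSO, GSO, and rx-checksumming. Whether any of these differences is related to the warning is not established here.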
A stack trace from 3.13.0 (the default kernel in Ubuntu Trusty):
[ 1161.275007] WARNING: CPU: 7 PID: 0 at /build/buildd/linux-3.13.0/net/core/dev.c:2224 skb_warn_bad_offload+0xcd/0xda()
[ 1161.275011] : caps=(0x00000022000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[ 1161.275012] Modules linked in: nfsv3 ipmi_devintf ipmi_si vhost_net vhost macvtap macvlan bridge ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_mul
mrp xt_addrtype llc bonding nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_
ch intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw joydev i7core_eda
id nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor ixgbe raid6_pq dca hid_generic raid1 ptp mpt2sas
smouse hid libahci scsi_transport_sas mdio linear
[ 1161.275077] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W 3.13.0-43-generic #72-Ubuntu
[ 1161.275079] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a 09/14/2010
[ 1161.275080] 0000000000000009 ffff880c3fc239d8 ffffffff81720bf6 ffff880c3fc23a20
[ 1161.275085] ffff880c3fc23a10 ffffffff810677cd ffff880c1d3b9600 ffff880618e08000
[ 1161.275089] 0000000000000001 0000000000000001 ffff880c1d3b9600 ffff880c3fc23a70
[ 1161.275092] Call Trace:
[ 1161.275094] <IRQ> [<ffffffff81720bf6>] dump_stack+0x45/0x56
[ 1161.275101] [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
[ 1161.275105] [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
[ 1161.275109] [<ffffffff8136a0a3>] ? ___ratelimit+0x93/0x100
[ 1161.275113] [<ffffffff81723afe>] skb_warn_bad_offload+0xcd/0xda
[ 1161.275118] [<ffffffff81626489>] __skb_gso_segment+0x79/0xb0
[ 1161.275122] [<ffffffff8162677a>] dev_hard_start_xmit+0x18a/0x560
[ 1161.275126] [<ffffffff81098209>] ? ttwu_do_wakeup+0x19/0xc0
[ 1161.275129] [<ffffffff8164594e>] sch_direct_xmit+0xee/0x1c0
[ 1161.275133] [<ffffffff81626d80>] __dev_queue_xmit+0x230/0x500
[ 1161.275137] [<ffffffff81627060>] dev_queue_xmit+0x10/0x20
[ 1161.275143] [<ffffffffa04ab31b>] br_dev_queue_push_xmit+0x7b/0xc0 [bridge]
[ 1161.275149] [<ffffffffa04ab532>] br_forward_finish+0x22/0x60 [bridge]
[ 1161.275155] [<ffffffffa04ab710>] __br_forward+0x80/0xf0 [bridge]
[ 1161.275161] [<ffffffffa04ab9bb>] br_forward+0x8b/0xa0 [bridge]
[ 1161.275167] [<ffffffffa04ac6d9>] br_handle_frame_finish+0x149/0x3d0 [bridge]
[ 1161.275173] [<ffffffffa04acad5>] br_handle_frame+0x175/0x250 [bridge]
[ 1161.275177] [<ffffffff81624ac2>] __netif_receive_skb_core+0x262/0x840
[ 1161.275181] [<ffffffff8101b700>] ? check_tsc_unstable+0x10/0x10
[ 1161.275184] [<ffffffff816250b8>] __netif_receive_skb+0x18/0x60
[ 1161.275188] [<ffffffff81625123>] netif_receive_skb+0x23/0x90
[ 1161.275192] [<ffffffff81625b70>] napi_gro_receive+0x80/0xb0
[ 1161.275202] [<ffffffffa014009c>] ixgbe_clean_rx_irq+0x7ac/0xb10 [ixgbe]
[ 1161.275211] [<ffffffffa0141140>] ixgbe_poll+0x460/0x800 [ixgbe]
[ 1161.275216] [<ffffffff816254a2>] net_rx_action+0x152/0x250
[ 1161.275220] [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
[ 1161.275223] [<ffffffff8106d165>] irq_exit+0x105/0x110
[ 1161.275227] [<ffffffff817339e6>] do_IRQ+0x56/0xc0
[ 1161.275231] [<ffffffff817290ed>] common_interrupt+0x6d/0x6d
[ 1161.275232] <EOI> [<ffffffff815d361f>] ? cpuidle_enter_state+0x4f/0xc0
[ 1161.275240] [<ffffffff815d3749>] cpuidle_idle_call+0xb9/0x1f0
[ 1161.275244] [<ffffffff8101d35e>] arch_cpu_idle+0xe/0x30
[ 1161.275247] [<ffffffff810bef35>] cpu_startup_entry+0xc5/0x290
[ 1161.275251] [<ffffffff810413ed>] start_secondary+0x21d/0x2d0
A stack trace from 3.16.0 (still on Ubuntu Trusty):
[ 120.376026] WARNING: CPU: 6 PID: 0 at /build/buildd/linux-lts-utopic-3.16.0/net/core/dev.c:2246 skb_warn_bad_offload+0xcd/0xda()
[ 120.376029] : caps=(0x00000080000048c1, 0x0000000000000000) len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1
[ 120.376030] Modules linked in: nfsv3 ipmi_devintf ipmi_si ipmi_msghandler vhost_net vhost macvtap macvlan bridge 8021q garp stp mrp llc bonding ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_multiport xt_recent xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_powerclamp coretemp kvm_intel gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw lpc_ich joydev i7core_edac ioatdma edac_core nfsd auth_rpcgss mac_hid nfs_acl lp parport nfs lockd sunrpc fscache ses enclosure raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic raid6_pq ixgbe usbhid raid1 mpt2sas dca ahci raid0 ptp raid_class pps_core scsi_transport_sas multipath hid mdio libahci linear
[ 120.376085] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.16.0-28-generic #37-Ubuntu
[ 120.376086] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0a 09/14/2010
[ 120.376088] 0000000000000009 ffff880c3fc039b8 ffffffff81762220 ffff880c3fc03a00
[ 120.376090] ffff880c3fc039f0 ffffffff8106dd2d ffff880c1ac99a00 ffff88061c2fc000
[ 120.376092] 0000000000000001 0000000000000001 ffff880c1ac99a00 ffff880c3fc03a50
[ 120.376094] Call Trace:
[ 120.376096] <IRQ> [<ffffffff81762220>] dump_stack+0x45/0x56
[ 120.376105] [<ffffffff8106dd2d>] warn_slowpath_common+0x7d/0xa0
[ 120.376107] [<ffffffff8106dd9c>] warn_slowpath_fmt+0x4c/0x50
[ 120.376111] [<ffffffff8138b153>] ? ___ratelimit+0x93/0x100
[ 120.376114] [<ffffffff817654da>] skb_warn_bad_offload+0xcd/0xda
[ 120.376119] [<ffffffff81661d29>] __skb_gso_segment+0x79/0xb0
[ 120.376122] [<ffffffff81662052>] dev_hard_start_xmit+0x182/0x5c0
[ 120.376125] [<ffffffff8168337e>] sch_direct_xmit+0xee/0x1c0
[ 120.376127] [<ffffffff81662690>] __dev_queue_xmit+0x200/0x4d0
[ 120.376129] [<ffffffff81662970>] dev_queue_xmit+0x10/0x20
[ 120.376135] [<ffffffffc0796ac8>] br_dev_queue_push_xmit+0x68/0xa0 [bridge]
[ 120.376138] [<ffffffffc0796cd2>] br_forward_finish+0x22/0x60 [bridge]
[ 120.376142] [<ffffffffc0796e90>] __br_forward+0x80/0xf0 [bridge]
[ 120.376145] [<ffffffffc079713b>] br_forward+0x8b/0xa0 [bridge]
[ 120.376149] [<ffffffffc0797fb9>] br_handle_frame_finish+0x139/0x3c0 [bridge]
[ 120.376153] [<ffffffffc079838e>] br_handle_frame+0x14e/0x240 [bridge]
[ 120.376155] [<ffffffff81660102>] __netif_receive_skb_core+0x1b2/0x790
[ 120.376158] [<ffffffff8101bcd9>] ? read_tsc+0x9/0x20
[ 120.376161] [<ffffffff816606f8>] __netif_receive_skb+0x18/0x60
[ 120.376163] [<ffffffff81660763>] netif_receive_skb_internal+0x23/0x90
[ 120.376165] [<ffffffff816612c0>] napi_gro_receive+0xc0/0xf0
[ 120.376174] [<ffffffffc03007ac>] ixgbe_clean_rx_irq+0x7bc/0xb40 [ixgbe]
[ 120.376180] [<ffffffffc03018a2>] ixgbe_poll+0x482/0x850 [ixgbe]
[ 120.376183] [<ffffffff8109e9e9>] ? ttwu_do_wakeup+0x19/0xc0
[ 120.376186] [<ffffffff81660b52>] net_rx_action+0x152/0x250
[ 120.376189] [<ffffffff81073055>] __do_softirq+0xf5/0x2e0
[ 120.376191] [<ffffffff81073515>] irq_exit+0x105/0x110
[ 120.376194] [<ffffffff8176d748>] do_IRQ+0x58/0xf0
[ 120.376198] [<ffffffff8176b5ed>] common_interrupt+0x6d/0x6d
[ 120.376199] <EOI> [<ffffffff815fb83f>] ? cpuidle_enter_state+0x4f/0xc0
[ 120.376204] [<ffffffff815fb838>] ? cpuidle_enter_state+0x48/0xc0
[ 120.376206] [<ffffffff815fb967>] cpuidle_enter+0x17/0x20
[ 120.376209] [<ffffffff810b527d>] cpu_startup_entry+0x31d/0x450
[ 120.376213] [<ffffffff810e028d>] ? tick_check_new_device+0xdd/0xf0
[ 120.376216] [<ffffffff8104520d>] start_secondary+0x21d/0x2e0
[ 120.376217] ---[ end trace 90d53a2c9c47f360 ]---
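Both traces carry the same summary line after the WARNING, and its fields can be pulled out for comparison. This is a rough Python sketch, not from the original report. The function name is hypothetical, and the constant names reflect my reading of include/linux/skbuff.h for these kernel versions, so they should be checked against the tree actually in use. The caps bitmask is kept as a raw integer because feature bit positions differ between kernel versions.

```python
import re

# Assumed mapping, per include/linux/skbuff.h in the 3.13/3.16 era.
IP_SUMMED = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}
SKB_GSO_TCPV4 = 1 << 0  # assumed value of the lowest gso_type bit


def parse_bad_offload(line):
    """Extract caps and the decimal key=value fields from the warning line."""
    fields = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", line)}
    caps = re.search(r"caps=\((0x[0-9a-f]+), (0x[0-9a-f]+)\)", line)
    fields["caps"] = tuple(int(c, 16) for c in caps.groups())
    return fields


# The summary line from the 3.13.0 trace above.
LINE_3_13 = (
    "[ 1161.275011] : caps=(0x00000022000048c1, 0x0000000000000000) "
    "len=1514 data_len=1460 gso_size=1460 gso_type=1 ip_summed=1"
)

info = parse_bad_offload(LINE_3_13)
print(IP_SUMMED[info["ip_summed"]])
print("TCPv4 GSO" if info["gso_type"] & SKB_GSO_TCPV4 else "other GSO type")
```

If the mapping above is right, both traces show a TCPv4 GSO packet whose ip_summed is CHECKSUM_UNNECESSARY rather than CHECKSUM_PARTIAL, arriving on the transmit path after GRO receive and bridge forwarding.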
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-43-generic 3.13.0-43.72
ProcVersionSignature: Ubuntu 3.13.0-43.72-generic 3.13.11.11
Uname: Linux 3.13.0-43-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Dec 15 01:23 seq
crw-rw---- 1 root audio 116, 33 Dec 15 01:23 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Fri Dec 19 17:07:18 2014
HibernationDevice: RESUME=/dev/mapper/data-swap
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Supermicro X8DT6
PciMultimedia:
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-43-generic root=/dev/mapper/data-os ro elevator=noop console=ttyS1,115200n8 console=tty1 transparent_hugepage=always nomdmonddf nomdmonisw
RelatedPackageVersions:
linux-restricted-modules-3.13.0-43-generic N/A
linux-backports-modules-3.13.0-43-generic N/A
linux-firmware 1.127.10
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:
dmi.bios.date: 09/14/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.0a
dmi.board.asset.tag: 1234567890
dmi.board.name: X8DT6
dmi.board.vendor: Supermicro
dmi.board.version: 1234567890
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.0a:bd09/14/2010:svnSupermicro:pnX8DT6:pvr1234567890:rvnSupermicro:rnX8DT6:rvr1234567890:cvnSupermicro:ct17:cvr1234567890:
dmi.product.name: X8DT6
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1404409/+subscriptions