kernel-packages team mailing list archive
Message #153729
[Bug 1524259] Re: igb: Detected Tx Unit Hang with stack trace
Hello. Our servers have now been running the 4.4 kernel without this problem for 18 days.
It looks like the bug is fixed in the mainline kernel.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1524259
Title:
igb: Detected Tx Unit Hang with stack trace
Status in linux package in Ubuntu:
Incomplete
Bug description:
Hello.
For some time now we have had a problem with one of our servers. It
happens sporadically (once every day or two) and the cause is still
unknown. We searched Launchpad and tried many of the suggested
solutions, but nothing helped. We tried the vanilla Ubuntu 14.04.3
kernel (3.16.x) as well as 3.19.0-25-generic and
linux-image-3.19.0-33-generic, with the same symptoms on all of these
versions. We also tried rolling back to 3.13 (3.13.0-43-generic and
3.13.0-62-generic), but the problem persists.
Our current configuration is Ubuntu 14.04.3 with kernel 3.13.0-43.72
and Xen 4.4.2-0ubuntu0.14.04.3 (this host is used as a Xen hypervisor
with an iSCSI initiator, in case that matters). Here is what happens:
kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
kernel: [135522.062941] Tx Queue <5>
kernel: [135522.062941] TDH <e>
kernel: [135522.062941] TDT <21>
kernel: [135522.062941] next_to_use <21>
kernel: [135522.062941] next_to_clean <e>
kernel: [135522.062941] buffer_info[next_to_clean]
kernel: [135522.062941] time_stamp <10203c3ca>
kernel: [135522.062941] next_to_watch <ffff8800bac590f0>
kernel: [135522.062941] jiffies <10203c4e6>
kernel: [135522.062941] desc.status <1c8200>
kernel: [135526.063054] desc.status <0>
There are many messages like this. Right after them we get reports like:
kernel: [135526.982825] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4328767466, last ping 4328768718, now 4328769972
kernel: [135526.982911] connection2:0: detected conn error (1011)
And finally:
kernel: [135527.014836] WARNING: CPU: 8 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
kernel: [135527.014839] NETDEV WATCHDOG: eth1 (igb): transmit queue 4 timed out
kernel: [135527.014841] Modules linked in: xt_physdev xen_netback xen_blkback cls_u32 sch_sfq sch_htb xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xen_gntdev xen_evtchn xenfs xen_privcmd ip6_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich joydev ioatdma serio_raw mac_hid shpchp lpc_ich i7core_edac intel_powerclamp coretemp edac_core lp parport hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear iptable_raw nf_nat nf_conntrack iptable_mangle iptable_filter psmouse ip_tables igb x_tables ahci libahci i2c_algo_bit dca ptp bridge pps_core 8021q garp stp llc mrp
kernel: [135527.014903] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.13.0-43-generic #72-Ubuntu
kernel: [135527.014905] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c 08/03/2012
kernel: [135527.014907] 0000000000000009 ffff880268103d98 ffffffff81720bf6 ffff880268103de0
kernel: [135527.014912] ffff880268103dd0 ffffffff810677cd 0000000000000004 ffff880250b18000
kernel: [135527.014916] ffff8800030e5940 0000000000000008 0000000000000008 ffff880268103e30
kernel: [135527.014920] Call Trace:
kernel: [135527.014923] <IRQ> [<ffffffff81720bf6>] dump_stack+0x45/0x56
kernel: [135527.014934] [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
kernel: [135527.014937] [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
kernel: [135527.014943] [<ffffffff81645686>] dev_watchdog+0x276/0x280
kernel: [135527.014947] [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
kernel: [135527.014952] [<ffffffff81074386>] call_timer_fn+0x36/0x100
kernel: [135527.014955] [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
kernel: [135527.014959] [<ffffffff8107531f>] run_timer_softirq+0x1ef/0x2f0
kernel: [135527.014964] [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
kernel: [135527.014969] [<ffffffff8106d165>] irq_exit+0x105/0x110
kernel: [135527.014976] [<ffffffff814340f5>] xen_evtchn_do_upcall+0x35/0x50
kernel: [135527.014981] [<ffffffff8173313e>] xen_do_hypervisor_callback+0x1e/0x30
kernel: [135527.014982] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel: [135527.014990] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel: [135527.014996] [<ffffffff81009e20>] ? xen_safe_halt+0x10/0x20
kernel: [135527.015001] [<ffffffff8101caaf>] ? default_idle+0x1f/0xc0
kernel: [135527.015005] [<ffffffff8101d376>] ? arch_cpu_idle+0x26/0x30
kernel: [135527.015010] [<ffffffff810bef35>] ? cpu_startup_entry+0xc5/0x290
kernel: [135527.015015] [<ffffffff810101b8>] ? cpu_bringup_and_idle+0x18/0x20
kernel: [135527.015018] ---[ end trace 431e88429488f9a4 ]---
kernel: [135527.015044] igb 0000:01:00.1 eth1: Reset adapter
After that the network connection to this machine is dead; it keeps
trying to reconnect, but without success.
We had no problems for about a week after rolling back to the
3.13.0-43 kernel, but now it keeps crashing with the above error. I'm
not sure how to diagnose this, so I need assistance. Thanks.
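As a reading aid for the "Detected Tx Unit Hang" report quoted earlier, here is a minimal Python sketch that decodes its hex fields. It assumes a TX ring of 256 descriptors and CONFIG_HZ=250; neither value is stated in this report, so check `ethtool -g eth1` and the kernel config before relying on them. The names `parse_hang_report`, `RING_SIZE` and `HZ` are made up for illustration.

```python
import re

# Lines copied from the first hang report in this bug description.
REPORT = """\
kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
kernel: [135522.062941] Tx Queue <5>
kernel: [135522.062941] TDH <e>
kernel: [135522.062941] TDT <21>
kernel: [135522.062941] next_to_use <21>
kernel: [135522.062941] next_to_clean <e>
kernel: [135522.062941] time_stamp <10203c3ca>
kernel: [135522.062941] jiffies <10203c4e6>
"""

RING_SIZE = 256  # assumption: TX ring size, not stated in the report
HZ = 250         # assumption: timer ticks per second for this kernel build


def parse_hang_report(text):
    """Map each field name (last word before '<...>') to its value, read as hex."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+)\s+<([0-9a-f]+)>", text)}


fields = parse_hang_report(REPORT)

# Descriptors handed to the NIC (tail) but not yet processed by it (head).
pending = (fields["TDT"] - fields["TDH"]) % RING_SIZE

# How long the oldest pending buffer had been waiting when the hang was reported.
stalled_jiffies = fields["jiffies"] - fields["time_stamp"]

print("pending descriptors:", pending)          # 19
print("stalled jiffies:", stalled_jiffies)      # 284
print("stalled seconds:", stalled_jiffies / HZ)
```

Under these assumptions, 19 descriptors were queued but not completed, and the oldest had been waiting 284 jiffies (a bit over one second) when the driver reported the hang. TDH equals next_to_clean and TDT equals next_to_use, so the driver's view of the ring matches the hardware registers.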
This is what dmesg shows about the NICs:
[ 15.220822] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
[ 15.220882] igb: Copyright (c) 2007-2013 Intel Corporation.
[ 15.421684] igb 0000:01:00.0: added PHC on eth0
[ 15.421770] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 15.421827] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fc
[ 15.421885] igb 0000:01:00.0: eth0: PBA No: Unknown
[ 15.421939] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 15.621679] igb 0000:01:00.1: added PHC on eth1
[ 15.621747] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 15.621815] igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fd
[ 15.621885] igb 0000:01:00.1: eth1: PBA No: Unknown
[ 15.621949] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[ 24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed
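One detail in the lines above is easy to miss: eth1, the interface that hangs, came up at 100 Mbps and the driver reported a SmartSpeed downgrade, while eth0 came up at 1000 Mbps. Whether that is related to the Tx hang is not established in this report. The following is a small sketch for pulling that information out of a saved dmesg log; the function name `link_report` is hypothetical and the patterns are written against the three lines quoted here only.

```python
import re

# Link lines copied from the dmesg excerpt in this bug description.
DMESG = """\
[ 24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed
"""


def link_report(text):
    """Return ({interface: speed in Mbps}, {interfaces with a SmartSpeed downgrade})."""
    speeds = {}
    downgraded = set()
    for line in text.splitlines():
        up = re.search(r"igb: (\w+) NIC Link is Up (\d+) Mbps", line)
        if up:
            speeds[up.group(1)] = int(up.group(2))
        down = re.search(r"(\w+): Link Speed was downgraded by SmartSpeed", line)
        if down:
            downgraded.add(down.group(1))
    return speeds, downgraded


speeds, downgraded = link_report(DMESG)
for iface, mbps in sorted(speeds.items()):
    note = " (downgraded by SmartSpeed)" if iface in downgraded else ""
    print(f"{iface}: {mbps} Mbps{note}")
```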
And here is the ethtool output:
Features for eth1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
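To compare this feature list between hosts or kernel versions, it can help to reduce it to the features that are enabled and not marked [fixed], since those are the ones that `ethtool -K` can change. The sketch below does this for an excerpt of the output above; `parse_features` is a hypothetical helper name, and the excerpt is shortened for brevity.

```python
# Excerpt copied from the ethtool feature list in this bug description.
FEATURES = """\
Features for eth1:
rx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
highdma: on [fixed]
rx-all: off
"""


def parse_features(text):
    """Return {feature: (enabled, fixed)} from 'name: on|off [fixed]' lines."""
    features = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(": ")
        if not sep:
            continue  # skips the 'Features for eth1:' header and blank lines
        tokens = rest.split()
        features[name] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


features = parse_features(FEATURES)

# Features that are currently on and can still be toggled.
changeable_on = [name for name, (enabled, fixed) in features.items()
                 if enabled and not fixed]
print(changeable_on)
```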