kernel-packages team mailing list archive

[Bug 1524259] Re: igb: Detected Tx Unit Hang with stack trace

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: "Christopher M. Penalver" <christopher.m.penalver@xxxxxxxxx>
Date: Wed, 16 Dec 2015 00:34:56 -0000
Reply-to: Bug 1524259 <1524259@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
mrk, one may generate an apport-collect file and then manually attach it
following https://wiki.ubuntu.com/ReportingBugs .

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1524259

Title:
  igb: Detected Tx Unit Hang with stack trace

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Hello.

  For some time now we have had a problem with one of our servers. It
  occurs sporadically (once every day or two) and the cause is still
  unknown. We searched Launchpad and tried many suggested solutions,
  but nothing helped. We tried the vanilla Ubuntu 14.04.3 kernel
  (3.16.x) as well as 3.19.0-25-generic and
  linux-image-3.19.0-33-generic, with the same symptoms on all of these
  versions. We also tried rolling back to 3.13 (3.13.0-43-generic and
  3.13.0-62-generic), but the problem persists.

  Our current configuration is Ubuntu 14.04.3 with kernel 3.13.0-43.72
  and Xen 4.4.2-0ubuntu0.14.04.3 (this host is used as a Xen hypervisor
  with an iSCSI initiator, in case that matters). Here is what happens:

  kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
  kernel: [135522.062941]   Tx Queue             <5>
  kernel: [135522.062941]   TDH                  <e>
  kernel: [135522.062941]   TDT                  <21>
  kernel: [135522.062941]   next_to_use          <21>
  kernel: [135522.062941]   next_to_clean        <e>
  kernel: [135522.062941] buffer_info[next_to_clean]
  kernel: [135522.062941]   time_stamp           <10203c3ca>
  kernel: [135522.062941]   next_to_watch        <ffff8800bac590f0>
  kernel: [135522.062941]   jiffies              <10203c4e6>
  kernel: [135522.062941]   desc.status          <1c8200>
  kernel: [135526.063054]   desc.status          <0>

  There are many messages like this. Right after them we see reports like:
  kernel: [135526.982825]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4328767466, last ping 4328768718, now 4328769972
  kernel: [135526.982911]  connection2:0: detected conn error (1011)
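
  The first Tx Unit Hang dump above can be read mechanically. The
  following is a minimal sketch, not part of the original report, that
  parses the "name <hex>" lines and derives two numbers: how many
  descriptors sit between the hardware head (TDH) and tail (TDT), and
  how old the stuck buffer is in jiffies. RING_SIZE and HZ are
  assumptions (they are not stated in the report), as is reading every
  bracketed value as hexadecimal.

```python
import re

# Log lines copied from the first hang dump in the report.
HANG_DUMP = """\
kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
kernel: [135522.062941]   Tx Queue             <5>
kernel: [135522.062941]   TDH                  <e>
kernel: [135522.062941]   TDT                  <21>
kernel: [135522.062941]   next_to_use          <21>
kernel: [135522.062941]   next_to_clean        <e>
kernel: [135522.062941] buffer_info[next_to_clean]
kernel: [135522.062941]   time_stamp           <10203c3ca>
kernel: [135522.062941]   next_to_watch        <ffff8800bac590f0>
kernel: [135522.062941]   jiffies              <10203c4e6>
kernel: [135522.062941]   desc.status          <1c8200>
"""

# Matches a field name followed by a value in angle brackets at end of line.
FIELD_RE = re.compile(r"([A-Za-z_][\w. ]*?)\s+<([0-9a-f]+)>\s*$")


def parse_hang_dump(text):
    """Return {field name: int} for every 'name <value>' line.

    All values are parsed as hexadecimal, which is an assumption for
    'Tx Queue' (the value 5 reads the same in decimal and hex).
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


RING_SIZE = 256  # assumption: Tx ring size, not given in the report
HZ = 250         # assumption: kernel tick rate, not given in the report

fields = parse_hang_dump(HANG_DUMP)

# Descriptors queued to the NIC (tail) but not yet consumed (head).
pending = (fields["TDT"] - fields["TDH"]) % RING_SIZE

# Age of the buffer at next_to_clean when the hang was reported.
age_jiffies = fields["jiffies"] - fields["time_stamp"]

print(f"Tx queue {fields['Tx Queue']}: {pending} descriptors outstanding")
print(f"oldest buffer age: {age_jiffies} jiffies "
      f"(~{age_jiffies / HZ:.3f} s if HZ={HZ})")
```

  For the dump above this gives 19 outstanding descriptors and a buffer
  age of 284 jiffies; the conversion to seconds depends on the assumed
  HZ value.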

  And finally:

  kernel: [135527.014836] WARNING: CPU: 8 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
  kernel: [135527.014839] NETDEV WATCHDOG: eth1 (igb): transmit queue 4 timed out
  kernel: [135527.014841] Modules linked in: xt_physdev xen_netback xen_blkback cls_u32 sch_sfq sch_htb xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xen_gntdev xen_evtchn xenfs xen_privcmd ip6_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich joydev ioatdma serio_raw mac_hid shpchp lpc_ich i7core_edac intel_powerclamp coretemp edac_core lp parport hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear iptable_raw nf_nat nf_conntrack iptable_mangle iptable_filter psmouse ip_tables igb x_tables ahci libahci i2c_algo_bit dca ptp bridge pps_core 8021q garp stp llc mrp
  kernel: [135527.014903] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.13.0-43-generic #72-Ubuntu
  kernel: [135527.014905] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c       08/03/2012
  kernel: [135527.014907]  0000000000000009 ffff880268103d98 ffffffff81720bf6 ffff880268103de0
  kernel: [135527.014912]  ffff880268103dd0 ffffffff810677cd 0000000000000004 ffff880250b18000
  kernel: [135527.014916]  ffff8800030e5940 0000000000000008 0000000000000008 ffff880268103e30
  kernel: [135527.014920] Call Trace:
  kernel: [135527.014923]  <IRQ>  [<ffffffff81720bf6>] dump_stack+0x45/0x56
  kernel: [135527.014934]  [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
  kernel: [135527.014937]  [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
  kernel: [135527.014943]  [<ffffffff81645686>] dev_watchdog+0x276/0x280
  kernel: [135527.014947]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014952]  [<ffffffff81074386>] call_timer_fn+0x36/0x100
  kernel: [135527.014955]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014959]  [<ffffffff8107531f>] run_timer_softirq+0x1ef/0x2f0
  kernel: [135527.014964]  [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
  kernel: [135527.014969]  [<ffffffff8106d165>] irq_exit+0x105/0x110
  kernel: [135527.014976]  [<ffffffff814340f5>] xen_evtchn_do_upcall+0x35/0x50
  kernel: [135527.014981]  [<ffffffff8173313e>] xen_do_hypervisor_callback+0x1e/0x30
  kernel: [135527.014982]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014990]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014996]  [<ffffffff81009e20>] ? xen_safe_halt+0x10/0x20
  kernel: [135527.015001]  [<ffffffff8101caaf>] ? default_idle+0x1f/0xc0
  kernel: [135527.015005]  [<ffffffff8101d376>] ? arch_cpu_idle+0x26/0x30
  kernel: [135527.015010]  [<ffffffff810bef35>] ? cpu_startup_entry+0xc5/0x290
  kernel: [135527.015015]  [<ffffffff810101b8>] ? cpu_bringup_and_idle+0x18/0x20
  kernel: [135527.015018] ---[ end trace 431e88429488f9a4 ]---
  kernel: [135527.015044] igb 0000:01:00.1 eth1: Reset adapter

  After that the network connection to this machine is dead; it keeps
  trying to reconnect, without success.

  We had no problems for about a week after rolling back to the
  3.13.0-43 kernel, but now it keeps crashing with the above error. I'm
  not sure how to diagnose this, so I need assistance. Thanks.

  This is what dmesg shows about the NICs:
  [   15.220822] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
  [   15.220882] igb: Copyright (c) 2007-2013 Intel Corporation.
  [   15.421684] igb 0000:01:00.0: added PHC on eth0
  [   15.421770] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
  [   15.421827] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fc
  [   15.421885] igb 0000:01:00.0: eth0: PBA No: Unknown
  [   15.421939] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   15.621679] igb 0000:01:00.1: added PHC on eth1
  [   15.621747] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
  [   15.621815] igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fd
  [   15.621885] igb 0000:01:00.1: eth1: PBA No: Unknown
  [   15.621949] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
  [   30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
  [   30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed
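
  The last three dmesg lines show that eth1, the interface reporting
  the hangs, came up at 100 Mbps after a SmartSpeed downgrade, while
  eth0 linked at 1000 Mbps. The following is a minimal sketch, not part
  of the original report, that extracts this from the link messages so
  the two ports can be compared at a glance. It only summarizes what
  the log says; whether the downgrade is related to the Tx hang is not
  established here. The function name summarize_links is hypothetical.

```python
import re

# Link-related lines copied from the dmesg excerpt in the report.
DMESG = """\
[   24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed
"""

LINK_RE = re.compile(r"igb: (\w+) NIC Link is Up (\d+) Mbps (\w+) Duplex")
DOWNGRADE_RE = re.compile(
    r"igb \S+ (\w+): Link Speed was downgraded by SmartSpeed"
)


def summarize_links(text):
    """Return {interface: {"mbps", "duplex", "downgraded"}} from igb link lines."""
    links = {}
    for line in text.splitlines():
        match = LINK_RE.search(line)
        if match:
            links[match.group(1)] = {
                "mbps": int(match.group(2)),
                "duplex": match.group(3),
                "downgraded": False,
            }
            continue
        match = DOWNGRADE_RE.search(line)
        if match and match.group(1) in links:
            links[match.group(1)]["downgraded"] = True
    return links


links = summarize_links(DMESG)
for name, info in sorted(links.items()):
    note = " (downgraded by SmartSpeed)" if info["downgraded"] else ""
    print(f"{name}: {info['mbps']} Mbps {info['duplex']} duplex{note}")
```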

  And here is the ethtool -k eth1 output:
  Features for eth1:
  rx-checksumming: on
  tx-checksumming: on
  	tx-checksum-ipv4: on
  	tx-checksum-ip-generic: off [fixed]
  	tx-checksum-ipv6: on
  	tx-checksum-fcoe-crc: off [fixed]
  	tx-checksum-sctp: on
  scatter-gather: on
  	tx-scatter-gather: on
  	tx-scatter-gather-fraglist: off [fixed]
  tcp-segmentation-offload: on
  	tx-tcp-segmentation: on
  	tx-tcp-ecn-segmentation: off [fixed]
  	tx-tcp6-segmentation: on
  udp-fragmentation-offload: off [fixed]
  generic-segmentation-offload: on
  generic-receive-offload: on
  large-receive-offload: off [fixed]
  rx-vlan-offload: on
  tx-vlan-offload: on
  ntuple-filters: off [fixed]
  receive-hashing: on
  highdma: on [fixed]
  rx-vlan-filter: on [fixed]
  vlan-challenged: off [fixed]
  tx-lockless: off [fixed]
  netns-local: off [fixed]
  tx-gso-robust: off [fixed]
  tx-fcoe-segmentation: off [fixed]
  tx-gre-segmentation: off [fixed]
  tx-ipip-segmentation: off [fixed]
  tx-sit-segmentation: off [fixed]
  tx-udp_tnl-segmentation: off [fixed]
  tx-mpls-segmentation: off [fixed]
  fcoe-mtu: off [fixed]
  tx-nocache-copy: on
  loopback: off [fixed]
  rx-fcs: off [fixed]
  rx-all: off
  tx-vlan-stag-hw-insert: off [fixed]
  rx-vlan-stag-hw-parse: off [fixed]
  rx-vlan-stag-filter: off [fixed]
  l2-fwd-offload: off [fixed]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1524259/+subscriptions