kernel-packages team mailing list archive

[Bug 1524259] Re: igb: Detected Tx Unit Hang with stack trace

 

I just booted the server into the latest 4.4 kernel using the packages
you pointed me to. Triggering the issue may take a couple of days,
though. The longest uptime we had with 3.13.0-43-generic was about a
week. Either way, I'll report the results here.

PS. I wasn't able to run `apport-collect 1524259`. It opened lynx, and
when I try to type my username to log in to Launchpad, it redirects me
to the page source. Is it possible to run this command on a headless
machine?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1524259

Title:
  igb: Detected Tx Unit Hang with stack trace

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Hello.

  For some time now we have had a problem with one of our servers. It
  happens sporadically (once every day or two) and the cause is still
  unknown. We searched Launchpad and tried many suggested solutions,
  but nothing helped. We tried the vanilla Ubuntu 14.04.3 kernel -
  3.16.x, and also 3.19.0-25-generic and
  linux-image-3.19.0-33-generic - with the same symptoms on all of
  these versions. We also tried rolling back to 3.13
  (3.13.0-43-generic and 3.13.0-62-generic), but the problem persists.

  Our current configuration is Ubuntu 14.04.3 with kernel 3.13.0-43.72
  and Xen 4.4.2-0ubuntu0.14.04.3 (this host is used as a Xen hypervisor
  with an iSCSI initiator, in case that matters). Here is what happens:

  kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
  kernel: [135522.062941]   Tx Queue             <5>
  kernel: [135522.062941]   TDH                  <e>
  kernel: [135522.062941]   TDT                  <21>
  kernel: [135522.062941]   next_to_use          <21>
  kernel: [135522.062941]   next_to_clean        <e>
  kernel: [135522.062941] buffer_info[next_to_clean]
  kernel: [135522.062941]   time_stamp           <10203c3ca>
  kernel: [135522.062941]   next_to_watch        <ffff8800bac590f0>
  kernel: [135522.062941]   jiffies              <10203c4e6>
  kernel: [135522.062941]   desc.status          <1c8200>
  kernel: [135526.063054]   desc.status          <0>

  There are many messages like this. Right after that we get reports like:
  kernel: [135526.982825]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4328767466, last ping 4328768718, now 4328769972
  kernel: [135526.982911]  connection2:0: detected conn error (1011)
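
As a reading aid for the Tx Unit Hang dump above, here is a minimal Python sketch that decodes its hex fields. It assumes that TDH and TDT are the head and tail indices of the transmit descriptor ring, that the ring holds 256 descriptors (an assumed default; `ethtool -g eth1` shows the real size), and that the kernel ticks at HZ=250 (also an assumption; CONFIG_HZ in /boot/config-$(uname -r) gives the real value). The function names `parse_hang_dump` and `summarize` are made up for illustration.

```python
import re

# Fields copied from the hang dump in this report (log prefixes trimmed).
HANG_DUMP = """\
Tx Queue             <5>
TDH                  <e>
TDT                  <21>
next_to_use          <21>
next_to_clean        <e>
time_stamp           <10203c3ca>
jiffies              <10203c4e6>
"""

ASSUMED_HZ = 250         # assumption: CONFIG_HZ of the running kernel
ASSUMED_RING_SIZE = 256  # assumption: Tx ring size, see `ethtool -g eth1`


def parse_hang_dump(text):
    """Return {field name: int} for every 'name <hex>' line in the dump."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, hz=ASSUMED_HZ, ring_size=ASSUMED_RING_SIZE):
    """Derive the queue backlog and the stall duration from the parsed fields."""
    # Descriptors between head and tail; the modulo handles ring wraparound.
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # Age of the buffer at next_to_clean, in timer ticks.
    stalled_jiffies = fields["jiffies"] - fields["time_stamp"]
    return {
        "queue": fields["Tx Queue"],
        "pending_descriptors": pending,
        "stalled_jiffies": stalled_jiffies,
        "stalled_seconds": stalled_jiffies / hz,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_hang_dump(HANG_DUMP)).items():
        print(f"{key}: {value}")
```

With the values from this report, the sketch gives 19 descriptors outstanding on queue 5 and 284 jiffies since the buffer at next_to_clean was time-stamped, which is roughly 1.1 seconds if HZ really is 250.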

  And finally:

  kernel: [135527.014836] WARNING: CPU: 8 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
  kernel: [135527.014839] NETDEV WATCHDOG: eth1 (igb): transmit queue 4 timed out
  kernel: [135527.014841] Modules linked in: xt_physdev xen_netback xen_blkback cls_u32 sch_sfq sch_htb xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xen_gntdev xen_evtchn xenfs xen_privcmd ip6_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich joydev ioatdma serio_raw mac_hid shpchp lpc_ich i7core_edac intel_powerclamp coretemp edac_core lp parport hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear iptable_raw nf_nat nf_conntrack iptable_mangle iptable_filter psmouse ip_tables igb x_tables ahci libahci i2c_algo_bit dca ptp bridge pps_core 8021q garp stp llc mrp
  kernel: [135527.014903] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.13.0-43-generic #72-Ubuntu
  kernel: [135527.014905] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c       08/03/2012
  kernel: [135527.014907]  0000000000000009 ffff880268103d98 ffffffff81720bf6 ffff880268103de0
  kernel: [135527.014912]  ffff880268103dd0 ffffffff810677cd 0000000000000004 ffff880250b18000
  kernel: [135527.014916]  ffff8800030e5940 0000000000000008 0000000000000008 ffff880268103e30
  kernel: [135527.014920] Call Trace:
  kernel: [135527.014923]  <IRQ>  [<ffffffff81720bf6>] dump_stack+0x45/0x56
  kernel: [135527.014934]  [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
  kernel: [135527.014937]  [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
  kernel: [135527.014943]  [<ffffffff81645686>] dev_watchdog+0x276/0x280
  kernel: [135527.014947]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014952]  [<ffffffff81074386>] call_timer_fn+0x36/0x100
  kernel: [135527.014955]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014959]  [<ffffffff8107531f>] run_timer_softirq+0x1ef/0x2f0
  kernel: [135527.014964]  [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
  kernel: [135527.014969]  [<ffffffff8106d165>] irq_exit+0x105/0x110
  kernel: [135527.014976]  [<ffffffff814340f5>] xen_evtchn_do_upcall+0x35/0x50
  kernel: [135527.014981]  [<ffffffff8173313e>] xen_do_hypervisor_callback+0x1e/0x30
  kernel: [135527.014982]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014990]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014996]  [<ffffffff81009e20>] ? xen_safe_halt+0x10/0x20
  kernel: [135527.015001]  [<ffffffff8101caaf>] ? default_idle+0x1f/0xc0
  kernel: [135527.015005]  [<ffffffff8101d376>] ? arch_cpu_idle+0x26/0x30
  kernel: [135527.015010]  [<ffffffff810bef35>] ? cpu_startup_entry+0xc5/0x290
  kernel: [135527.015015]  [<ffffffff810101b8>] ? cpu_bringup_and_idle+0x18/0x20
  kernel: [135527.015018] ---[ end trace 431e88429488f9a4 ]---
  kernel: [135527.015044] igb 0000:01:00.1 eth1: Reset adapter

  Then the network connection to this machine is dead and it tries to
  reconnect continuously, but with no success.

  We had no problems for about a week after rolling back to the
  3.13.0-43 kernel, but now it keeps crashing with the above error. I'm
  not sure how to diagnose this, so I need assistance. Thanks.

  That's what we have in dmesg about the NICs:
  [   15.220822] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
  [   15.220882] igb: Copyright (c) 2007-2013 Intel Corporation.
  [   15.421684] igb 0000:01:00.0: added PHC on eth0
  [   15.421770] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
  [   15.421827] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fc
  [   15.421885] igb 0000:01:00.0: eth0: PBA No: Unknown
  [   15.421939] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   15.621679] igb 0000:01:00.1: added PHC on eth1
  [   15.621747] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
  [   15.621815] igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fd
  [   15.621885] igb 0000:01:00.1: eth1: PBA No: Unknown
  [   15.621949] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
  [   30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
  [   30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed

  And here is the ethtool output:
  Features for eth1:
  rx-checksumming: on
  tx-checksumming: on
  	tx-checksum-ipv4: on
  	tx-checksum-ip-generic: off [fixed]
  	tx-checksum-ipv6: on
  	tx-checksum-fcoe-crc: off [fixed]
  	tx-checksum-sctp: on
  scatter-gather: on
  	tx-scatter-gather: on
  	tx-scatter-gather-fraglist: off [fixed]
  tcp-segmentation-offload: on
  	tx-tcp-segmentation: on
  	tx-tcp-ecn-segmentation: off [fixed]
  	tx-tcp6-segmentation: on
  udp-fragmentation-offload: off [fixed]
  generic-segmentation-offload: on
  generic-receive-offload: on
  large-receive-offload: off [fixed]
  rx-vlan-offload: on
  tx-vlan-offload: on
  ntuple-filters: off [fixed]
  receive-hashing: on
  highdma: on [fixed]
  rx-vlan-filter: on [fixed]
  vlan-challenged: off [fixed]
  tx-lockless: off [fixed]
  netns-local: off [fixed]
  tx-gso-robust: off [fixed]
  tx-fcoe-segmentation: off [fixed]
  tx-gre-segmentation: off [fixed]
  tx-ipip-segmentation: off [fixed]
  tx-sit-segmentation: off [fixed]
  tx-udp_tnl-segmentation: off [fixed]
  tx-mpls-segmentation: off [fixed]
  fcoe-mtu: off [fixed]
  tx-nocache-copy: on
  loopback: off [fixed]
  rx-fcs: off [fixed]
  rx-all: off
  tx-vlan-stag-hw-insert: off [fixed]
  rx-vlan-stag-hw-parse: off [fixed]
  rx-vlan-stag-filter: off [fixed]
  l2-fwd-offload: off [fixed]
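
For anyone comparing offload settings between hosts or kernels, the feature list above can be split into settings that are marked `[fixed]` and settings that can still be changed. The following Python sketch does that for text in the `ethtool -k` format shown here. The sample is a shortened copy of the list above, and the helper names `parse_features` and `changeable_enabled` are hypothetical. It only organizes the output and does not suggest that any particular offload is the cause of the hang.

```python
# Shortened copy of the feature list from this report.
SAMPLE = """\
Features for eth1:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: on
\ttx-checksum-ip-generic: off [fixed]
tcp-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
highdma: on [fixed]
rx-all: off
"""


def parse_features(text):
    """Map each feature name to a (is_on, is_fixed) tuple."""
    features = {}
    for line in text.splitlines():
        name, _, value = line.strip().partition(":")
        tokens = value.split()
        if not tokens:
            # Skips blank lines and the "Features for eth1:" header.
            continue
        features[name] = (tokens[0] == "on", "[fixed]" in tokens)
    return features


def changeable_enabled(features):
    """List features that are currently on and not marked [fixed]."""
    return [name for name, (is_on, is_fixed) in features.items()
            if is_on and not is_fixed]


if __name__ == "__main__":
    for name in changeable_enabled(parse_features(SAMPLE)):
        print(name)
```

The same functions can be fed the full output of `ethtool -k eth1` captured on each kernel, which makes it easier to spot a setting that differs between a working and a failing boot.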

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1524259/+subscriptions