kernel-packages team mailing list archive

[Bug 1524259] Missing required logs.

 

This bug is missing log files that will aid in diagnosing the problem.
From a terminal window, please run:

apport-collect 1524259

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

** Tags added: trusty

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1524259

Title:
  igb: Detected Tx Unit Hang with stack trace

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Hello.

  For some time now we have had a problem with one of our servers. It
  occurs sporadically (about once every one or two days) and the cause
  is still unknown. We searched Launchpad and tried many of the
  suggested solutions, but nothing helped. We tried the vanilla Ubuntu
  14.04.3 kernel (3.16.x) as well as 3.19.0-25-generic and
  linux-image-3.19.0-33-generic, with the same symptoms on all of these
  versions. We also tried rolling back to 3.13 (3.13.0-43-generic and
  3.13.0-62-generic), but the problem persists.

  Our current configuration is Ubuntu 14.04.3 with kernel 3.13.0-43.72
  and Xen 4.4.2-0ubuntu0.14.04.3 (this host is used as a Xen hypervisor
  with an iSCSI initiator, in case that matters). Here is how the
  failure unfolds:

  kernel: [135522.062941] igb 0000:01:00.1: Detected Tx Unit Hang
  kernel: [135522.062941]   Tx Queue             <5>
  kernel: [135522.062941]   TDH                  <e>
  kernel: [135522.062941]   TDT                  <21>
  kernel: [135522.062941]   next_to_use          <21>
  kernel: [135522.062941]   next_to_clean        <e>
  kernel: [135522.062941] buffer_info[next_to_clean]
  kernel: [135522.062941]   time_stamp           <10203c3ca>
  kernel: [135522.062941]   next_to_watch        <ffff8800bac590f0>
  kernel: [135522.062941]   jiffies              <10203c4e6>
  kernel: [135522.062941]   desc.status          <1c8200>
  kernel: [135526.063054]   desc.status          <0>
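
As a reading aid for the dump above (not part of the original report), here is a minimal Python sketch that extracts the hexadecimal fields and derives two values: how long the oldest pending buffer has been waiting (jiffies minus time_stamp) and how many descriptors lie between TDH and TDT. The function names are hypothetical, and the ring size of 256 is an assumption; the report does not state the configured ring size.

```python
import re

# Lines copied from the report's "Detected Tx Unit Hang" dump.
DUMP = """\
kernel: [135522.062941]   TDH                  <e>
kernel: [135522.062941]   TDT                  <21>
kernel: [135522.062941]   time_stamp           <10203c3ca>
kernel: [135522.062941]   jiffies              <10203c4e6>
"""

# Matches "] <name> <hexvalue>" after the kernel timestamp.
FIELD = re.compile(r"\]\s+(\w+)\s+<([0-9a-f]+)>")


def parse_hang_dump(text):
    """Return a dict mapping field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Derive stall time and pending descriptor count.

    ring_size is an assumed value; it is not given in the report.
    """
    return {
        "stalled_jiffies": fields["jiffies"] - fields["time_stamp"],
        "pending_descriptors": (fields["TDT"] - fields["TDH"]) % ring_size,
    }


if __name__ == "__main__":
    print(summarize(parse_hang_dump(DUMP)))
```

For the values in this report the sketch yields 284 stalled jiffies and 19 pending descriptors. Converting jiffies to seconds depends on the kernel's HZ setting, which the report does not give; at an assumed HZ of 250 this would be a little over one second.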

  There are many messages like this. Right after them we see reports like:
  kernel: [135526.982825]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4328767466, last ping 4328768718, now 4328769972
  kernel: [135526.982911]  connection2:0: detected conn error (1011)

  And finally:

  kernel: [135527.014836] WARNING: CPU: 8 PID: 0 at /build/buildd/linux-3.13.0/net/sched/sch_generic.c:264 dev_watchdog+0x276/0x280()
  kernel: [135527.014839] NETDEV WATCHDOG: eth1 (igb): transmit queue 4 timed out
  kernel: [135527.014841] Modules linked in: xt_physdev xen_netback xen_blkback cls_u32 sch_sfq sch_htb xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xen_gntdev xen_evtchn xenfs xen_privcmd ip6_tables ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich joydev ioatdma serio_raw mac_hid shpchp lpc_ich i7core_edac intel_powerclamp coretemp edac_core lp parport hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear iptable_raw nf_nat nf_conntrack iptable_mangle iptable_filter psmouse ip_tables igb x_tables ahci libahci i2c_algo_bit dca ptp bridge pps_core 8021q garp stp llc mrp
  kernel: [135527.014903] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 3.13.0-43-generic #72-Ubuntu
  kernel: [135527.014905] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c       08/03/2012
  kernel: [135527.014907]  0000000000000009 ffff880268103d98 ffffffff81720bf6 ffff880268103de0
  kernel: [135527.014912]  ffff880268103dd0 ffffffff810677cd 0000000000000004 ffff880250b18000
  kernel: [135527.014916]  ffff8800030e5940 0000000000000008 0000000000000008 ffff880268103e30
  kernel: [135527.014920] Call Trace:
  kernel: [135527.014923]  <IRQ>  [<ffffffff81720bf6>] dump_stack+0x45/0x56
  kernel: [135527.014934]  [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
  kernel: [135527.014937]  [<ffffffff8106783c>] warn_slowpath_fmt+0x4c/0x50
  kernel: [135527.014943]  [<ffffffff81645686>] dev_watchdog+0x276/0x280
  kernel: [135527.014947]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014952]  [<ffffffff81074386>] call_timer_fn+0x36/0x100
  kernel: [135527.014955]  [<ffffffff81645410>] ? dev_graft_qdisc+0x80/0x80
  kernel: [135527.014959]  [<ffffffff8107531f>] run_timer_softirq+0x1ef/0x2f0
  kernel: [135527.014964]  [<ffffffff8106cc1c>] __do_softirq+0xec/0x2c0
  kernel: [135527.014969]  [<ffffffff8106d165>] irq_exit+0x105/0x110
  kernel: [135527.014976]  [<ffffffff814340f5>] xen_evtchn_do_upcall+0x35/0x50
  kernel: [135527.014981]  [<ffffffff8173313e>] xen_do_hypervisor_callback+0x1e/0x30
  kernel: [135527.014982]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014990]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  kernel: [135527.014996]  [<ffffffff81009e20>] ? xen_safe_halt+0x10/0x20
  kernel: [135527.015001]  [<ffffffff8101caaf>] ? default_idle+0x1f/0xc0
  kernel: [135527.015005]  [<ffffffff8101d376>] ? arch_cpu_idle+0x26/0x30
  kernel: [135527.015010]  [<ffffffff810bef35>] ? cpu_startup_entry+0xc5/0x290
  kernel: [135527.015015]  [<ffffffff810101b8>] ? cpu_bringup_and_idle+0x18/0x20
  kernel: [135527.015018] ---[ end trace 431e88429488f9a4 ]---
  kernel: [135527.015044] igb 0000:01:00.1 eth1: Reset adapter

  After that, the network connection to this machine is dead; it tries
  to reconnect continuously, but without success.

  After rolling back to the 3.13.0-43 kernel we had no problems for
  about a week, but now it keeps crashing with the above error. I'm not
  sure how to diagnose this and would appreciate assistance. Thanks.

  This is what dmesg shows about the NICs:
  [   15.220822] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
  [   15.220882] igb: Copyright (c) 2007-2013 Intel Corporation.
  [   15.421684] igb 0000:01:00.0: added PHC on eth0
  [   15.421770] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
  [   15.421827] igb 0000:01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fc
  [   15.421885] igb 0000:01:00.0: eth0: PBA No: Unknown
  [   15.421939] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   15.621679] igb 0000:01:00.1: added PHC on eth1
  [   15.621747] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
  [   15.621815] igb 0000:01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:25:90:00:cc:fd
  [   15.621885] igb 0000:01:00.1: eth1: PBA No: Unknown
  [   15.621949] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
  [   24.581560] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
  [   30.941733] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
  [   30.941851] igb 0000:01:00.1 eth1: Link Speed was downgraded by SmartSpeed

  And here is the ethtool output:
  Features for eth1:
  rx-checksumming: on
  tx-checksumming: on
  	tx-checksum-ipv4: on
  	tx-checksum-ip-generic: off [fixed]
  	tx-checksum-ipv6: on
  	tx-checksum-fcoe-crc: off [fixed]
  	tx-checksum-sctp: on
  scatter-gather: on
  	tx-scatter-gather: on
  	tx-scatter-gather-fraglist: off [fixed]
  tcp-segmentation-offload: on
  	tx-tcp-segmentation: on
  	tx-tcp-ecn-segmentation: off [fixed]
  	tx-tcp6-segmentation: on
  udp-fragmentation-offload: off [fixed]
  generic-segmentation-offload: on
  generic-receive-offload: on
  large-receive-offload: off [fixed]
  rx-vlan-offload: on
  tx-vlan-offload: on
  ntuple-filters: off [fixed]
  receive-hashing: on
  highdma: on [fixed]
  rx-vlan-filter: on [fixed]
  vlan-challenged: off [fixed]
  tx-lockless: off [fixed]
  netns-local: off [fixed]
  tx-gso-robust: off [fixed]
  tx-fcoe-segmentation: off [fixed]
  tx-gre-segmentation: off [fixed]
  tx-ipip-segmentation: off [fixed]
  tx-sit-segmentation: off [fixed]
  tx-udp_tnl-segmentation: off [fixed]
  tx-mpls-segmentation: off [fixed]
  fcoe-mtu: off [fixed]
  tx-nocache-copy: on
  loopback: off [fixed]
  rx-fcs: off [fixed]
  rx-all: off
  tx-vlan-stag-hw-insert: off [fixed]
  rx-vlan-stag-hw-parse: off [fixed]
  rx-vlan-stag-filter: off [fixed]
  l2-fwd-offload: off [fixed]
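
To make the long feature listing above easier to scan, the following Python sketch (an illustration, not part of the original report) picks out the features that are currently "on" and not marked "[fixed]", that is, the ones that could in principle be changed with `ethtool -K`. The function name is hypothetical and the input is only an excerpt of the listing. Nothing here implies that changing any of these features resolves the hang.

```python
# Excerpt of the "Features for eth1" listing from the report.
FEATURES = """\
tcp-segmentation-offload: on
\ttx-tcp-segmentation: on
\ttx-tcp-ecn-segmentation: off [fixed]
generic-segmentation-offload: on
large-receive-offload: off [fixed]
highdma: on [fixed]
rx-all: off
"""


def toggleable_enabled(text):
    """Return names of features that are 'on' and not marked [fixed]."""
    names = []
    for line in text.splitlines():
        name, sep, state = line.strip().partition(":")
        if not sep:
            continue  # not a "name: state" line
        # A changeable, enabled feature has exactly the state "on".
        if state.split() == ["on"]:
            names.append(name)
    return names


if __name__ == "__main__":
    for feature in toggleable_enabled(FEATURES):
        print(feature)
```

On the excerpt this prints tcp-segmentation-offload, tx-tcp-segmentation, and generic-segmentation-offload; highdma is skipped because it is marked fixed.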

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1524259/+subscriptions