kernel-packages team mailing list archive

[Bug 593635] Re: HDD freezes caused by ata exception that results in soft resetting of link

 

Deepak Sarda, this bug was reported a while ago and there hasn't been
any activity in it recently. We were wondering if this is still an
issue? If so, could you please test for this with the latest development
release of Ubuntu? ISO images are available from
http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in
the development release from a Terminal
(Applications->Accessories->Terminal), as it will automatically gather
and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel, following the
instructions at https://wiki.ubuntu.com/KernelMainlineBuilds ? This will
allow additional upstream developers to examine the issue. Please do not
test a kernel from the daily folder; test the one all the way at the
bottom instead. Once you've tested the upstream kernel, please comment
on which kernel version specifically you tested. If this bug is fixed in
the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12-rc2

This can be done by clicking on the yellow circle with a black pencil
icon next to the word Tags, located at the bottom of the bug
description. Also, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

Also, please remove the tag:
needs-upstream-testing
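
The tagging rules above can be summarised in a short Python sketch. The
function name upstream_tags and its parameters are hypothetical and are
not part of any Launchpad tooling; the sketch only builds the tag
strings described in this message and does not contact Launchpad.

```python
def upstream_tags(kernel_version, fixed):
    """Return (tags_to_add, tags_to_remove) for a mainline kernel test.

    kernel_version: the version string of the kernel that was tested,
                    for example "v3.12-rc2".
    fixed:          True if the bug no longer occurs on that kernel.
    """
    base = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    tags_to_add = [base, f"{base}-{kernel_version}"]
    # In both cases the needs-upstream-testing tag is removed.
    tags_to_remove = ["needs-upstream-testing"]
    return tags_to_add, tags_to_remove
```

For example, upstream_tags("v3.12-rc2", True) yields the two
kernel-fixed-upstream tags to add and needs-upstream-testing to remove.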

Once testing of the upstream kernel is complete, please mark this bug's
Status as Confirmed. Please let us know your results. Thank you for your
understanding.

** Changed in: linux (Ubuntu)
       Status: Triaged => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/593635

Title:
  HDD freezes caused by ata exception that results in soft resetting of
  link

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  Under even moderately heavy disk writes, I am seeing exceptions like the ones below in my kern.log:
  -----------------------------------------------
  Jun 13 13:33:03 cellar kernel: [66188.434868] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
  Jun 13 13:33:03 cellar kernel: [66188.434874] ata4.01: BMDMA stat 0x46
  Jun 13 13:33:03 cellar kernel: [66188.434879] ata4.01: failed command: WRITE DMA EXT
  Jun 13 13:33:03 cellar kernel: [66188.434886] ata4.01: cmd 35/00:00:00:94:b2/00:04:13:00:00/f0 tag 0 dma 524288 out
  Jun 13 13:33:03 cellar kernel: [66188.434888]          res 51/84:01:ff:95:b2/84:02:13:00:00/f0 Emask 0x30 (host bus error)
  Jun 13 13:33:03 cellar kernel: [66188.434892] ata4.01: status: { DRDY ERR }
  Jun 13 13:33:03 cellar kernel: [66188.434895] ata4.01: error: { ICRC ABRT }
  Jun 13 13:33:03 cellar kernel: [66188.434907] ata4: soft resetting link
  Jun 13 13:33:03 cellar kernel: [66188.622000] ata4.01: configured for UDMA/100
  Jun 13 13:33:03 cellar kernel: [66188.622013] ata4: EH complete
  ----------------------------------------------

  This is with the latest stable lucid kernel (2.6.32-22-generic
  #36-Ubuntu).

  I've also tried a mainline kernel (2.6.35-020635rc1) & still get the
  same errors except that there's an additional stack trace:

  -----------------------------------------------

  Jun 14 18:55:40 cellar kernel: [  152.874172] irq 19: nobody cared (try booting with the "irqpoll" option)
  Jun 14 18:55:40 cellar kernel: [  152.874182] Pid: 0, comm: swapper Tainted: P            2.6.35-020635rc1-generic #020635rc1
  Jun 14 18:55:40 cellar kernel: [  152.874185] Call Trace:
  Jun 14 18:55:40 cellar kernel: [  152.874198]  [<c01a58cc>] __report_bad_irq+0x2c/0x90
  Jun 14 18:55:40 cellar kernel: [  152.874204]  [<c016fee3>] ? sched_clock_tick+0x73/0xa0
  Jun 14 18:55:40 cellar kernel: [  152.874209]  [<c01a5a44>] note_interrupt+0xe4/0x120
  Jun 14 18:55:40 cellar kernel: [  152.874214]  [<c0179da0>] ? tick_nohz_update_jiffies+0x60/0x70
  Jun 14 18:55:40 cellar kernel: [  152.874219]  [<c01a6364>] handle_fasteoi_irq+0x84/0xe0
  Jun 14 18:55:40 cellar kernel: [  152.874224]  [<c0104abf>] handle_irq+0x1f/0x30
  Jun 14 18:55:40 cellar kernel: [  152.874230]  [<c05afefb>] do_IRQ+0x4b/0xc0
  Jun 14 18:55:40 cellar kernel: [  152.874234]  [<c01032f0>] common_interrupt+0x30/0x40
  Jun 14 18:55:40 cellar kernel: [  152.874239]  [<c010a3a7>] ? mwait_idle+0x57/0xa0
  Jun 14 18:55:40 cellar kernel: [  152.874243]  [<c010189c>] cpu_idle+0x8c/0xc0
  Jun 14 18:55:40 cellar kernel: [  152.874249]  [<c05a4337>] start_secondary+0xf7/0x130
  Jun 14 18:55:40 cellar kernel: [  152.874252] handlers:
  Jun 14 18:55:40 cellar kernel: [  152.874254] [<c0431060>] (ata_bmdma_interrupt+0x0/0x190)
  Jun 14 18:55:40 cellar kernel: [  152.874261] [<c044fb10>] (usb_hcd_irq+0x0/0x90)
  Jun 14 18:55:40 cellar kernel: [  152.874268] Disabling IRQ #19
  Jun 14 18:56:09 cellar kernel: [  181.856015] ata4: lost interrupt (Status 0x51)
  Jun 14 18:56:09 cellar kernel: [  181.856034] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  Jun 14 18:56:09 cellar kernel: [  181.856039] ata4.01: BMDMA stat 0x46, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0
  Jun 14 18:56:09 cellar kernel: [  181.856045] ata4.01: failed command: WRITE DMA EXT
  Jun 14 18:56:09 cellar kernel: [  181.856053] ata4.01: cmd 35/00:00:00:84:08/00:04:3b:00:00/f0 tag 0 dma 524288 out
  Jun 14 18:56:09 cellar kernel: [  181.856054]          res 40/00:00:00:4f:c2/00:00:00:00:00/50 Emask 0x24 (host bus error)
  Jun 14 18:56:09 cellar kernel: [  181.856058] ata4.01: status: { DRDY }
  Jun 14 18:56:09 cellar kernel: [  181.856072] ata4: soft resetting link
  Jun 14 18:56:09 cellar kernel: [  182.160065] ata4.01: configured for UDMA/133
  Jun 14 18:56:09 cellar kernel: [  182.160072] ata4.01: device reported invalid CHS sector 0
  Jun 14 18:56:09 cellar kernel: [  182.160080] ata4: EH complete
  --------------------------------------------------------------------

  I've tried booting with "libata.force=noncq" on both kernels (lucid
  stable & 2.6.35 mainline) but it makes no difference.

  I didn't see these errors in Jaunty. I think they started sometime in
  Karmic. I upgraded to Lucid in the hope that the newer release had
  fixed it, but it made no difference.

  I think I've ruled out HDD failure. I get these errors on 2 old (3+
  years) Seagate 7200.10 disks as well as a brand new Seagate 7200.12
  disk.

  There are similar bug reports in launchpad but one difference that I
  noticed is that I consistently see the message "failed command: WRITE
  DMA EXT" while the other reports fail during a read or some other
  command.
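
One way to check which command fails across a whole log is to tally the
"failed command" lines per device. The following Python sketch is not
part of the original report: the function name count_failed_commands
and the regular expression are assumptions based only on the log format
quoted above.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   "... kernel: [66188.434879] ata4.01: failed command: WRITE DMA EXT"
# The pattern is an assumption derived from the excerpts in this report.
FAILED_CMD = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")


def count_failed_commands(lines):
    """Count (device, command) pairs found in 'failed command' lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open file object for /var/log/kern.log as the lines argument
would give one count per device and command, which makes it easy to see
whether every failure is a WRITE DMA EXT.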

  I can very reliably reproduce the errors by running an rdiff-backup
  'restore' operation from an external USB HDD.

  == Steps to reproduce ==
  1. Boot into GNOME and log in
  2. Run 'tail -f /var/log/kern.log' in one terminal window
  3. Run 'rdiff-backup --force -r now /media/freeagent/share /share/' in another terminal

  Within a few seconds, I can see the errors show up in the kernel logs.

  Running a fast torrent download will do the trick too.

  Since I can reproduce the problem so easily, I'll be very willing to
  try any special kernel builds to help solve this one.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.04
  Package: linux-image-2.6.32-22-generic 2.6.32-22.36
  Regression: Yes
  Reproducible: Yes
  ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
  Uname: Linux 2.6.32-22-generic i686
  NonfreeKernelModules: nvidia
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
  Architecture: i386
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  antrix     1387 F.... pulseaudio
  CRDA: Error: [Errno 2] No such file or directory
  Card0.Amixer.info:
   Card hw:0 'Intel'/'HDA Intel at 0xf9ffc000 irq 16'
     Mixer name	: 'Realtek ALC662 rev1'
     Components	: 'HDA:10ec0662,15650000,00100101'
     Controls      : 36
     Simple ctrls  : 19
  Date: Mon Jun 14 19:23:00 2010
  HibernationDevice: RESUME=UUID=c6dab799-13a8-443e-b2a3-4b93f3bbb42e
  IwConfig:
   lo        no wireless extensions.
   
   eth0      no wireless extensions.
  MachineType: BIOSTAR Group G31-M7 TE
  ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-22-generic root=UUID=466535ad-0b59-4fd0-b18b-ba486150f91a ro quiet splash
  ProcEnviron:
   PATH=(custom, user)
   LANG=en_SG.utf8
   SHELL=/bin/bash
  RelatedPackageVersions: linux-firmware 1.34
  RfKill:
   
  SourcePackage: linux
  dmi.bios.date: 04/10/2009
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 080014
  dmi.board.asset.tag: To Be Filled By O.E.M.
  dmi.board.name: G31-M7 TE
  dmi.board.vendor: BIOSTAR Group
  dmi.chassis.asset.tag: None
  dmi.chassis.type: 3
  dmi.chassis.vendor: BIOSTAR Group
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr080014:bd04/10/2009:svnBIOSTARGroup:pnG31-M7TE:pvr:rvnBIOSTARGroup:rnG31-M7TE:rvr:cvnBIOSTARGroup:ct3:cvr:
  dmi.product.name: G31-M7 TE
  dmi.sys.vendor: BIOSTAR Group

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/593635/+subscriptions